glTF Tutorial

By Marco Hutter, @javagl

This tutorial gives an introduction to glTF, the GL transmission format. It summarizes the most important
features and application cases of glTF, and describes the structure of the files that are related to glTF. It explains
how glTF assets may be read, processed, and used to display 3D graphics efficiently.

Some basic knowledge about JSON, the JavaScript Object Notation, is assumed. Additionally, a basic
understanding of common graphics APIs, like OpenGL or WebGL, is required.

Introduction
Basic glTF Structure
Example: A Minimal glTF File
Scenes and Nodes
Buffers, BufferViews, and Accessors
Example: A Simple Animation
Animations
Example: Simple Meshes
Meshes
Materials
Example: A Simple Material
Textures, Images, and Samplers
Example: A Simple Texture
Example: An Advanced Material
Example: Simple Cameras
Cameras
Example: A Simple Morph Target
Morph Targets
Example: Simple Skin
Skins

Acknowledgements:

Patrick Cozzi, Cesium, @pjcozzi
Alexey Knyazev, @lexaknyazev
Sarah Chow, @slchow

Introduction to glTF using WebGL
An increasing number of applications and services are based on 3D content. Online shops offer product
configurators with a 3D preview. Museums digitize their artifacts with 3D scans and allow visitors to explore their
collections in virtual galleries. City planners use 3D city models for planning and information visualization.
Educators create interactive, animated 3D models of the human body. Many of these applications run directly in
the web browser, which is possible because all modern browsers support efficient rendering with WebGL.

Image 1a: Screenshots of various websites and applications showing 3D models.

Demand for 3D content in various applications is constantly increasing. In many cases, the 3D content has to be
transferred over the web, and it has to be rendered efficiently on the client side. But until now, there has been a
gap between the 3D content creation and efficient rendering of that 3D content in the runtime applications.

3D content pipelines
3D content that is rendered in client applications comes from different sources and is stored in different file
formats. The list of 3D graphics file formats on Wikipedia shows an overwhelming number, with more than 70
different file formats for 3D data, serving different purposes and application cases.
For example, raw 3D data may be obtained with a 3D scanner. These scanners usually provide the geometry data
of a single object, which is stored in OBJ, PLY, or STL files. These file formats do not contain information about
the scene structure or how the objects should be rendered.

More sophisticated 3D scenes can be created with authoring tools. These tools allow one to edit the structure of
the scene, the light setup, cameras, animations, and, of course, the 3D geometry of the objects that appear in
the scene. Applications store this information in their own, custom file formats. For example, Blender stores the
scenes in .blend files, LightWave3D uses the .lws file format, 3ds Max uses the .max file format, and Maya
uses .ma files.

In order to render such 3D content, the runtime application must be able to read different input file formats. The
scene structure has to be parsed, and the 3D geometry data has to be converted into the format required by the
graphics API. The 3D data has to be transferred to the graphics card memory, and then the rendering process
can be described with sequences of graphics API calls. Thus, each runtime application has to create importers,
loaders, or converters for all file formats that it will support, as shown in Image 1b.

Image 1b: The 3D content pipeline today.

glTF: A transmission format for 3D scenes
The goal of glTF is to define a standard for representing 3D content, in a form that is suitable for use in runtime
applications. The existing file formats are not appropriate for this use case: some of them do not contain any scene
information, but only geometry data; others have been designed for exchanging data between authoring
applications, and their main goal is to retain as much information about the 3D scene as possible, resulting in
files that are usually large, complex, and hard to parse. Additionally, the geometry data may have to be
preprocessed so that it can be rendered with the client application.
None of the existing file formats were designed for the use case of efficiently transferring 3D scenes over the
web and rendering them as efficiently as possible. But glTF is not "yet another file format." It is the definition of a
transmission format for 3D scenes:

The scene structure is described with JSON, which is very compact and can easily be parsed.
The 3D data of the objects are stored in a form that can be directly used by the common graphics APIs, so
there is no overhead for decoding or preprocessing the 3D data.

Different content creation tools may now provide 3D content in the glTF format. And an increasing number of
client applications are able to consume and render glTF. Some of these applications are shown in Image 1a. So
glTF may help to bridge the gap between content creation and rendering, as shown in Image 1c.

Image 1c: The 3D content pipeline with glTF.

An increasing number of content creation tools will be able to provide glTF directly. Alternatively, other file
formats can be used to create glTF assets, using one of the open-source conversion utilities listed in the
Khronos glTF repository. For example, nearly all authoring applications can export their scenes in the COLLADA
format. So the COLLADA2GLTF tool can be used to convert scenes and models from these authoring
applications to glTF. OBJ files may be converted to glTF using obj2gltf. For other file formats, custom converters
can be used to create glTF assets, thus making the 3D content available for a broad range of runtime
applications.


The Basic Structure of glTF
The core of glTF is a JSON file. This file describes the whole contents of the 3D scene. It consists of a
description of the scene structure itself, which is given by a hierarchy of nodes that define a scene graph. The 3D
objects that appear in the scene are defined using meshes that are attached to the nodes. Materials define the
appearance of the objects. Animations describe how the 3D objects are transformed (e.g., rotated or translated)
over time, and skins define how the geometry of the objects is deformed based on a skeleton pose. Cameras
describe the view configuration for the renderer.

The JSON structure
The scene objects are stored in arrays in the JSON file. They can be accessed using the index of the respective
object in the array:

"meshes" :
[
    { ... },
    { ... },
    ...
],

These indices are also used to define the relationships between the objects. The example above defines multiple
meshes, and a node may refer to one of these meshes, using the mesh index, to indicate that the mesh should
be attached to this node:

"nodes":
[
    { "mesh": 0, ... },
    { "mesh": 5, ... },
    ...
],
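The following Python sketch shows that, once the JSON has been parsed, resolving such a reference is just an array lookup. The fragment is made up for illustration (it is not a complete glTF asset), and the helper name mesh_of_node is hypothetical:

```python
import json

# Made-up fragment: two meshes, and two nodes that refer to them by index
gltf = json.loads("""
{
  "meshes": [ { "name": "cube" }, { "name": "sphere" } ],
  "nodes":  [ { "mesh": 1 }, { "mesh": 0 } ]
}
""")

def mesh_of_node(gltf, node_index):
    """Return the mesh object attached to a node, or None if there is none."""
    node = gltf["nodes"][node_index]
    if "mesh" not in node:
        return None
    # The "mesh" property is an index into the top-level "meshes" array
    return gltf["meshes"][node["mesh"]]

print(mesh_of_node(gltf, 0)["name"])  # sphere
```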

The following image (adapted from the glTF concepts section) gives an overview of the top-level elements of the
JSON part of a glTF asset:

Image 2a: The glTF JSON structure

These elements are summarized here quickly, to give an overview, with links to the respective sections of the
glTF specification. More detailed explanations of the relationships between these elements will be given in the
following sections.

The scene is the entry point for the description of the scene that is stored in the glTF. It refers to the
nodes that define the scene graph.
The node is one node in the scene graph hierarchy. It can contain a transformation (e.g., rotation or
translation), and it may refer to further (child) nodes. Additionally, it may refer to mesh or camera instances
that are "attached" to the node, or to a skin that describes a mesh deformation.
The camera defines the view configuration for rendering the scene.
A mesh describes a geometric object that appears in the scene. It refers to accessor objects that are
used for accessing the actual geometry data, and to materials that define the appearance of the object
when it is rendered.
The skin defines parameters that are required for vertex skinning, which allows the deformation of a mesh
based on the pose of a virtual character. The values of these parameters are obtained from an accessor.
An animation describes how transformations of certain nodes (e.g., rotation or translation) change over
time.
The accessor is used as an abstract source of arbitrary data. It is used by the mesh, skin, and
animation, and provides the geometry data, the skinning parameters, and the time-dependent animation
values. It refers to a bufferView, which is a part of a buffer that contains the actual raw binary data.
The material contains the parameters that define the appearance of an object. It usually refers to
texture objects that will be applied to the rendered geometry.
The texture is defined by a sampler and an image. The sampler defines how the texture image is sampled,
that is, which filtering and wrapping modes are used when it is applied to the object.

References to external data
The binary data, like geometry and textures of the 3D objects, are usually not contained in the JSON file.
Instead, they are stored in dedicated files, and the JSON part only contains links to these files. This allows the
binary data to be stored in a form that is very compact and can efficiently be transferred over the web.
Additionally, the data can be stored in a format that can be used directly in the renderer, without having to parse,
decode, or preprocess the data.

Image 2b: The glTF structure, with links to external data given as Uniform Resource Identifiers (URIs)

As shown in the image above, there are two types of objects that may contain such links to external resources,
namely buffers and images. These objects will later be explained in more detail.

Reading and managing external data
Reading and processing a glTF asset starts with parsing the JSON structure. After the structure has been
parsed, the buffer and image objects are available in the top-level buffers and images arrays,
respectively. Each of these objects may refer to blocks of binary data. For further processing, this data is read
into memory. Usually, the data will be stored in an array so that it may be looked up using the same index
that is used for referring to the buffer or image object that it belongs to.
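A minimal Python sketch of this bookkeeping, assuming that every buffer has a uri and that a caller-supplied function (called read_uri here, a hypothetical name) fetches the bytes for a URI, whether from disk, from the network, or by decoding a data URI:

```python
from typing import Callable, Dict, List

def load_buffer_data(gltf: Dict, read_uri: Callable[[str], bytes]) -> List[bytes]:
    """Read the data of all buffers, in the order of the top-level "buffers" array.

    read_uri is a hypothetical callback that returns the bytes for a URI.
    """
    buffers_data = []
    for buffer in gltf.get("buffers", []):
        buffers_data.append(read_uri(buffer["uri"]))
    return buffers_data

# Usage with an in-memory stand-in for the files
gltf = {"buffers": [{"uri": "a.bin", "byteLength": 4}, {"uri": "b.bin", "byteLength": 2}]}
fake_files = {"a.bin": b"\x00\x01\x02\x03", "b.bin": b"\xff\xfe"}
buffers_data = load_buffer_data(gltf, fake_files.__getitem__)
# buffers_data[i] now holds the data of gltf["buffers"][i]
```

The same index-aligned list can be built for the images array.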

Binary data in buffers
A buffer contains a URI that points to a file containing the raw, binary buffer data:

"buffers" : [
    {
        "uri" : "buffer01.bin",
        "byteLength" : 12352
    }
],

This binary data is just a raw block of memory that is read from the URI of the buffer, with no inherent meaning
or structure. The Buffers, BufferViews, and Accessors section will show how this raw data is extended with
information about data types and the data layout. With this information, one part of the data may, for example, be
interpreted as animation data, and another part may be interpreted as geometry data. Storing the data in a binary
form allows it to be transferred over the web much more efficiently than in the JSON format, and the binary data
can be passed directly to the renderer without having to decode or preprocess it.

Image data in images
An image may refer to an external image file that can be used as the texture of a rendered object:

"images" : [
    {
        "uri" : "image01.png"
    }
],

The reference is given as a URI that usually points to a PNG or JPG file. These formats significantly reduce the
size of the files so that they may efficiently be transferred over the web. In some cases, the image objects may
not refer to an external file, but to data that is stored in a buffer. The details of this indirection will be explained
in the Textures, Images, and Samplers section.

Binary data in data URIs
Usually, the URIs that are contained in the buffer and image objects will point to a file that contains the actual
data. As an alternative, the data may be embedded directly into the JSON as base64-encoded text, by using a data URI.
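For illustration, a Python sketch that decodes such a data URI with the standard base64 module. It only handles the base64 form, which is the one used by the example in the next section, and decode_data_uri is a hypothetical helper name:

```python
import base64
import struct

def decode_data_uri(uri: str) -> bytes:
    """Return the bytes encoded in a base64 data URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, payload = uri.partition(",")
    # Only the ";base64" variant is handled in this sketch
    if not header.endswith(";base64"):
        raise ValueError("only base64 data URIs are supported")
    return base64.b64decode(payload)

# The buffer URI of the minimal glTF file
uri = ("data:application/octet-stream;base64,"
       "AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=")
data = decode_data_uri(uri)
# The first three little-endian unsigned shorts are the triangle indices
indices = struct.unpack_from("<3H", data, 0)
```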


A Minimal glTF File
The following is a minimal but complete glTF asset, containing a single, indexed triangle. You can copy and
paste it into a .gltf file, and every glTF-based application should be able to load and render it. This section will
explain the basic concepts of glTF based on this example.
{
  "scenes" : [
    {
      "nodes" : [ 0 ]
    }
  ],

  "nodes" : [
    {
      "mesh" : 0
    }
  ],

  "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0
      } ]
    }
  ],

  "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
      "byteLength" : 44
    }
  ],
  "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 6,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 8,
      "byteLength" : 36,
      "target" : 34962
    }
  ],
  "accessors" : [
    {
      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 3,
      "type" : "SCALAR",

      "max" : [ 2 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    }
  ],

  "asset" : {
    "version" : "2.0"
  }
}

Image 3a: A single triangle.

The scene and nodes structure

The scene is the entry point for the description of the scene that is stored in the glTF. When parsing a glTF
JSON file, the traversal of the scene structure will start here. Each scene contains an array called nodes, which
contains the indices of node objects. These nodes are the root nodes of a scene graph hierarchy.

The example here consists of a single scene. It refers to the only node in this example, which is the node with
the index 0. This node, in turn, refers to the only mesh, which has the index 0:

   "scenes" : [
    {
      "nodes" : [ 0 ]
    }
  ],

  "nodes" : [
    {
      "mesh" : 0
    }
  ],

More details about scenes and nodes and their properties will be given in the Scenes and Nodes section.

The meshes
A mesh represents an actual geometric object that appears in the scene. The mesh itself usually does not have
any properties, but only contains an array of mesh.primitive objects, which serve as building blocks for larger
models. Each mesh primitive contains a description of the geometry data that the mesh consists of.

The example consists of a single mesh, which has a single mesh.primitive object. The mesh primitive has a set of
attributes, given as a JSON object that maps attribute names to accessor indices. These are the attributes of the
vertices of the mesh geometry, and in this case, there is only the POSITION attribute, describing the positions of
the vertices. The mesh primitive describes an indexed geometry, which is indicated by the indices property. By
default, it is assumed to describe a set of triangles, so that three consecutive indices are the indices of the
vertices of one triangle.

The actual geometry data of the mesh primitive is given by the attributes and the indices. These both refer
to accessor objects, which will be explained below.

   "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0
      } ]
    }
  ],

A more detailed description of meshes and mesh primitives can be found in the Meshes section.
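The rule that three consecutive indices form one triangle can be sketched in Python as follows, assuming the primitive uses the default triangle mode; triangles is a hypothetical helper name:

```python
def triangles(indices):
    """Group a flat list of vertex indices into (a, b, c) triangles."""
    if len(indices) % 3 != 0:
        raise ValueError("the number of indices must be a multiple of 3")
    return [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]

print(triangles([0, 1, 2]))           # [(0, 1, 2)], the single triangle of the example
print(triangles([0, 1, 2, 2, 1, 3]))  # [(0, 1, 2), (2, 1, 3)]
```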

The buffer, bufferView, and accessor concepts

The buffer, bufferView, and accessor objects provide information about the geometry data that the mesh
primitives consist of. They are introduced here quickly, based on the specific example. A more detailed
description of these concepts will be given in the Buffers, BufferViews, and Accessors section.

Buffers
A buffer defines a block of raw, unstructured data with no inherent meaning. It contains a uri, which can
either point to an external file that contains the data, or it can be a data URI that encodes the binary data directly
in the JSON file.

In the example file, the second approach is used: there is a single buffer, containing 44 bytes, and the data of
this buffer is encoded as a data URI:

   "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
      "byteLength" : 44
    }
  ],

This data contains the indices of the triangle, and the vertex positions of the triangle. But in order to actually use
this data as the geometry data of a mesh primitive, additional information about the structure of this data is
required. This information about the structure is encoded in the bufferView and accessor objects.

Buffer views
A bufferView describes a "chunk" or a "slice" of the whole, raw buffer data. In the given example, there are two
buffer views. They both refer to the same buffer. The first buffer view refers to the part of the buffer that contains
the data of the indices: it has a byteOffset of 0, so it starts at the beginning of the buffer, and a byteLength of 6.
The second buffer view refers to the part of the buffer that contains the vertex positions. It starts at a byteOffset
of 8, and has a byteLength of 36; that is, it extends to the end of the whole buffer.

   "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 6,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 8,
      "byteLength" : 36,
      "target" : 34962
    }
  ],
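Taking such a slice is straightforward in code. A Python sketch, where buffers_data is assumed to be the list of buffer contents (indexed like the buffers array) and the helper name is hypothetical; the stand-in buffer simply contains the byte values 0 to 43:

```python
def buffer_view_bytes(buffers_data, buffer_view):
    """Return the slice of buffer data that a bufferView refers to."""
    data = buffers_data[buffer_view["buffer"]]
    start = buffer_view.get("byteOffset", 0)  # byteOffset may be omitted; it then defaults to 0
    return data[start:start + buffer_view["byteLength"]]

buffers_data = [bytes(range(44))]  # stand-in for the 44 bytes of the example buffer
index_view = {"buffer": 0, "byteOffset": 0, "byteLength": 6, "target": 34963}
position_view = {"buffer": 0, "byteOffset": 8, "byteLength": 36, "target": 34962}
print(len(buffer_view_bytes(buffers_data, index_view)))   # 6
print(buffer_view_bytes(buffers_data, position_view)[0])  # 8
```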

Accessors
The second step of structuring the data is accomplished with accessor objects. They define how the data of a
bufferView has to be interpreted by providing information about the data types and the layout.

In the example, there are two accessor objects.

The first accessor describes the indices of the geometry data. It refers to the bufferView with index 0, which is
the part of the buffer that contains the raw data for the indices. Additionally, it specifies the count and type
of the elements and their componentType. In this case, there are 3 scalar elements, and their component type is
given by a constant that stands for the unsigned short type.

The second accessor describes the vertex positions. It contains a reference to the relevant part of the buffer
data, via the bufferView with index 1, and its count, type, and componentType properties say that there are
three elements of 3D vectors, each having float components.

   "accessors" : [
    {
      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 3,
      "type" : "SCALAR",
      "max" : [ 2 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    }
  ],

In terms of the underlying buffer, the first accessor covers 3 scalar units at offset 0 (6 bytes), and the second
covers 3 vec3<float> elements at offset 8 (36 bytes).

As described above, a mesh.primitive may now refer to these accessors, using their indices:

   "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0
      } ]
    }
  ],

When this mesh.primitive has to be rendered, the client application can resolve the underlying buffer views and
buffers and send the required parts of the buffer to the renderer, together with the information about the data
types and layout. A more detailed description of how the accessor data is obtained and processed by the renderer
is given in the Buffers, BufferViews, and Accessors section and the Materials section.
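The resolution step can be sketched in Python: starting from an accessor index stored in the mesh primitive, follow the accessor to its bufferView and the bufferView to its buffer, and add up the two byte offsets. The JSON below is the example without the buffer URI and the min/max values, and the function name resolve is hypothetical:

```python
import json

gltf = json.loads("""
{
  "meshes": [ { "primitives": [ { "attributes": { "POSITION": 1 }, "indices": 0 } ] } ],
  "buffers": [ { "byteLength": 44 } ],
  "bufferViews": [
    { "buffer": 0, "byteOffset": 0, "byteLength": 6, "target": 34963 },
    { "buffer": 0, "byteOffset": 8, "byteLength": 36, "target": 34962 }
  ],
  "accessors": [
    { "bufferView": 0, "byteOffset": 0, "componentType": 5123, "count": 3, "type": "SCALAR" },
    { "bufferView": 1, "byteOffset": 0, "componentType": 5126, "count": 3, "type": "VEC3" }
  ]
}
""")

def resolve(gltf, accessor_index):
    """Follow accessor -> bufferView -> buffer; return (buffer index, start byte, accessor)."""
    accessor = gltf["accessors"][accessor_index]
    view = gltf["bufferViews"][accessor["bufferView"]]
    # The accessor offset is relative to the bufferView, which is relative to the buffer
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    return view["buffer"], start, accessor

primitive = gltf["meshes"][0]["primitives"][0]
print(resolve(gltf, primitive["attributes"]["POSITION"])[:2])  # (0, 8)
print(resolve(gltf, primitive["indices"])[:2])                 # (0, 0)
```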

The asset description
In glTF 1.0, this property is still optional, but in subsequent glTF versions, the JSON file is required to contain an
asset property that contains the version number. The example here says that the asset complies with glTF
version 2.0:

   "asset" : {
    "version" : "2.0"
  }

The asset property may contain additional metadata that is described in the asset specification.
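A loader will usually check this property before doing anything else. A Python sketch for a hypothetical loader that only supports glTF 2.x; the version string has the form "major.minor":

```python
def check_asset_version(gltf, supported_major=2):
    """Return asset.version, or raise ValueError if this loader cannot handle it."""
    version = gltf.get("asset", {}).get("version")
    if version is None:
        raise ValueError("the asset property or its version is missing")
    major = int(version.split(".")[0])  # "2.0" -> 2
    if major != supported_major:
        raise ValueError("unsupported glTF version " + version)
    return version

print(check_asset_version({"asset": {"version": "2.0"}}))  # 2.0
```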


Scenes and Nodes
Scenes
There may be multiple scenes stored in one glTF file, but in many cases, there will be only a single scene, which
then also is the default scene. Each scene contains an array of nodes, which are the indices of the root nodes
of the scene graphs. Again, there may be multiple root nodes, forming different hierarchies, but in many cases,
the scene will have a single root node. The simplest possible scene description has already been shown in
the previous section, consisting of a single scene with a single node:

   "scenes" : [
    {
      "nodes" : [ 0 ]
    }
  ],

  "nodes" : [
    {
      "mesh" : 0
    }
  ],

Nodes forming the scene graph
Each node can contain an array called children that contains the indices of its child nodes. So each node is
one element of a hierarchy of nodes, and together they define the structure of the scene as a scene graph.

Image 4a: The scene graph representation stored in the glTF JSON.

Each of the nodes that are given in the scene can be traversed, recursively visiting all their children, to process
all elements that are attached to the nodes. The simplified pseudocode for this traversal may look like the
following:

traverse(node) {
    // Process the meshes, cameras, etc., that are
    // attached to this node - discussed later
    processElements(node);

    // Recursively process all children
    for each (child in node.children) {
        traverse(child);
    }
}

In practice, some additional information will be required for the traversal: the processing of some elements that
are attached to nodes will require information about which node they are attached to. Additionally, the information
about the transforms of the nodes has to be accumulated during the traversal.
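A runnable Python version of the traversal pseudocode could look like this. It is a sketch: the children property holds node indices, so the nodes are looked up in the top-level nodes array, and the index of the parent node is passed along, as hinted at above. The callback parameter visit is an assumption of this example:

```python
def traverse(gltf, node_index, parent_index=None, visit=print):
    """Depth-first traversal of the node hierarchy, starting at node_index."""
    node = gltf["nodes"][node_index]
    # Process the elements attached to this node (here: hand them to the callback)
    visit(node_index, parent_index, node)
    # "children" contains node indices, so look the nodes up in the "nodes" array
    for child_index in node.get("children", []):
        traverse(gltf, child_index, node_index, visit)

def traverse_scene(gltf, scene_index=0, visit=print):
    """Traverse all root nodes of one scene."""
    for root_index in gltf["scenes"][scene_index]["nodes"]:
        traverse(gltf, root_index, None, visit)

gltf = {
    "scenes": [{"nodes": [0]}],
    "nodes": [{"children": [1]}, {"children": [2, 3]}, {"mesh": 0}, {}],
}
order = []
traverse_scene(gltf, 0, lambda index, parent, node: order.append((index, parent)))
print(order)  # [(0, None), (1, 0), (2, 1), (3, 1)]
```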

Local and global transforms
Each node can have a transform. Such a transform will define a translation, rotation, and/or scale. This transform
will be applied to all elements attached to the node itself and to all its child nodes. The hierarchy of nodes thus
allows one to structure the translations, rotations, and scalings that are applied to the scene elements.

Local transforms of nodes
There are different possible representations for the local transform of a node. The transform can be given directly
by the matrix property of the node. This is an array of 16 floating-point numbers that describe the matrix in
column-major order. For example, the following matrix describes a scaling by (2, 1, 0.5), a rotation of 30
degrees around the x-axis, and a translation by (10, 20, 30):

"node0": {
    "matrix": [
        2.0,    0.0,    0.0,    0.0,
        0.0,    0.866,  0.5,    0.0,
        0.0,   -0.25,   0.433,  0.0,
       10.0,   20.0,   30.0,    1.0
    ]
}

The matrix defined here is as shown in Image 4b.

Image 4b: An example matrix.
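In code, "column-major" means that the element in a given row and column is stored at index col * 4 + row of the array, so the translation ends up in the last four entries. A small Python sketch using the example node above (element is a hypothetical helper):

```python
# The "matrix" of the example node, as stored in the JSON (column-major order)
m = [2.0, 0.0, 0.0, 0.0,
     0.0, 0.866, 0.5, 0.0,
     0.0, -0.25, 0.433, 0.0,
     10.0, 20.0, 30.0, 1.0]

def element(m, row, col):
    """Element in the given row and column of a column-major 4x4 matrix."""
    return m[col * 4 + row]

translation = [element(m, row, 3) for row in range(3)]
print(translation)       # [10.0, 20.0, 30.0]
print(element(m, 1, 2))  # -0.25
```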

The transform of a node can also be given using the translation, rotation, and scale properties of a node,
which is sometimes abbreviated as TRS:

"node0": {
    "translation": [ 10.0, 20.0, 30.0 ],
    "rotation": [ 0.259, 0.0, 0.0, 0.966 ],
    "scale": [ 2.0, 1.0, 0.5 ]
}

Each of these properties can be used to create a matrix, and the product of these matrices then is the local
transform of the node:

The translation just contains the translation in the x-, y-, and z-direction. For example, from a translation of
[ 10.0, 20.0, 30.0 ], one can create a translation matrix that contains this translation as its last column,
as shown in Image 4c.

Image 4c: A translation matrix.

The rotation is given as a quaternion. The mathematical background of quaternions is beyond the scope
of this tutorial. For now, the most important information is that a quaternion is a compact representation of a
rotation by an arbitrary angle around an arbitrary axis. For example, the quaternion [ 0.259, 0.0,
0.0, 0.966 ] describes a rotation of 30 degrees around the x-axis. So this quaternion can be
converted into a rotation matrix, as shown in Image 4d.

Image 4d: A rotation matrix.
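The conversion itself uses the standard formula for a unit quaternion. A Python sketch, assuming the glTF component order [x, y, z, w] (with w as the scalar part) and returning the matrix as a list of rows; the exact values for 30 degrees are used instead of the rounded ones:

```python
import math

def quaternion_to_matrix(q):
    """3x3 rotation matrix (list of rows) for a unit quaternion [x, y, z, w]."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
        [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
    ]

half = math.radians(30) / 2
q = [math.sin(half), 0.0, 0.0, math.cos(half)]  # approximately [0.259, 0.0, 0.0, 0.966]
for row in quaternion_to_matrix(q):
    print([round(v, 3) for v in row])
```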

The scale contains the scaling factors along the x-, y-, and z-axes. The corresponding matrix can be
created by using these scaling factors as the entries on the diagonal of the matrix. For example, the scale
matrix for the scaling factors [ 2.0, 1.0, 0.5 ] is shown in Image 4e.

Image 4e: A scale matrix.

When computing the final, local transform matrix of the node, these matrices are multiplied together. It is
important to perform the multiplication of these matrices in the right order. The local transform matrix always has
to be computed as M = T * R * S, where T is the matrix for the translation part, R is the matrix for the
rotation part, and S is the matrix for the scale part. So the pseudocode for the computation is

translationMatrix = createTranslationMatrix(node.translation);
rotationMatrix = createRotationMatrix(node.rotation);
scaleMatrix = createScaleMatrix(node.scale);
localTransform = translationMatrix * rotationMatrix * scaleMatrix;

For the example matrices given above, the final, local transform matrix of the node will be as shown in Image 4f.

Image 4f: The final local transform matrix computed from the TRS properties.

This matrix will cause the vertices of the meshes to be scaled, then rotated, and then translated according to the
scale, rotation, and translation properties that have been given in the node.

When any of the three properties is not given, the identity matrix will be used. Similarly, when a node contains
neither a matrix property nor TRS properties, then its local transform will be the identity matrix.
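Putting the pieces together, a Python sketch of the TRS computation that returns the local matrix as a column-major array of 16 numbers, like the matrix property. Missing properties fall back to their identity values, as described above; the function name is hypothetical, and the quaternion is assumed to be normalized:

```python
def trs_to_matrix(node):
    """Local transform M = T * R * S as a column-major list of 16 numbers."""
    tx, ty, tz = node.get("translation", [0.0, 0.0, 0.0])
    x, y, z, w = node.get("rotation", [0.0, 0.0, 0.0, 1.0])
    s = node.get("scale", [1.0, 1.0, 1.0])
    # Rows of the rotation matrix R, computed from the quaternion [x, y, z, w]
    r = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]
    m = []
    # Column j of R * S is column j of R multiplied by the j-th scale factor
    for j in range(3):
        m += [r[0][j] * s[j], r[1][j] * s[j], r[2][j] * s[j], 0.0]
    # Multiplying by T from the left only sets the last column
    m += [tx, ty, tz, 1.0]
    return m

node = {"translation": [10.0, 20.0, 30.0], "rotation": [0.259, 0.0, 0.0, 0.966], "scale": [2.0, 1.0, 0.5]}
print([round(v, 3) for v in trs_to_matrix(node)])  # close to the matrix of the example node
```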

Global transforms of nodes
Regardless of the representation in the JSON file, the local transform of a node can be stored as a 4×4 matrix.
The global transform of a node is given by the product of all local transforms on the path from the root to the
respective node:

Structure:           local transform      global transform
root                 R                    R
+- nodeA             A                    R*A
   +- nodeB          B                    R*A*B
   +- nodeC          C                    R*A*C
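A Python sketch of this computation, with matrices stored as column-major lists of 16 numbers as in the matrix property. The global transforms are recomputed from the current local matrices on every call, which fits the remark below about animations; all function names are hypothetical, and translation matrices are used only to keep the example easy to check:

```python
IDENTITY = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]

def mat_mul(a, b):
    """Product a * b of two 4x4 matrices stored in column-major order."""
    out = [0.0] * 16
    for col in range(4):
        for row in range(4):
            out[col * 4 + row] = sum(a[k * 4 + row] * b[col * 4 + k] for k in range(4))
    return out

def translation_matrix(tx, ty, tz):
    """Column-major translation matrix (helper for the example below)."""
    return [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, tx, ty, tz, 1.0]

def global_transforms(gltf, local, scene_index=0):
    """Map each node index to the product of the local transforms from the root to it."""
    result = {}

    def visit(index, parent_global):
        result[index] = mat_mul(parent_global, local[index])
        for child in gltf["nodes"][index].get("children", []):
            visit(child, result[index])

    for root in gltf["scenes"][scene_index]["nodes"]:
        visit(root, IDENTITY)
    return result

# root -> nodeA -> (nodeB, nodeC), as in the table above
gltf = {"scenes": [{"nodes": [0]}], "nodes": [{"children": [1]}, {"children": [2, 3]}, {}, {}]}
local = [translation_matrix(1.0, 0.0, 0.0), translation_matrix(0.0, 2.0, 0.0),
         translation_matrix(0.0, 0.0, 3.0), translation_matrix(0.0, 0.0, -3.0)]
world = global_transforms(gltf, local)
print(world[2][12:15])  # [1.0, 2.0, 3.0]
```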

It is important to point out that these global transforms cannot simply be computed once, after the file has been loaded.
Later, it will be shown how animations may modify the local transforms of individual nodes. And these
modifications will affect the global transforms of all descendant nodes. Therefore, when the global transform of a
node is required, it has to be computed directly from the current local transforms of all nodes. Alternatively, and
as a potential performance improvement, an implementation could cache the global transforms, detect changes
in the local transforms of ancestor nodes, and update the global transforms only when necessary. The different
implementation options for this will depend on the programming language and the requirements for the client
application, and thus are beyond the scope of this tutorial.


Buffers, BufferViews, and Accessors
An example of buffer, bufferView, and accessor objects was already given in the Minimal glTF File section.
This section will explain these concepts in more detail.

Buffers
A buffer represents a block of raw binary data, without an inherent structure or meaning. This data is referred
to by a buffer using its uri. This URI may either point to an external file, or be a data URI that encodes the
binary data directly in the JSON file. The minimal glTF file contained an example of a buffer, with 44 bytes of
data, encoded in a data URI:

   "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
      "byteLength" : 44
    }
  ],

Image 5a: The buffer data, consisting of 44 bytes.

Parts of the data of a buffer may have to be passed to the renderer as vertex attributes, or as indices, or the
data may contain skinning information or animation key frames. In order to be able to use this data, additional
information about the structure and type of this data is required.

BufferViews
The first step in structuring the data of a buffer is taken with bufferView objects. A bufferView represents a
"slice" of the data of one buffer. This slice is defined using an offset and a length, in bytes. The minimal glTF file
defined two bufferView objects:
   "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 6,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 8,
      "byteLength" : 36,
      "target" : 34962
    }
  ],

The first bufferView refers to the first 6 bytes of the buffer data. The second one refers to 36 bytes of the
buffer, with an offset of 8 bytes, as shown in this image:

Image 5b: The buffer views, referring to parts of the buffer.

The bytes that are shown in light gray are padding bytes that are required for properly aligning the accessors, as
described below.
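The amount of padding follows from rounding an offset up to the next multiple of the required alignment. A Python sketch, assuming (as the glTF 2.0 specification requires) that accessor data starts at a multiple of its component size, here 4 bytes for the float positions; align is a hypothetical helper name:

```python
def align(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

index_bytes = 3 * 2                      # three unsigned short indices
position_offset = align(index_bytes, 4)  # float components need 4-byte alignment
padding = position_offset - index_bytes
print(position_offset, padding)          # 8 2
```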

Each bufferView additionally contains a target property. This property may later be used by the renderer to
classify the type or nature of the data that the buffer view refers to. The target can be a constant indicating
that the data is used for vertex attributes (34962, standing for ARRAY_BUFFER), or that the data is used for
vertex indices (34963, standing for ELEMENT_ARRAY_BUFFER).
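A renderer might use the target to choose the WebGL binding point, gl.ARRAY_BUFFER or gl.ELEMENT_ARRAY_BUFFER. As a Python sketch (the helper name is hypothetical), keeping in mind that target is optional and may be missing, for example for buffer views that hold animation data:

```python
ARRAY_BUFFER = 34962          # vertex attribute data
ELEMENT_ARRAY_BUFFER = 34963  # vertex index data

def describe_target(buffer_view):
    """Classify the data of a bufferView based on its optional target."""
    target = buffer_view.get("target")
    if target == ARRAY_BUFFER:
        return "vertex attributes"
    if target == ELEMENT_ARRAY_BUFFER:
        return "vertex indices"
    return "unspecified"

print(describe_target({"buffer": 0, "byteLength": 6, "target": 34963}))  # vertex indices
```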

At this point, the buffer data has been divided into multiple parts, and each part is described by one
bufferView. But in order to really use this data in a renderer, additional information about the type and layout of
the data is required.
Accessors
An accessor object refers to a bufferView and contains properties that define the type and layout of the data
of this bufferView.

Data type
The type of an accessor's data is encoded in the type and the componentType properties. The value of the
type property is a string that specifies whether the data elements are scalars, vectors, or matrices. For
example, the value may be "SCALAR" for scalar values, "VEC3" for 3D vectors, or "MAT4" for 4×4 matrices.

The componentType specifies the type of the components of these data elements. This is a GL constant that
may, for example, be 5126 (FLOAT) or 5123 (UNSIGNED_SHORT), to indicate that the elements have float or
unsigned short components, respectively.

Different combinations of these properties may be used to describe arbitrary data types. For example, the
minimal glTF file contained two accessors:

   "accessors" : [
    {
      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 3,
      "type" : "SCALAR",
      "max" : [ 2 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    }
  ],

The first accessor refers to the bufferView with index 0, which defines the part of the buffer data that contains the indices. Its type is "SCALAR", and its componentType is 5123 (UNSIGNED_SHORT). This means that the indices are stored as scalar unsigned short values.

The second accessor refers to the bufferView with index 1, which defines the part of the buffer data that contains the vertex attributes, in particular the vertex positions. Its type is "VEC3", and its componentType is 5126 (FLOAT). So this accessor describes 3D vectors with floating-point components.
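How type and componentType translate into concrete values can be sketched with Python's standard struct module. This is a simplified reader under stated assumptions: the data is tightly packed (no byteStride) and little-endian, as glTF requires, and matrix types with 1- or 2-byte components (which need column padding) are not covered. The names COMPONENT_FORMATS, TYPE_SIZES and read_accessor are hypothetical.

```python
import base64
import struct

# componentType (GL constant) -> struct format character
COMPONENT_FORMATS = {
    5120: "b",  # BYTE
    5121: "B",  # UNSIGNED_BYTE
    5122: "h",  # SHORT
    5123: "H",  # UNSIGNED_SHORT
    5125: "I",  # UNSIGNED_INT
    5126: "f",  # FLOAT
}
# accessor type -> number of components per element (padded matrix layouts omitted)
TYPE_SIZES = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4, "MAT4": 16}


def read_accessor(buffer: bytes, view: dict, accessor: dict) -> list:
    """Read a tightly packed accessor into a list of tuples, one tuple per element."""
    fmt = COMPONENT_FORMATS[accessor["componentType"]]
    n = TYPE_SIZES[accessor["type"]]
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    values = struct.unpack_from("<" + fmt * (n * accessor["count"]), buffer, start)
    return [tuple(values[i:i + n]) for i in range(0, len(values), n)]


buffer = base64.b64decode("AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=")
views = [{"byteOffset": 0, "byteLength": 6}, {"byteOffset": 8, "byteLength": 36}]
indices = read_accessor(buffer, views[0],
                        {"componentType": 5123, "count": 3, "type": "SCALAR"})
positions = read_accessor(buffer, views[1],
                          {"componentType": 5126, "count": 3, "type": "VEC3"})
print(indices)    # [(0,), (1,), (2,)]
print(positions)  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
```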

Data layout
Additional properties of an accessor further specify the layout of the data. The count property of an accessor indicates how many data elements it consists of. In the example above, the count is 3 for both accessors, standing for the three indices and the three vertices of the triangle, respectively. Each accessor also has a byteOffset property. In the example above, it is 0 for both accessors, because there is only one accessor for each bufferView. But when multiple accessors refer to the same bufferView, then the byteOffset describes where the data of the accessor starts, relative to the bufferView that it refers to.
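The byte range that an accessor covers follows from the two offsets, its count, and the size of one element. The following sketch (hypothetical names, tightly packed data assumed) computes these ranges for the two accessors of the Simple Meshes example later in this tutorial, which share the bufferView with index 1: the vertex positions start at byteOffset 0 and the normals at byteOffset 36.

```python
COMPONENT_SIZES = {5120: 1, 5121: 1, 5122: 2, 5123: 2, 5125: 4, 5126: 4}
TYPE_SIZES = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4, "MAT4": 16}


def accessor_byte_range(view: dict, accessor: dict) -> tuple:
    """Return the (start, end) byte range in the buffer covered by a tightly packed accessor."""
    element_size = COMPONENT_SIZES[accessor["componentType"]] * TYPE_SIZES[accessor["type"]]
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    return start, start + element_size * accessor["count"]


view_1 = {"buffer": 0, "byteOffset": 8, "byteLength": 72}
positions = {"bufferView": 1, "byteOffset": 0, "componentType": 5126, "count": 3, "type": "VEC3"}
normals = {"bufferView": 1, "byteOffset": 36, "componentType": 5126, "count": 3, "type": "VEC3"}
print(accessor_byte_range(view_1, positions))  # (8, 44)
print(accessor_byte_range(view_1, normals))    # (44, 80)
```

Both ranges end within the bufferView, which spans bytes 8 to 80 of the buffer.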

Data alignment
The data that is referred to by an accessor may be sent to the graphics card for rendering, or be used on the host side as animation or skinning data. Therefore, the data of an accessor has to be aligned based on the type of the data. For example, when the componentType of an accessor is 5126 (FLOAT), then the data must be aligned at 4-byte boundaries, because a single float value consists of four bytes. This alignment requirement of an accessor refers to its bufferView and the underlying buffer. In particular, the alignment requirements are as follows:

The byteOffset of an accessor must be divisible by the size of its componentType.

The sum of the byteOffset of an accessor and the byteOffset of the bufferView that it refers to must be divisible by the size of its componentType.

In the example above, the byteOffset of the bufferView with index 1 (which refers to the vertex attributes) was chosen to be 8, in order to align the data of the accessor for the vertex positions to 4-byte boundaries. The bytes 6 and 7 of the buffer are thus padding bytes that do not carry relevant data.
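Both rules can be checked mechanically. This is a minimal sketch with a hypothetical function name; it covers only the two rules listed above, not every validation rule of the specification.

```python
COMPONENT_SIZES = {5120: 1, 5121: 1, 5122: 2, 5123: 2, 5125: 4, 5126: 4}


def is_aligned(view: dict, accessor: dict) -> bool:
    """Check the two accessor alignment rules described above."""
    size = COMPONENT_SIZES[accessor["componentType"]]
    accessor_offset = accessor.get("byteOffset", 0)
    total_offset = view.get("byteOffset", 0) + accessor_offset
    return accessor_offset % size == 0 and total_offset % size == 0


float_accessor = {"byteOffset": 0, "componentType": 5126}
print(is_aligned({"byteOffset": 8}, float_accessor))  # True: 8 is a multiple of 4
print(is_aligned({"byteOffset": 6}, float_accessor))  # False: hence the two padding bytes
```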

Image 5c illustrates how the raw data of a buffer is structured using bufferView objects and is augmented with data type information using accessor objects.

Image 5c: The accessors defining how to interpret the data of the buffer views.

Data interleaving
The data of the attributes that are stored in a single bufferView may be stored as an Array-Of-Structures. A single bufferView may, for example, contain the data for vertex positions and for vertex normals in an interleaved fashion. In this case, the byteOffset of an accessor defines the start of the first relevant data element for the respective attribute, and the bufferView defines an additional byteStride property. This is the number of bytes between the start of one element of its accessors and the start of the next one. An example of how interleaved position and normal attributes are stored inside a bufferView is shown in Image 5d.
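Reading an interleaved layout only changes how the start of each element is computed: element i begins at byteOffset + i * byteStride. The sketch below builds a small, made-up interleaved bufferView (two vertices, each a position followed by a normal, so the byteStride is 24) with Python's struct module and reads both attributes back. The data and the function name read_vec3 are hypothetical.

```python
import struct


def read_vec3(data: bytes, byte_offset: int, count: int, byte_stride=None) -> list:
    """Read `count` float VEC3 elements; without a byteStride they are tightly packed (12 bytes)."""
    stride = byte_stride if byte_stride is not None else 12
    return [struct.unpack_from("<3f", data, byte_offset + i * stride) for i in range(count)]


# Made-up interleaved data in Array-Of-Structures order: position, normal, position, normal
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
interleaved = b"".join(struct.pack("<3f3f", *p, *n) for p, n in zip(positions, normals))

# byteStride = 24 (two float VEC3), the normals accessor starts at byteOffset 12
print(read_vec3(interleaved, 0, 2, byte_stride=24) == positions)  # True
print(read_vec3(interleaved, 12, 2, byte_stride=24) == normals)   # True
```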

Image 5d: Interleaved accessors in one buffer view.

Data contents
An accessor also contains min and max properties that summarize the contents of its data. They are the component-wise minimum and maximum values of all data elements contained in the accessor. In the case of vertex positions, the min and max properties thus define the bounding box of an object. This can be useful for prioritizing downloads, or for visibility detection. In general, this information is also useful for storing and processing quantized data that is dequantized at runtime by the renderer, but details of this quantization are beyond the scope of this tutorial.
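The min and max values are per-component reductions over all elements. A short sketch (hypothetical function name) reproduces the min and max arrays of the positions accessor of the minimal triangle shown earlier:

```python
def component_min_max(elements: list) -> tuple:
    """Return the per-component minimum and maximum over a list of equally sized tuples."""
    columns = list(zip(*elements))
    return [min(c) for c in columns], [max(c) for c in columns]


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(component_min_max(positions))  # ([0.0, 0.0, 0.0], [1.0, 1.0, 0.0])
```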

Sparse accessors
With version 2.0, the concept of sparse accessors was introduced in glTF. This is a special representation of
data that allows very compact storage of multiple data blocks that have only a few different entries. For example,
when there is geometry data that contains vertex positions, this geometry data may be used for multiple objects.
This may be achieved by referring to the same accessor from both objects. If the vertex positions for both
objects are mostly the same and differ for only a few vertices, then it is not necessary to store the whole
geometry data twice. Instead, it is possible to store the data only once, and use a sparse accessor to store only
the vertex positions that differ for the second object.

The following is a complete glTF asset, in embedded representation, that shows an example of sparse
accessors:
{
  "scenes" : [ {
    "nodes" : [ 0 ]
  } ],

  "nodes" : [ {
    "mesh" : 0
  } ],

  "meshes" : [ {
    "primitives" : [ {
      "attributes" : {
        "POSITION" : 1
      },
      "indices" : 0
    } ]
  } ],

  "buffers" : [ {
    "uri" : "data:application/gltf-buffer;base64,AAAIAAcAAAABAAgAAQAJAAgAAQACAAkAAgAKAAkAAgADAAoAAwALAAoAAwAEAAsABAAMAAsABAAFAAwABQANAAwABQAGAA0AAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAQAAAAAAAAAAAAABAQAAAAAAAAAAAAACAQAAAAAAAAAAAAACgQAAAAAAAAAAAAADAQAAAAAAAAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAQAAAgD8AAAAAAABAQAAAgD8AAAAAAACAQAAAgD8AAAAAAACgQAAAgD8AAAAAAADAQAAAgD8AAAAACAAKAAwAAAAAAIA/AAAAQAAAAAAAAEBAAABAQAAAAAAAAKBAAACAQAAAAAA=",
    "byteLength" : 284
  } ],

  "bufferViews" : [ {
    "buffer" : 0,
    "byteOffset" : 0,
    "byteLength" : 72,
    "target" : 34963
  }, {
    "buffer" : 0,
    "byteOffset" : 72,
    "byteLength" : 168
  }, {
    "buffer" : 0,
    "byteOffset" : 240,
    "byteLength" : 6
  }, {
    "buffer" : 0,
    "byteOffset" : 248,
    "byteLength" : 36
  } ],

  "accessors" : [ {
    "bufferView" : 0,
    "byteOffset" : 0,
    "componentType" : 5123,
    "count" : 36,

    "type" : "SCALAR",
    "max" : [ 13 ],
    "min" : [ 0 ]
  }, {
    "bufferView" : 1,
    "byteOffset" : 0,
    "componentType" : 5126,
    "count" : 14,
    "type" : "VEC3",
    "max" : [ 6.0, 4.0, 0.0 ],
    "min" : [ 0.0, 0.0, 0.0 ],
    "sparse" : {
      "count" : 3,
      "indices" : {
        "bufferView" : 2,
        "byteOffset" : 0,
        "componentType" : 5123
      },
      "values" : {
        "bufferView" : 3,
        "byteOffset" : 0
      }
    }
  } ],

  "asset" : {
    "version" : "2.0"
  }
}

The result of rendering this asset is shown in Image 5e:

Image 5e: The result of rendering the simple sparse accessor asset.

The example contains two accessors: one for the indices of the mesh, and one for the vertex positions. The one
that refers to the vertex positions defines an additional accessor.sparse property, which contains the
information about the sparse data substitution that should be applied:

   "accessors" : [
  ...
  {
    "bufferView" : 1,
    "byteOffset" : 0,
    "componentType" : 5126,
    "count" : 14,
    "type" : "VEC3",
    "max" : [ 6.0, 4.0, 0.0 ],
    "min" : [ 0.0, 0.0, 0.0 ],
    "sparse" : {
      "count" : 3,
      "indices" : {
        "bufferView" : 2,
        "byteOffset" : 0,
        "componentType" : 5123
      },
      "values" : {
        "bufferView" : 3,
        "byteOffset" : 0
      }
    }
  } ],
This sparse object itself defines the count of elements that will be affected by the substitution. The
sparse.indices property refers to a bufferView that contains the indices of the elements which will be
replaced. The sparse.values refers to a bufferView that contains the actual data.

In the example, the original geometry data is stored in the bufferView with index 1. It describes a rectangular array of vertices. The sparse.indices refer to the bufferView with index 2, which contains the indices [8, 10, 12]. The sparse.values refer to the bufferView with index 3, which contains new vertex positions, namely, [(1,2,0), (3,3,0), (5,4,0)]. The effect of applying the corresponding substitution is shown in Image 5f.
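The substitution itself amounts to overwriting selected elements. The following sketch applies it in Python. The sparse indices and values are the ones given above; the base positions are an assumed stand-in (two rows of 7 vertices, with x from 0 to 6 and y equal to 0 or 1), since the actual values are only stored in the encoded buffer. The function name apply_sparse is hypothetical.

```python
def apply_sparse(base: list, indices: list, values: list) -> list:
    """Return a copy of `base` where base[indices[k]] is replaced by values[k]."""
    result = list(base)  # the original accessor data stays unchanged
    for index, value in zip(indices, values):
        result[index] = value
    return result


# Assumed base data: two rows of 7 vertices each
base_positions = [(float(x), float(y), 0.0) for y in (0, 1) for x in range(7)]
sparse_indices = [8, 10, 12]
sparse_values = [(1.0, 2.0, 0.0), (3.0, 3.0, 0.0), (5.0, 4.0, 0.0)]

positions = apply_sparse(base_positions, sparse_indices, sparse_values)
print(positions[8], positions[9])  # (1.0, 2.0, 0.0) (2.0, 1.0, 0.0)
```

Because only the substituted elements are stored, the same base data can serve several objects.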

Image 5f: The substitution that is done with the sparse accessor.

A Simple Animation
As shown in the Scenes and Nodes section, each node can have a local transform. This transform can be given either by the matrix property of the node or by using the translation, rotation, and scale (TRS) properties.

When the transform is given by the TRS properties, an animation can be used to describe how the translation, rotation, or scale of a node changes over time.

The following is the minimal glTF file that was shown previously, but extended with an animation. This section will
explain the changes and extensions that have been made to add this animation.
{
  "scenes" : [
    {
      "nodes" : [ 0 ]
    }
  ],

  "nodes" : [
    {
      "mesh" : 0,
      "rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
    }
  ],

  "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0
      } ]
    }
  ],

  "animations": [
    {
      "samplers" : [
        {
          "input" : 2,
          "interpolation" : "LINEAR",
          "output" : 3
        }
      ],
      "channels" : [ {
        "sampler" : 0,
        "target" : {
          "node" : 0,
          "path" : "rotation"
        }
      } ]
    }
  ],

  "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
      "byteLength" : 44
    },
    {
      "uri" : "data:application/octet-stream;base64,AAAAAAAAgD4AAAA/AABAPwAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAD0/TQ/9P00PwAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAPT9ND/0/TS/AAAAAAAAAAAAAAAAAACAPw==",
      "byteLength" : 100
    }
  ],
  "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 6,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 8,
      "byteLength" : 36,
      "target" : 34962
    },
    {
      "buffer" : 1,
      "byteOffset" : 0,
      "byteLength" : 100
    }
  ],
  "accessors" : [
    {
      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 3,
      "type" : "SCALAR",
      "max" : [ 2 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    },
    {
      "bufferView" : 2,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 5,
      "type" : "SCALAR",
      "max" : [ 1.0 ],
      "min" : [ 0.0 ]
    },

    {
      "bufferView" : 2,
      "byteOffset" : 20,
      "componentType" : 5126,
      "count" : 5,
      "type" : "VEC4",
      "max" : [ 0.0, 0.0, 1.0, 1.0 ],
      "min" : [ 0.0, 0.0, 0.0, -0.707 ]
    }
  ],

  "asset" : {
    "version" : "2.0"
  }

}

Image 6a: A single, animated triangle.

The rotation property of the node

The only node in the example now has a rotation property. This is an array containing the four floating point
values of the quaternion that describes the rotation:

   "nodes" : [
    {
      "mesh" : 0,
      "rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
    }
  ],

The given value is the quaternion describing a rotation of 0 degrees, so the triangle will be shown in its initial orientation.

The animation data
Three elements have been added to the top-level arrays of the glTF JSON to encode the animation data:

A new buffer containing the raw animation data;

A new bufferView that refers to the buffer;

Two new accessor objects that add structural information to the animation data.

The buffer and the bufferView for the raw animation data

A new buffer has been added, which contains the raw animation data. This buffer also uses a data URI to
encode the 100 bytes that the animation data consists of:

   "buffers" : [
    ...
    {
      "uri" : "data:application/octet-stream;base64,AAAAAAAAgD4AAAA/AABAPwAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAD0/TQ/9P00PwAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAPT9ND/0/TS/AAAAAAAAAAAAAAAAAACAPw==",
      "byteLength" : 100
    }
  ],

  "bufferViews" : [
    ...
    {
      "buffer" : 1,
      "byteOffset" : 0,
      "byteLength" : 100
    }
  ],

There is also a new bufferView, which here simply refers to the new buffer with index 1, which contains the whole animation buffer data. Further structural information is added with the accessor objects described below.

Note that one could also have appended the animation data to the existing buffer that already contained the
geometry data of the triangle. In this case, the new buffer view would have referred to the buffer with index 0,
and used an appropriate byteOffset to refer to the part of the buffer that then contained the animation data.

In the example that is shown here, the animation data is added as a new buffer to keep the geometry data and
the animation data separated.

The accessor objects for the animation data

Two new accessor objects have been added, which describe how to interpret the animation data. The first
accessor describes the times of the animation key frames. There are five elements (as indicated by the count
of 5), and each one is a scalar float value (which is 20 bytes in total). The second accessor says that after the
first 20 bytes, there are five elements, each being a 4D vector with float components. These are the rotations
that correspond to the five key frames of the animation, given as quaternions.
   "accessors" : [
    ...
    {
      "bufferView" : 2,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 5,
      "type" : "SCALAR",
      "max" : [ 1.0 ],
      "min" : [ 0.0 ]
    },
    {
      "bufferView" : 2,
      "byteOffset" : 20,
      "componentType" : 5126,
      "count" : 5,
      "type" : "VEC4",
      "max" : [ 0.0, 0.0, 1.0, 1.0 ],
      "min" : [ 0.0, 0.0, 0.0, -0.707 ]
    }
  ],

The actual data that is provided by the times accessor and the rotations accessor, using the data from the buffer
in the example, is shown in this table:

| times accessor | rotations accessor | Meaning |
|---|---|---|
| 0.0 | (0.0, 0.0, 0.0, 1.0) | At 0.0 seconds, the triangle has a rotation of 0 degrees |
| 0.25 | (0.0, 0.0, 0.707, 0.707) | At 0.25 seconds, it has a rotation of 90 degrees around the z-axis |
| 0.5 | (0.0, 0.0, 1.0, 0.0) | At 0.5 seconds, it has a rotation of 180 degrees around the z-axis |
| 0.75 | (0.0, 0.0, 0.707, -0.707) | At 0.75 seconds, it has a rotation of 270 (= -90) degrees around the z-axis |
| 1.0 | (0.0, 0.0, 0.0, 1.0) | At 1.0 seconds, it has a rotation of 360 (= 0) degrees around the z-axis |

So this animation describes a rotation of 360 degrees around the z-axis that lasts 1 second.
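The quaternions in the table follow from the standard formula for a rotation by an angle about the z-axis, (0, 0, sin(angle/2), cos(angle/2)), in the (x, y, z, w) order that glTF uses. A small sketch with a hypothetical function name:

```python
import math


def quaternion_about_z(degrees: float) -> tuple:
    """Unit quaternion (x, y, z, w) for a rotation of `degrees` about the z-axis."""
    half_angle = math.radians(degrees) / 2.0
    return (0.0, 0.0, math.sin(half_angle), math.cos(half_angle))


for angle in (0, 90, 180, 270, 360):
    print(angle, tuple(round(c, 3) for c in quaternion_about_z(angle)))
```

For 360 degrees this yields (0, 0, 0, -1) rather than the (0, 0, 0, 1) in the table. Both describe the same rotation, because a quaternion q and its negation -q always represent the same orientation.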

The animation
Finally, this is the part where the actual animation is added. The top-level animations array contains a single animation object. It consists of two elements:

The samplers , which describe the sources of animation data;

The channels , which can be imagined as connecting a "source" of the animation data to a "target."

In the given example, there is one sampler. Each sampler defines an input and an output property. They both refer to accessor objects. Here, these are the times accessor (with index 2) and the rotations accessor (with index 3) that have been described above. Additionally, the sampler defines an interpolation type, which is "LINEAR" in this example.
There is also one channel in the example. This channel refers to the only sampler (with index 0) as the source of the animation data. The target of the animation is encoded in the channel.target object: it contains a node property with the index of the node whose property should be animated. The actual node property is named in the path. So the channel target in the given example says that the "rotation" property of the node with index 0 should be animated.

   "animations": [
    {
      "samplers" : [
        {
          "input" : 2,
          "interpolation" : "LINEAR",
          "output" : 3
        }
      ],
      "channels" : [ {
        "sampler" : 0,
        "target" : {
          "node" : 0,
          "path" : "rotation"
        }
      } ]
    }
  ],

Combining all this information, the given animation object says the following:

During the animation, the animated values are obtained from the rotations accessor. They are interpolated linearly (for rotations, this means spherical linear interpolation, as explained in the Animations section), based on the current simulation time and the key frame times that are provided by the times accessor. The interpolated values are then written into the "rotation" property of the node with index 0.

A more detailed description and actual examples for the interpolation and the computations that are involved here
can be found in the Animations section.


Animations
As shown in the Simple Animation example, an animation can be used to describe how the translation, rotation, or scale properties of nodes change over time.

The following is another example of an animation. This time, the animation contains two channels. One animates the translation, and the other animates the rotation of a node:

   "animations": [
    {
      "samplers" : [
        {
          "input" : 2,
          "interpolation" : "LINEAR",
          "output" : 3
        },
        {
          "input" : 2,
          "interpolation" : "LINEAR",
          "output" : 4
        }
      ],
      "channels" : [
        {
          "sampler" : 0,
          "target" : {
            "node" : 0,
            "path" : "rotation"
          }
        },
        {
          "sampler" : 1,
          "target" : {
            "node" : 0,
            "path" : "translation"
          }
        }
      ]
    }
  ],

Animation samplers
The samplers array contains animation.sampler objects that define how the values that are provided by the
accessors have to be interpolated between the key frames, as shown in Image 7a.

Image 7a: Animation samplers.

In order to compute the value of the translation for the current animation time, the following algorithm can be
used:

Let the current animation time be given as currentTime.
Compute the next smaller and the next larger element of the times accessor:

previousTime = The largest element from the times accessor that is smaller than the currentTime

nextTime = The smallest element from the times accessor that is larger than the currentTime

Obtain the elements from the translations accessor that correspond to these times:
previousTranslation = The element from the translations accessor that corresponds to the previousTime

nextTranslation = The element from the translations accessor that corresponds to the nextTime

Compute the interpolation value. This is a value between 0.0 and 1.0 that describes the relative position of the currentTime between the previousTime and the nextTime:

interpolationValue = (currentTime - previousTime) / (nextTime - previousTime)

Use the interpolation value to compute the translation for the current time:

currentTranslation = previousTranslation + interpolationValue * (nextTranslation - previousTranslation)

Example:
Imagine the currentTime is 1.2. The next smaller element from the times accessor is 0.8. The next larger
element is 1.6. So

previousTime = 0.8
nextTime = 1.6

The corresponding values from the translations accessor can be looked up:

previousTranslation = (14.0, 3.0, -2.0)
nextTranslation = (18.0, 1.0, 1.0)

The interpolation value can be computed:

interpolationValue = (currentTime - previousTime) / (nextTime - previousTime)
                   = (1.2 - 0.8) / (1.6 - 0.8)
                   = 0.4 / 0.8
                   = 0.5

From the interpolation value, the current translation can be computed:

currentTranslation = previousTranslation + interpolationValue * (nextTranslation - previousTranslation)
                   = (14.0, 3.0, -2.0) + 0.5 * ( (18.0, 1.0,  1.0) - (14.0, 3.0, -2.0) )
                   = (14.0, 3.0, -2.0) + 0.5 * (4.0, -2.0, 3.0)
                   = (16.0, 2.0, -0.5)

So when the current time is 1.2, then the translation of the node is (16.0, 2.0, -0.5).
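The algorithm and the worked example can be sketched in Python as follows. The times [0.0, 0.8, 1.6] and the first translation are hypothetical; only the key frames at 0.8 and 1.6 come from the example. This sketch also assumes that values are clamped to the first and last key frame outside of the key frame range. Function names are illustrative.

```python
import bisect


def lerp(a: tuple, b: tuple, t: float) -> tuple:
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def sample_linear(times: list, values: list, current_time: float) -> tuple:
    """LINEAR sampling of key frame values; clamps outside the key frame range."""
    if current_time <= times[0]:
        return values[0]
    if current_time >= times[-1]:
        return values[-1]
    next_index = bisect.bisect_right(times, current_time)  # first time > current_time
    previous_time, next_time = times[next_index - 1], times[next_index]
    interpolation_value = (current_time - previous_time) / (next_time - previous_time)
    return lerp(values[next_index - 1], values[next_index], interpolation_value)


times = [0.0, 0.8, 1.6]  # only 0.8 and 1.6 are taken from the example
translations = [(10.0, 5.0, -4.0), (14.0, 3.0, -2.0), (18.0, 1.0, 1.0)]  # first entry is made up
print(tuple(round(c, 6) for c in sample_linear(times, translations, 1.2)))  # (16.0, 2.0, -0.5)
```

The rounding only hides floating-point noise: (1.2 - 0.8) / (1.6 - 0.8) is not exactly 0.5 in binary floating point.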

Animation channels
The animations contain an array of animation.channel objects. The channels establish the connection between the input, which is the value that is computed from the sampler, and the output, which is the animated node property. Therefore, each channel refers to one sampler, using the index of the sampler, and contains an animation.channel.target. The target refers to a node, using the index of the node, and contains a path that defines the property of the node that should be animated. The value from the sampler will be written into this property.

In the example above, there are two channels for the animation. Both refer to the same node. The path of the first channel refers to the rotation of the node, and the path of the second channel refers to the translation of the node. So all objects (meshes) that are attached to the node will be translated and rotated by the animation, as shown in Image 7b.

Image 7b: Animation channels.

Interpolation
The above example only covers LINEAR interpolation. Animations in a glTF asset can use three interpolation modes:

STEP
LINEAR
CUBICSPLINE

Step
The STEP interpolation is not really an interpolation mode: it makes objects jump from keyframe to keyframe without any sort of interpolation. When a sampler defines step interpolation, just apply the transformation from the keyframe corresponding to previousTime.
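As a sketch, STEP sampling only needs to find the key frame at or before currentTime. This version uses Python's bisect module, clamps to the first value before the first key frame, and uses hypothetical key frame data and a hypothetical function name.

```python
import bisect


def sample_step(times: list, values: list, current_time: float):
    """STEP interpolation: hold the value of the last key frame at or before current_time."""
    index = bisect.bisect_right(times, current_time) - 1
    return values[max(index, 0)]  # before the first key frame, use the first value


times = [0.0, 0.8, 1.6]
values = ["A", "B", "C"]  # hypothetical key frame values
print(sample_step(times, values, 1.2))  # B
```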

Linear
Linear interpolation corresponds exactly to the above example. The general case is:

Calculate the interpolationValue:

interpolationValue = (currentTime - previousTime) / (nextTime - previousTime)

For scalar and vector types, use a linear interpolation (generally called lerp in mathematics libraries). Here is a "pseudo code" implementation for reference:

Point lerp(previousPoint, nextPoint, interpolationValue)
    return previousPoint + interpolationValue * (nextPoint - previousPoint)

In the case of rotations expressed as quaternions, you need to perform a spherical linear interpolation (slerp) between the previous and next values:

    Quat slerp(previousQuat, nextQuat, interpolationValue)
        var dotProduct = dot(previousQuat, nextQuat)

        //make sure we take the shortest path in case the dot product is negative
        if(dotProduct < 0.0)
            nextQuat = -nextQuat
            dotProduct = -dotProduct

        //if the two quaternions are very close to each other, just linearly interpolate between the 4D vectors
        if(dotProduct > 0.9995)
            return normalize(previousQuat + interpolationValue * (nextQuat - previousQuat))

        //perform the spherical linear interpolation
        var theta_0 = acos(dotProduct)
        var theta = interpolationValue * theta_0
        var sin_theta = sin(theta)
        var sin_theta_0 = sin(theta_0)

        var scalePreviousQuat = cos(theta) - dotProduct * sin_theta / sin_theta_0
        var scaleNextQuat = sin_theta / sin_theta_0
        return scalePreviousQuat * previousQuat + scaleNextQuat * nextQuat
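A runnable Python version of the pseudocode above, as a sketch: quaternions are (x, y, z, w) tuples, and the function name is chosen for illustration.

```python
import math


def slerp(q0: tuple, q1: tuple, t: float) -> tuple:
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shortest path
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical: normalized linear interpolation
        result = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        length = math.sqrt(sum(c * c for c in result))
        return tuple(c / length for c in result)
    theta_0 = math.acos(dot)
    theta = t * theta_0
    sin_theta = math.sin(theta)
    sin_theta_0 = math.sin(theta_0)
    scale_q0 = math.cos(theta) - dot * sin_theta / sin_theta_0
    scale_q1 = sin_theta / sin_theta_0
    return tuple(scale_q0 * a + scale_q1 * b for a, b in zip(q0, q1))


# Halfway between 0 and 180 degrees about the z-axis is a rotation of 90 degrees
print(tuple(round(c, 4) for c in slerp((0.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.0), 0.5)))
# (0.0, 0.0, 0.7071, 0.7071)
```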

This example implementation is inspired by the Wikipedia article on slerp.

Cubic Spline interpolation
Cubic spline interpolation needs more data than just the previous and next keyframe times and values: for each keyframe, it also needs a pair of tangent vectors that act to smooth out the curve around the keyframe points.

These tangents are stored together with the keyframe values in the output accessor of the animation sampler. For each keyframe described by the animation sampler, the output accessor contains 3 elements:

The input tangent of the keyframe
The keyframe value
The output tangent

The input and output tangents that are stored in the accessor have to be scaled by the duration of the keyframe interval, which we call the deltaTime:

deltaTime = nextTime - previousTime

To calculate the value for currentTime, you will need to fetch from the output accessor:

The output tangent of the previousTime keyframe
The value of the previousTime keyframe
The value of the nextTime keyframe
The input tangent of the nextTime keyframe

Note: the input tangent of the first keyframe and the output tangent of the last keyframe are ignored.

To calculate the actual tangents, multiply the tangents that you got from the accessor by deltaTime:

previousTangent = deltaTime * previousOutputTangent
nextTangent = deltaTime * nextInputTangent

The mathematical function is described in Appendix C of the glTF 2.0 specification.

Here is a corresponding pseudocode snippet:

    Point cubicSpline(previousPoint, previousTangent, nextPoint, nextTangent, interpolationValue)
        t = interpolationValue
        t2 = t * t
        t3 = t2 * t

        return (2 * t3 - 3 * t2 + 1) * previousPoint + (t3 - 2 * t2 + t) * previousTangent + (-2 * t3 + 3 * t2) * nextPoint + (t3 - t2) * nextTangent
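A runnable Python sketch of the same function, together with the lookup of values and tangents described above. It assumes the output accessor layout of in-tangent, value, out-tangent per keyframe; the function names and the sample data are made up for illustration.

```python
def cubic_spline(prev_point, prev_tangent, next_point, next_tangent, t):
    """Cubic Hermite spline, evaluated component-wise for an interpolation value t in [0, 1]."""
    t2 = t * t
    t3 = t2 * t
    return tuple(
        (2 * t3 - 3 * t2 + 1) * p0
        + (t3 - 2 * t2 + t) * m0
        + (-2 * t3 + 3 * t2) * p1
        + (t3 - t2) * m1
        for p0, m0, p1, m1 in zip(prev_point, prev_tangent, next_point, next_tangent)
    )


def sample_cubic(times, outputs, index, current_time):
    """Interpolate between keyframes `index` and `index + 1`.

    `outputs` holds three elements per keyframe: in-tangent, value, out-tangent.
    """
    delta_time = times[index + 1] - times[index]
    t = (current_time - times[index]) / delta_time
    prev_value = outputs[3 * index + 1]
    prev_out_tangent = outputs[3 * index + 2]
    next_in_tangent = outputs[3 * (index + 1)]
    next_value = outputs[3 * (index + 1) + 1]
    prev_tangent = tuple(delta_time * c for c in prev_out_tangent)
    next_tangent = tuple(delta_time * c for c in next_in_tangent)
    return cubic_spline(prev_value, prev_tangent, next_value, next_tangent, t)


times = [0.0, 2.0]
# made-up scalar keyframes going from 0.0 to 10.0; each tangent is 5.0 units per second
outputs = [(5.0,), (0.0,), (5.0,),   # keyframe 0: in-tangent, value, out-tangent
           (5.0,), (10.0,), (5.0,)]  # keyframe 1: in-tangent, value, out-tangent
print(sample_cubic(times, outputs, 0, 0.5))  # (2.5,)
```

With tangents that match the average slope (10 units over 2 seconds), the spline reduces to a straight line, which makes the result easy to check by hand.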


Simple Meshes
A mesh represents a geometric object that appears in a scene. An example of a mesh has already been shown in the minimal glTF file. This example had a single mesh attached to a single node, and the mesh consisted of a single mesh.primitive that contained only a single attribute, namely the attribute for the vertex positions. But usually, the mesh primitives will contain more attributes. These attributes may, for example, be the vertex normals or texture coordinates.

The following is a glTF asset that contains a simple mesh with multiple attributes, which will serve as the basis
for explaining the related concepts:
{
  "scenes" : [
    {
      "nodes" : [ 0, 1]
    }
  ],
  "nodes" : [
    {
      "mesh" : 0
    },
    {
      "mesh" : 0,
      "translation" : [ 1.0, 0.0, 0.0 ]
    }
  ],

  "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1,
          "NORMAL" : 2
        },
        "indices" : 0
      } ]
    }
  ],

  "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAAAAAgD8AAAAAAAAAAAAAgD8=",
      "byteLength" : 80
    }
  ],
  "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 6,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 8,
      "byteLength" : 72,
      "target" : 34962
    }
  ],
  "accessors" : [
    {

      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 3,
      "type" : "SCALAR",
      "max" : [ 2 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 36,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 0.0, 0.0, 1.0 ],
      "min" : [ 0.0, 0.0, 1.0 ]
    }
  ],

  "asset" : {
    "version" : "2.0"
  }
}

Image 8a shows the rendered glTF asset.

Image 8a: A simple mesh, attached to two nodes.

The mesh definition
The given example still contains a single mesh that has a single mesh primitive. But this mesh primitive contains
multiple attributes:

   "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1,
          "NORMAL" : 2
        },
        "indices" : 0
      } ]
    }
  ],

In addition to the "POSITION" attribute, it has a "NORMAL" attribute. This refers to the accessor object that provides the vertex normals, as described in the Buffers, BufferViews, and Accessors section.

The rendered mesh instances
As can be seen in Image 8a, the mesh is rendered twice. This is accomplished by attaching the mesh to two
different nodes:
   "nodes" : [
    {
      "mesh" : 0
    },
    {
      "mesh" : 0,
      "translation" : [ 1.0, 0.0, 0.0 ]
    }
  ],

The mesh property of each node refers to the mesh that is attached to the node, using the index of the mesh.

One of the nodes has a translation that causes the attached mesh to be rendered at a different position.

The next section will explain meshes and mesh primitives in more detail.


Meshes
The Simple Meshes example from the previous section showed a basic example of a mesh with a
mesh.primitive object that contained several attributes. This section will explain the meaning and usage of
mesh primitives, how meshes may be attached to nodes of the scene graph, and how they can be rendered with
different materials.

Mesh primitives
Each mesh contains an array of mesh.primitive objects. These mesh primitive objects are smaller parts or
building blocks of a larger object. A mesh primitive summarizes all information about how the respective part of
the object will be rendered.

Mesh primitive attributes
A mesh primitive defines the geometry data of the object using its attributes dictionary. This geometry data is
given by references to accessor objects that contain the data of vertex attributes. The details of the accessor
concept are explained in the Buffers, BufferViews, and Accessors section.

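In the given example, there are two entries in the attributes dictionary. The entries refer to the accessor with index 1, which contains the vertex positions (the positionsAccessor), and the accessor with index 2, which contains the vertex normals (the normalsAccessor):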

   "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1,
          "NORMAL" : 2
        },
        "indices" : 0
      } ]
    }
  ],

Together, the elements of these accessors define the attributes that belong to the individual vertices, as shown in
Image 9a.

Image 9a: Mesh primitive accessors containing the data of vertices.

Indexed and non-indexed geometry
The geometry data of a mesh.primitive may be either indexed geometry or geometry without indices. In the given example, the mesh.primitive contains indexed geometry. This is indicated by the indices property, which refers to the accessor with index 0, defining the data for the indices. For non-indexed geometry, this property is omitted.

Mesh primitive mode
By default, the geometry data is assumed to describe a triangle mesh. For the case of indexed geometry, this means that three consecutive elements of the indices accessor are assumed to contain the indices of a single triangle. For non-indexed geometry, three consecutive elements of the vertex attribute accessors are assumed to contain the attributes of the three vertices of a triangle.

Other rendering modes are possible: the geometry data may also describe individual points, lines, or triangle
strips. This is indicated by the mode that may be stored in the mesh primitive. Its value is a constant that
indicates how the geometry data has to be interpreted. The mode may, for example, be 0 when the geometry
consists of points, or 4 when it consists of triangles. These constants correspond to the GL constants POINTS
or TRIANGLES , respectively. See the primitive.mode specification for a list of available modes.
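The mode constants and the number of primitives they produce can be summarized in a small sketch. The counting rules below follow the usual GL semantics of these modes; the function names are hypothetical, and the counts assume there are enough elements for at least one primitive.

```python
# primitive.mode values (GL constants); mode defaults to 4 (TRIANGLES) when omitted
MODE_NAMES = {0: "POINTS", 1: "LINES", 2: "LINE_LOOP", 3: "LINE_STRIP",
              4: "TRIANGLES", 5: "TRIANGLE_STRIP", 6: "TRIANGLE_FAN"}


def primitive_count(mode: int, n: int) -> int:
    """Number of points, lines or triangles described by n indices (or vertices)."""
    if mode in (0, 2):   # POINTS: one per element; LINE_LOOP: one line per element
        return n
    if mode == 1:        # LINES: independent pairs
        return n // 2
    if mode == 3:        # LINE_STRIP: consecutive elements are connected
        return n - 1
    if mode == 4:        # TRIANGLES: independent triples
        return n // 3
    if mode in (5, 6):   # TRIANGLE_STRIP, TRIANGLE_FAN
        return n - 2
    raise ValueError(f"unknown mode {mode}")


def triangles(indices: list) -> list:
    """Group the indices of a TRIANGLES primitive into one tuple per triangle."""
    return [tuple(indices[i:i + 3]) for i in range(0, len(indices) - 2, 3)]


print(MODE_NAMES[4], primitive_count(4, 3), triangles([0, 1, 2]))  # TRIANGLES 1 [(0, 1, 2)]
print(primitive_count(4, 36))  # 12, e.g. for the 36 indices of the sparse accessor example
```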

Mesh primitive material
The mesh primitive may also refer to the material that should be used for rendering, using the index of this
material. In the given example, no material is defined, causing the objects to be rendered with a default
material that just defines the objects to have a uniform 50% gray color. A detailed explanation of materials and
the related concepts will be given in the Materials section.

Meshes attached to nodes
In the example from the Simple Meshes section, there is a single scene, which contains two nodes, and both
nodes refer to the same mesh instance, which has the index 0:
   "scenes" : [
    {
      "nodes" : [ 0, 1]
    }
  ],
  "nodes" : [
    {
      "mesh" : 0
    },
    {
      "mesh" : 0,
      "translation" : [ 1.0, 0.0, 0.0 ]
    }
  ],

  "meshes" : [
    { ... }
  ],

The second node has a translation property. As shown in the Scenes and Nodes section, this will be used to
compute the local transform matrix of this node. In this case, the matrix will cause a translation of 1.0 along the
x-axis. The product of the local transforms of a node and of all its ancestors yields the global transform of the
node, and all elements that are attached to the node will be rendered with this global transform.

So in this example, the mesh will be rendered twice because it is attached to two nodes: once with the global
transform of the first node, which is the identity transform, and once with the global transform of the second
node, which is a translation of 1.0 along the x-axis.
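
The following Python sketch mirrors this computation for the two nodes of the example. It assumes that nodes only use the translation property (rotation, scale, and matrix are ignored here) and represents matrices as row-major nested lists for readability, which is not the column-major array layout of the glTF matrix property. The function names are hypothetical.

```python
def translation_matrix(t):
    """4x4 matrix (row-major nested lists) that translates by t = (x, y, z)."""
    x, y, z = t
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


IDENTITY = translation_matrix((0.0, 0.0, 0.0))


def mat_mul(a, b):
    """Product a * b of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mesh_instances(gltf, scene_index=0):
    """Yield (node_index, mesh_index, global_matrix) for each node with a mesh."""
    def visit(node_index, parent_global):
        node = gltf["nodes"][node_index]
        local = translation_matrix(node.get("translation", (0.0, 0.0, 0.0)))
        global_matrix = mat_mul(parent_global, local)  # parent's global, then local
        if "mesh" in node:
            yield node_index, node["mesh"], global_matrix
        for child in node.get("children", []):
            yield from visit(child, global_matrix)

    for root in gltf["scenes"][scene_index]["nodes"]:
        yield from visit(root, IDENTITY)


example = {
    "scenes": [{"nodes": [0, 1]}],
    "nodes": [{"mesh": 0}, {"mesh": 0, "translation": [1.0, 0.0, 0.0]}],
}
instances = list(mesh_instances(example))
```

Both entries in instances refer to mesh 0: the first carries the identity matrix, and the second a matrix whose last column is (1.0, 0.0, 0.0, 1.0).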


Materials
Introduction
The purpose of glTF is to define a transmission format for 3D assets. As shown in the previous sections, this
includes information about the scene structure and the geometric objects that appear in the scene. But a glTF
asset can also contain information about the appearance of the objects; that is, how these objects should be
rendered on the screen.

There are different possible representations for the properties of a material, and the shading model describes how
these properties are processed. Simple shading models, like the Phong or Blinn-Phong models, are directly supported by, or
easily implemented with, common graphics APIs like OpenGL or WebGL. These shading models are built on a set of basic material
properties. For example, the material properties involve information about the color of diffusely reflected light
(often in the form of a texture), the color of specularly reflected light, and a shininess parameter. Many file
formats contain exactly these parameters. For example, Wavefront OBJ files are combined with MTL files that
contain this texture and color information. Renderers can read this information and render the objects accordingly.
But in order to describe more realistic materials, more sophisticated shading and material models are required.

Physically Based Rendering (PBR)
To allow renderers to display objects with a realistic appearance under different lighting conditions, the shading
model has to take the physical properties of the object surface into account. There are different representations
of these physical material properties. One that is frequently used is the metallic-roughness model. Here, the
information about the object surface is encoded with three main parameters:

The base color, which is the "main" color of the object surface.
The metallic value. This is a parameter that describes how much the reflective behavior of the material
resembles that of a metal.
The roughness value, indicating how rough the surface is, affecting the light scattering.

The metallic-roughness model is the representation that is used in glTF. Other material representations, like the
specular-glossiness model, are supported via extensions.
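
How a renderer turns these three parameters into shading inputs is up to the implementation, but the BRDF implementation appendix of the glTF 2.0 specification suggests a mapping: the diffuse color fades to black as the metallic value approaches 1, the reflectance at normal incidence (often called F0) moves from about 4% for non-metals towards the base color for metals, and the roughness is squared to obtain the alpha parameter of the microfacet model. The following Python sketch covers only this parameter mapping, not a full shading model; the names lerp and brdf_inputs are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation written so that t = 0 gives a and t = 1 gives b exactly."""
    return a * (1.0 - t) + b * t


def brdf_inputs(base_color, metallic, roughness):
    """Map metallic-roughness parameters to (diffuse color, F0, alpha)."""
    dielectric_f0 = 0.04  # reflectance of typical non-metals at normal incidence
    rgb = base_color[:3]
    diffuse = tuple(lerp(c, 0.0, metallic) for c in rgb)  # metals have no diffuse part
    f0 = tuple(lerp(dielectric_f0, c, metallic) for c in rgb)
    alpha = roughness * roughness  # roughness squared
    return diffuse, f0, alpha


# The values of the Simple Material example:
diffuse, f0, alpha = brdf_inputs((1.000, 0.766, 0.336, 1.0), metallic=0.5, roughness=0.1)
```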

The effects of different metallic and roughness values are illustrated in this image:

Image 10a: Spheres with different metallic and roughness values.

The base color, metallic, and roughness properties may be given as single values and are then applied to the
whole object. In order to assign different material properties to different parts of the object surface, these
properties may also be given in the form of textures. This makes it possible to model a wide range of real-world
materials with a realistic appearance.
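
When a value is given both as a factor and as a texture, the glTF 2.0 specification defines the effective value as their product, and the metallicRoughnessTexture stores roughness in its green channel and metalness in its blue channel. A minimal Python sketch, assuming that texels have already been sampled and converted to RGBA floats in [0, 1] (for the base color texture, this includes the sRGB-to-linear conversion, which is not shown); material_values is a hypothetical name:

```python
def material_values(pbr, base_color_texel=None, metallic_roughness_texel=None):
    """Effective base color, metallic, and roughness at one surface point.

    `pbr` is the pbrMetallicRoughness JSON object of a material; missing
    factors fall back to their defaults (white, 1.0, 1.0).
    """
    base_color = list(pbr.get("baseColorFactor", [1.0, 1.0, 1.0, 1.0]))
    metallic = pbr.get("metallicFactor", 1.0)
    roughness = pbr.get("roughnessFactor", 1.0)
    if base_color_texel is not None:
        # Component-wise product of the factor and the (linear) RGBA texel.
        base_color = [f * t for f, t in zip(base_color, base_color_texel)]
    if metallic_roughness_texel is not None:
        roughness *= metallic_roughness_texel[1]  # green channel
        metallic *= metallic_roughness_texel[2]   # blue channel
    return base_color, metallic, roughness
```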

Depending on the shading model, additional effects can be applied to the object surface. These are usually given
as a combination of a texture and a scaling factor:

An emissive texture describes the parts of the object surface that emit light with a certain color.
The occlusion texture can be used to simulate self-shadowing, where parts of an object block light from reaching other parts.
The normal map is a texture applied to modulate the surface normal in a way that makes it possible to
simulate finer geometric details without the cost of a higher mesh resolution.
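
The glTF 2.0 specification describes how these additional textures are combined with their scaling parameters: the occlusion value (red channel) is blended in by the occlusion strength, the x and y components of a decoded normal map texel are multiplied by the normal scale before renormalizing, and the emitted color is the emissiveFactor multiplied by the emissive texel. The following Python functions are a hedged sketch of those formulas only; the function names are hypothetical, and texels are assumed to be already sampled floats in [0, 1].

```python
import math


def apply_occlusion(color, occlusion_texel, strength=1.0):
    """lerp(color, color * ao, strength), with ao read from the red channel."""
    ao = occlusion_texel[0]
    return tuple(c * (1.0 - strength) + c * ao * strength for c in color)


def tangent_space_normal(normal_texel, scale=1.0):
    """Decode a normal map texel from [0, 1] to [-1, 1], scale x and y, renormalize."""
    x, y, z = (v * 2.0 - 1.0 for v in normal_texel[:3])
    x, y = x * scale, y * scale
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)


def emission(emissive_factor, emissive_texel=(1.0, 1.0, 1.0)):
    """Emitted color: emissiveFactor multiplied component-wise by the RGB texel."""
    return tuple(f * t for f, t in zip(emissive_factor, emissive_texel))
```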

glTF supports all of these additional properties, and defines sensible default values for cases where these
properties are omitted.

The following sections will show how these material properties are encoded in a glTF asset, including various
examples of materials:

A Simple Material
Textures, Images, and Samplers that serve as a basis for defining material properties
A Simple Texture showing an example of how to use a texture for a material
An Advanced Material combining multiple textures to achieve a sophisticated surface appearance for the
objects


A Simple Material
The examples of glTF assets that have been given in the previous sections contained a basic scene structure
and simple geometric objects. But they did not contain information about the appearance of the objects. When no
such information is given, viewers are encouraged to render the objects with a "default" material. And as shown
in the screenshot of the minimal glTF file, depending on the light conditions in the scene, this default material
causes the object to be rendered with a uniformly white or light gray color.

This section will start with an example of a very simple material and explain the effect of the different material
properties.

This is a minimal glTF asset with a simple material:
{
  "scenes" : [
    {
      "nodes" : [ 0 ]
    }
  ],

  "nodes" : [
    {
      "mesh" : 0
    }
  ],

  "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0,
        "material" : 0
      } ]
    }
  ],

  "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
      "byteLength" : 44
    }
  ],
  "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 6,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 8,
      "byteLength" : 36,
      "target" : 34962
    }
  ],
  "accessors" : [
    {
      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 3,
      "type" : "SCALAR",
      "max" : [ 2 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 3,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    }
  ],

  "materials" : [
    {
      "pbrMetallicRoughness": {
        "baseColorFactor": [ 1.000, 0.766, 0.336, 1.0 ],
        "metallicFactor": 0.5,
        "roughnessFactor": 0.1
      }
    }
  ],
  "asset" : {
    "version" : "2.0"
  }
}

When rendered, this asset will show the triangle with a new material, as shown in Image 11a.

Image 11a: A triangle with a simple material.

Material definition
A new top-level array has been added to the glTF JSON to define this material: The materials array contains a
single element that defines the material and its properties:

   "materials" : [
    {
      "pbrMetallicRoughness": {
        "baseColorFactor": [ 1.000, 0.766, 0.336, 1.0 ],
        "metallicFactor": 0.5,
        "roughnessFactor": 0.1
      }
    }
  ],

The actual definition of the material here only consists of the pbrMetallicRoughness object, which defines the
basic properties of a material in the metallic-roughness model. (All other material properties will therefore have
default values, which will be explained later.) The baseColorFactor contains the red, green, blue, and alpha
components of the main color of the material, here a bright orange color. The metallicFactor of 0.5 indicates
that the material should have reflection characteristics between those of a metal and a non-metal material. The
low roughnessFactor of 0.1 makes the material nearly, but not perfectly, mirror-like: it scatters the reflected light a bit.

Assigning the material to objects
The material is assigned to the triangle, namely to the mesh.primitive, by referring to the material using its
index:
   "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0,
        "material" : 0
      } ]
    }
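
A loader has to resolve this index and has to fall back to default values for everything the material does not specify. A minimal Python sketch, assuming the glTF JSON has been parsed into a dict (for example with json.load); resolve_material is a hypothetical name, the default values are those listed in the glTF 2.0 specification, and treating a primitive without a material as an empty material is an assumption of this sketch (viewers may choose their own fallback appearance):

```python
import copy

# Default values of material properties, as listed in the glTF 2.0 specification.
MATERIAL_DEFAULTS = {
    "baseColorFactor": [1.0, 1.0, 1.0, 1.0],
    "metallicFactor": 1.0,
    "roughnessFactor": 1.0,
    "emissiveFactor": [0.0, 0.0, 0.0],
    "alphaMode": "OPAQUE",
    "alphaCutoff": 0.5,
    "doubleSided": False,
}
PBR_KEYS = ("baseColorFactor", "metallicFactor", "roughnessFactor")
MATERIAL_KEYS = ("emissiveFactor", "alphaMode", "alphaCutoff", "doubleSided")


def resolve_material(gltf, primitive):
    """Return the effective (non-texture) material values of a mesh primitive."""
    # Assumption: a primitive without "material" behaves like an empty material.
    material = {}
    if "material" in primitive:
        material = gltf["materials"][primitive["material"]]
    pbr = material.get("pbrMetallicRoughness", {})

    resolved = copy.deepcopy(MATERIAL_DEFAULTS)
    resolved.update({key: pbr[key] for key in PBR_KEYS if key in pbr})
    resolved.update({key: material[key] for key in MATERIAL_KEYS if key in material})
    return resolved
```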

The next section will give a short introduction to how textures are defined in a glTF asset. The use of textures will
then allow the definition of more complex and realistic materials.


Textures, Images, and Samplers
Textures are an important aspect of giving objects a realistic appearance. They make it possible to define the
main color of the objects, as well as other characteristics that are used in the material definition in order to
precisely describe what the rendered object should look like.

A glTF asset may define multiple texture objects, which can be used as the textures of geometric objects
during rendering, and which can be used to encode different material properties. Depending on the graphics API,
there may be many features and settings that influence the process of texture mapping. Many of these details
are beyond the scope of this tutorial. There are dedicated tutorials that explain the exact meaning of all the
texture mapping parameters and settings; for example, on webglfundamentals.org, open.gl, and others. This
section will only summarize how the information about textures is encoded in a glTF asset.

There are three top-level arrays for the definition of textures in the glTF JSON. The textures, samplers, and
images arrays contain texture, sampler, and image objects, respectively. The following is an excerpt
from the Simple Texture example, which will be presented in the next section:

"textures": [
  {
    "source": 0,
    "sampler": 0
  }
],
"images": [
  {
    "uri": "testTexture.png"
  }
],
"samplers": [
  {
    "magFilter": 9729,
    "minFilter": 9987,
    "wrapS": 33648,
    "wrapT": 33648
  }
],

The texture itself uses indices to refer to one sampler and one image. The most important element here is
the reference to the image. It contains a URI that links to the actual image file that will be used for the texture.
Information about how to read this image data can be found in the section about image data in images.
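
Resolving a texture is a matter of following these indices. The following Python sketch assumes the glTF JSON has been parsed into a dict and that the texture has a source. describe_texture and the lookup tables are hypothetical names; the numeric values are the GL constants that glTF uses for the sampler properties (9729 is LINEAR, 9987 is LINEAR_MIPMAP_LINEAR, and 33648 is MIRRORED_REPEAT).

```python
# GL constants used by glTF samplers.
FILTERS = {
    9728: "NEAREST", 9729: "LINEAR",
    9984: "NEAREST_MIPMAP_NEAREST", 9985: "LINEAR_MIPMAP_NEAREST",
    9986: "NEAREST_MIPMAP_LINEAR", 9987: "LINEAR_MIPMAP_LINEAR",
}
WRAP_MODES = {33071: "CLAMP_TO_EDGE", 33648: "MIRRORED_REPEAT", 10497: "REPEAT"}


def describe_texture(gltf, texture_index):
    """Follow a texture's "source" and "sampler" indices (sketch)."""
    texture = gltf["textures"][texture_index]
    image = gltf["images"][texture["source"]]
    sampler = gltf["samplers"][texture["sampler"]] if "sampler" in texture else {}
    return {
        "uri": image.get("uri"),
        # None means the filter is not specified and left to the renderer.
        "magFilter": FILTERS.get(sampler.get("magFilter")),
        "minFilter": FILTERS.get(sampler.get("minFilter")),
        # wrapS and wrapT default to REPEAT (10497) when omitted.
        "wrapS": WRAP_MODES[sampler.get("wrapS", 10497)],
        "wrapT": WRAP_MODES[sampler.get("wrapT", 10497)],
    }
```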

The next section will show how such a texture definition may be used inside a material.


A Simple Texture
As shown in the previous sections, the material definition in a glTF asset contains different parameters for the
color of the material or the overall appearance of the material under the influence of light. These properties may
be given via single values, for example, defining the color or the roughness of the object as a whole.
Alternatively, these values may be provided via textures that are mapped on the object surface. The following is
a glTF asset that defines a material with a simple, single texture:
{
  "scenes" : [ {
    "nodes" : [ 0 ]
  } ],
  "nodes" : [ {
    "mesh" : 0
  } ],
  "meshes" : [ {
    "primitives" : [ {
      "attributes" : {
        "POSITION" : 1,
        "TEXCOORD_0" : 2
      },
      "indices" : 0,
      "material" : 0
    } ]
  } ],

  "materials" : [ {
    "pbrMetallicRoughness" : {
      "baseColorTexture" : {
        "index" : 0
      },
      "metallicFactor" : 0.0,
      "roughnessFactor" : 1.0
    }
  } ],

  "textures" : [ {
    "sampler" : 0,
    "source" : 0
  } ],
  "images" : [ {
    "uri" : "testTexture.png"
  } ],
  "samplers" : [ {
    "magFilter" : 9729,
    "minFilter" : 9987,
    "wrapS" : 33648,
    "wrapT" : 33648
  } ],

  "buffers" : [ {
    "uri" : "data:application/gltf-buffer;base64,AAABAAIAAQADAAIAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAA",
    "byteLength" : 108
  } ],
  "bufferViews" : [ {
    "buffer" : 0,
    "byteOffset" : 0,
    "byteLength" : 12,
    "target" : 34963
  }, {
    "buffer" : 0,
    "byteOffset" : 12,
    "byteLength" : 96,
    "byteStride" : 12,
    "target" : 34962
  } ],
  "accessors" : [ {
    "bufferView" : 0,
    "byteOffset" : 0,
    "componentType" : 5123,
    "count" : 6,
    "type" : "SCALAR",
    "max" : [ 3 ],
    "min" : [ 0 ]
  }, {
    "bufferView" : 1,
    "byteOffset" : 0,
    "componentType" : 5126,
    "count" : 4,
    "type" : "VEC3",
    "max" : [ 1.0, 1.0, 0.0 ],
    "min" : [ 0.0, 0.0, 0.0 ]
  }, {
    "bufferView" : 1,
    "byteOffset" : 48,
    "componentType" : 5126,
    "count" : 4,
    "type" : "VEC2",
    "max" : [ 1.0, 1.0 ],
    "min" : [ 0.0, 0.0 ]
  } ],

  "asset" : {
    "version" : "2.0"
  }
}

The actual image that the texture consists of is stored as a PNG file called "testTexture.png" (see Image 15a).

Image 15a: The image for the simple texture example.

Bringing this all together in a renderer will result in the scene rendered in Image 15b.

Image 15b: A simple texture on a unit square.

The Textured Material Definition
The material definition in this example differs from the Simple Material that was shown earlier. While the simple
material only defined a single color for the whole object, the material definition now refers to the newly added
texture:
"materials" : [ {
  "pbrMetallicRoughness" : {
    "baseColorTexture" : {
      "index" : 0
    },
    "metallicFactor" : 0.0,
    "roughnessFactor" : 1.0
  }
} ],

The baseColorTexture refers, via its index property, to the texture that will be applied to the object surface. The
metallicFactor and roughnessFactor are still single values. A more complex material where these properties
are also given via textures will be shown in the next section.

In order to apply a texture to a mesh primitive, there must be information about the texture coordinates that
should be used for each vertex. The texture coordinates are just another attribute for the vertices defined in the
mesh.primitive. By default, a texture will use the texture coordinates that have the attribute name
TEXCOORD_0. If there are multiple sets of texture coordinates, the one that should be used for one particular
texture may be selected by adding a texCoord property to the texture reference:

"baseColorTexture" : {
  "index" : 0,
  "texCoord" : 2
},

In this case, the texture would use the texture coordinates that are contained in the attribute called TEXCOORD_2.
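
Finding the texture coordinates for a texture reference therefore amounts to building the attribute name from the texCoord value. A minimal Python sketch, assuming parsed JSON dicts; texcoord_accessor is a hypothetical helper name:

```python
def texcoord_accessor(primitive, texture_info):
    """Return the accessor index of the TEXCOORD_n attribute a texture uses.

    `texture_info` is a texture reference such as a material's
    "baseColorTexture" object; "texCoord" defaults to 0.
    """
    attribute_name = f"TEXCOORD_{texture_info.get('texCoord', 0)}"
    try:
        return primitive["attributes"][attribute_name]
    except KeyError:
        raise KeyError(f"primitive has no {attribute_name} attribute") from None
```

For the primitive of the Simple Texture example, a reference without texCoord resolves to accessor 2, while a texCoord of 2 fails because the primitive has no TEXCOORD_2 attribute.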


An Advanced Material
The Simple Texture example in the previous section showed a material for which the "base color" was defined
using a texture. But in addition to the base color, there are other properties of a material that may be defined via
textures. These properties have already been summarized in the Materials section:

The base color,
The metallic value,
The roughness of the surface,
The emissive properties,
An occlusion texture, and
A normal map.

The effects of these properties cannot properly be demonstrated with trivial textures. Therefore, they will be
shown here using one of the official Khronos PBR sample models, namely, the WaterBottle model. Image 14a
shows an overview of the textures that are involved in this model, and the final rendered object:

Image 14a: An example of a material where the surface properties are defined via textures.

Explaining the implementation of physically based rendering is beyond the scope of this tutorial. The official
Khronos WebGL PBR repository contains a reference implementation of a PBR renderer based on WebGL, and
provides implementation hints and background information. The following images mainly aim to demonstrate the
effects of the different material property textures under different lighting conditions.
Image 14b shows the effect of the roughness texture: the main part of the bottle has a low roughness, causing it
to appear shiny, compared to the cap, which has a rough surface structure.

Image 14b: The influence of the roughness texture.

Image 14c highlights the effect of the metallic texture: the bottle reflects the light from the surrounding
environment map.

Image 14c: The influence of the metallic texture.

Image 14d shows the emissive part of the texture: regardless of the dark environment setting, the text, which is
contained in the emissive texture, is clearly visible.

Image 14d: The emissive part of the texture.

Image 14e shows the part of the bottle cap for which a normal map is defined: the text appears to be embossed
into the cap. This makes it possible to model finer geometric details on the surface, even though the model itself
only has a very coarse geometric resolution.

Image 14e: The effect of a normal map.
Together, these textures and maps allow modeling a wide range of real-world materials. Thanks to the common
underlying PBR model, namely the metallic-roughness model, the objects can be rendered consistently by
different renderer implementations.


Simple Cameras
The previous sections showed how a basic scene structure with geometric objects is represented in a glTF
asset, and how different materials can be applied to these objects. This did not yet include information about the
view configuration that should be used for rendering the scene. This view configuration is usually described as a
virtual camera that is contained in the scene, at a certain position, and pointing in a certain direction.

The following is a simple, complete glTF asset. It is similar to the assets that have already been shown: it
defines a simple scene containing node objects and a single geometric object that is given as a mesh,
attached to one of the nodes. But this asset additionally contains two camera objects:
{
  "scenes" : [
    {
      "nodes" : [ 0, 1, 2 ]
    }
  ],
  "nodes" : [
    {
      "rotation" : [ -0.383, 0.0, 0.0, 0.924 ],
      "mesh" : 0
    },
    {
      "translation" : [ 0.5, 0.5, 3.0 ],
      "camera" : 0
    },
    {
      "translation" : [ 0.5, 0.5, 3.0 ],
      "camera" : 1
    }
  ],

  "cameras" : [
    {
      "type": "perspective",
      "perspective": {
        "aspectRatio": 1.0,
        "yfov": 0.7,
        "zfar": 100,
        "znear": 0.01
      }
    },
    {
      "type": "orthographic",
      "orthographic": {
        "xmag": 1.0,
        "ymag": 1.0,
        "zfar": 100,
        "znear": 0.01
      }
    }
  ],

  "meshes" : [
    {
      "primitives" : [ {
        "attributes" : {
          "POSITION" : 1
        },
        "indices" : 0
      } ]
    }
  ],

  "buffers" : [
    {
      "uri" : "data:application/octet-stream;base64,AAABAAIAAQADAAIAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAA",
      "byteLength" : 60
    }
  ],
  "bufferViews" : [
    {
      "buffer" : 0,
      "byteOffset" : 0,
      "byteLength" : 12,
      "target" : 34963
    },
    {
      "buffer" : 0,
      "byteOffset" : 12,
      "byteLength" : 48,
      "target" : 34962
    }
  ],
  "accessors" : [
    {
      "bufferView" : 0,
      "byteOffset" : 0,
      "componentType" : 5123,
      "count" : 6,
      "type" : "SCALAR",
      "max" : [ 3 ],
      "min" : [ 0 ]
    },
    {
      "bufferView" : 1,
      "byteOffset" : 0,
      "componentType" : 5126,
      "count" : 4,
      "type" : "VEC3",
      "max" : [ 1.0, 1.0, 0.0 ],
      "min" : [ 0.0, 0.0, 0.0 ]
    }
  ],

  "asset" : {
    "version" : "2.0"
  }
}

The geometry in this asset is a simple unit square. It is rotated by 45 degrees around the x-axis, to emphasize
the effect of the different cameras. Image 17a shows three options for rendering this asset. The first two examples
use the cameras from the asset. The last example shows how the scene looks from an external, user-defined
viewpoint.
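
The rotation of the mesh node is stored as a unit quaternion in the order [x, y, z, w]. The following Python sketch (quaternion_axis_angle is a hypothetical name) converts it to an axis and an angle, which confirms that [-0.383, 0.0, 0.0, 0.924] describes a rotation of about 45 degrees around the negative x-axis, that is, about -45 degrees around the x-axis:

```python
import math


def quaternion_axis_angle(q):
    """Return (axis, angle_in_degrees) for a glTF rotation quaternion [x, y, z, w]."""
    length = math.sqrt(sum(c * c for c in q))
    x, y, z, w = (c / length for c in q)  # normalize to tolerate rounded values
    half_angle = math.acos(max(-1.0, min(1.0, w)))
    s = math.sin(half_angle)
    if s < 1e-8:
        return (1.0, 0.0, 0.0), 0.0  # (almost) no rotation: the axis is arbitrary
    return (x / s, y / s, z / s), math.degrees(2.0 * half_angle)


axis, degrees = quaternion_axis_angle([-0.383, 0.0, 0.0, 0.924])
```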

Image 17a: The effect of rendering the scene with different cameras.

Camera definitions
The new top-level element of this glTF asset is the cameras array, which contains the camera objects:

"cameras" : [
  {
    "type": "perspective",
    "perspective": {
      "aspectRatio": 1.0,
      "yfov": 0.7,
      "zfar": 100,
      "znear": 0.01
    }
  },
  {
    "type": "orthographic",
    "orthographic": {
      "xmag": 1.0,
      "ymag": 1.0,
      "zfar": 100,
      "znear": 0.01
    }
  }
],

When a camera object has been defined, it may be attached to a node. This is accomplished by assigning the
index of the camera to the camera property of a node. In the given example, two new nodes have been added to
the scene graph, one for each camera:
"nodes" : [
  ...
  {
    "translation" : [ 0.5, 0.5, 3.0 ],
    "camera" : 0
  },
  {
    "translation" : [ 0.5, 0.5, 3.0 ],
    "camera" : 1
  }
],

The differences between perspective and orthographic cameras and their properties, the effect of attaching the
cameras to the nodes, and the management of multiple cameras will be explained in detail in the Cameras
section.


Cameras
The example in the Simple Cameras section showed how to define perspective and orthographic cameras, and
how they can be integrated into a scene by attaching them to nodes. This section will explain the differences
between both types of cameras, and the handling of cameras in general.

Perspective and orthographic cameras
There are two kinds of cameras: Perspective cameras, where the viewing volume is a truncated pyramid (often
referred to as "viewing frustum"), and orthographic cameras, where the viewing volume is a rectangular box. The
main difference is that rendering with a perspective camera causes a proper perspective distortion, whereas
rendering with an orthographic camera preserves lengths and angles.

The example in the Simple Cameras section contains one camera of each type: a perspective camera at
index 0, and an orthographic camera at index 1.

The type of the camera is given as a string, which can be "perspective" or "orthographic". Depending on
this type, the camera object contains a camera.perspective object or a camera.orthographic object. These
objects contain additional parameters that define the actual viewing volume.

The camera.perspective object contains an aspectRatio property that defines the aspect ratio of the viewport.
Additionally, it contains a property called yfov, which stands for field of view in y-direction. It defines the
"opening angle" of the camera and is given in radians.
The camera.orthographic object contains xmag and ymag properties. These define the magnification of the
camera in x- and y-direction, and correspond to half the width and half the height of the viewing volume.

Both camera types additionally contain znear and zfar properties, which are the distances to the near and
far clipping planes. For perspective cameras, the zfar value is optional. When it is missing, a special "infinite
projection matrix" will be used.

Explaining the details of cameras, viewing, and projections is beyond the scope of this tutorial. The important
point is that most graphics APIs offer methods for defining the viewing configuration that are directly based on
these parameters. In general, these parameters can be used to compute a projection matrix. The global transform
of the node that the camera is attached to can be inverted to obtain the view matrix, which is then post-multiplied
with the model matrix to obtain the model-view matrix required by the renderer.
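
The glTF 2.0 specification gives explicit projection matrices for both camera types. The following Python sketch writes them out as row-major nested lists (a choice made for readability; a WebGL renderer would typically upload them in column-major order). The function names are hypothetical, and aspectRatio is treated as required here, although the specification allows it to be omitted, in which case the aspect ratio of the viewport is used.

```python
import math


def perspective_matrix(yfov, znear, aspect_ratio, zfar=None):
    """Projection matrix of a glTF perspective camera (row-major nested lists).

    If zfar is None, the infinite projection matrix is returned.
    """
    f = 1.0 / math.tan(0.5 * yfov)
    if zfar is None:
        third_row = [0.0, 0.0, -1.0, -2.0 * znear]
    else:
        third_row = [0.0, 0.0, (zfar + znear) / (znear - zfar),
                     2.0 * zfar * znear / (znear - zfar)]
    return [[f / aspect_ratio, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            third_row,
            [0.0, 0.0, -1.0, 0.0]]


def orthographic_matrix(xmag, ymag, znear, zfar):
    """Projection matrix of a glTF orthographic camera (row-major nested lists)."""
    return [[1.0 / xmag, 0.0, 0.0, 0.0],
            [0.0, 1.0 / ymag, 0.0, 0.0],
            [0.0, 0.0, 2.0 / (znear - zfar), (zfar + znear) / (znear - zfar)],
            [0.0, 0.0, 0.0, 1.0]]


# The two cameras of the Simple Cameras example:
perspective = perspective_matrix(yfov=0.7, znear=0.01, aspect_ratio=1.0, zfar=100.0)
orthographic = orthographic_matrix(xmag=1.0, ymag=1.0, znear=0.01, zfar=100.0)
```

With both matrices, a point at distance znear in front of the camera (z = -znear in view space) is mapped to the normalized device depth -1, and a point at distance zfar to +1.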

Camera orientation
A camera can be transformed to have a certain orientation and viewing direction in the scene. This is
accomplished by attaching the camera to a node. Each node may contain the index of a camera that is
attached to it. In the simple camera example, there are two nodes for the cameras. The first node refers to the
perspective camera with index 0, and the second one refers to the orthographic camera with index 1:

"nodes" : [
  ...
  {
    "translation" : [ 0.5, 0.5, 3.0 ],
    "camera" : 0
  },
  {
    "translation" : [ 0.5, 0.5, 3.0 ],
    "camera" : 1
  }
],

As shown in the Scenes and Nodes section, these nodes may have properties that define the transform matrix of
the node. The global transform of a node then defines the actual orientation of the camera in the scene. With the
option to apply arbitrary animations to the nodes, it is even possible to define camera flights.

When the global transform of the camera node is the identity matrix, the eye point of the camera is at the
origin, and the viewing direction is along the negative z-axis. In the given example, both nodes have a
translation of (0.5, 0.5, 3.0), which causes the cameras to be transformed accordingly: they are translated
by 0.5 in the x- and y-direction, to look at the center of the unit square, and by 3.0 along the z-axis, to
move them a bit away from the object.
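
Because the view matrix is the inverse of the camera node's global transform, a camera node that is only translated simply shifts every point by the opposite translation. The following Python sketch assumes exactly that case (no rotation or scale); view_space_point is a hypothetical name. It shows that the point (0.5, 0.5, 0.0), the center of the unit square before the mesh node's rotation is applied, ends up 3 units in front of the camera, on its negative z-axis:

```python
def view_space_point(camera_translation, world_point):
    """World-space point in the view space of a camera node (sketch).

    Assumes the camera node's global transform is a pure translation, so the
    view matrix (its inverse) just subtracts that translation.
    """
    return tuple(p - c for p, c in zip(world_point, camera_translation))


center = view_space_point((0.5, 0.5, 3.0), (0.5, 0.5, 0.0))
# center == (0.0, 0.0, -3.0): straight ahead, along the camera's negative z-axis
```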

Camera instancing and management
There may be multiple cameras defined in the JSON part of a glTF asset. Each camera may be referred to by multiple
nodes. Therefore, the cameras as they appear in the glTF asset are really "templates" for actual camera
instances: Whenever a node refers to one camera, a new instance of this camera is created.
There is no "default" camera for a glTF asset. Instead, the client application has to keep track of the currently
active camera. The client application may, for example, offer a drop-down menu that allows one to select the
active camera and thus to quickly switch between predefined view configurations. With a bit more implementation
effort, the client application can also define its own camera and interaction patterns for the camera control (e.g.,
zooming with the mouse wheel). However, the logic for the navigation and interaction has to be implemented
solely by the client application in this case. Image 17a shows the result of such an implementation, where the
user may select as the active camera either one of the cameras that are defined in the glTF asset, or an "external
camera" that may be controlled with the mouse.
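
To offer such a selection, a client first has to find the camera instances, that is, the nodes of the active scene that refer to a camera. A minimal Python sketch that walks the node hierarchy of a parsed glTF JSON dict in depth-first order; camera_instances is a hypothetical name, and computing the global transforms of the camera nodes is left out:

```python
def camera_instances(gltf, scene_index=0):
    """List (node_index, camera_index) pairs of a scene, in depth-first order."""
    result = []
    stack = list(reversed(gltf["scenes"][scene_index]["nodes"]))
    while stack:
        node_index = stack.pop()
        node = gltf["nodes"][node_index]
        if "camera" in node:
            result.append((node_index, node["camera"]))
        # Push children in reverse so that they are visited in array order.
        stack.extend(reversed(node.get("children", [])))
    return result


simple_cameras = {
    "scenes": [{"nodes": [0, 1, 2]}],
    "nodes": [
        {"mesh": 0},
        {"translation": [0.5, 0.5, 3.0], "camera": 0},
        {"translation": [0.5, 0.5, 3.0], "camera": 1},
    ],
}
```

For the Simple Cameras example, camera_instances(simple_cameras) lists node 1 with camera 0 and node 2 with camera 1; each pair could become one entry of the drop-down menu.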


A Simple Morph Target
Starting with version 2.0, glTF supports the definition of morph targets for meshes. A morph target stores
displacements or differences for certain mesh attributes. At runtime, these differences may be added to the
original mesh, with different weights, in order to animate parts of the mesh. This is often used in character
animations, for example, to encode different facial expressions of a virtual character.

The following is a minimal example that shows a mesh with two morph targets. The new elements will be
summarized here, and the broader concept of morph targets and how they are applied at runtime will be explained
in the next section.
{
  "scenes":[
    {
      "nodes":[
        0
      ]
    }
  ],
  "nodes":[
    {
      "mesh":0
    }
  ],
  "meshes":[
    {
      "primitives":[
        {
          "attributes":{
            "POSITION":1
          },
          "targets":[
            {
              "POSITION":2
            },
            {
              "POSITION":3
            }
          ],
          "indices":0
        }
      ],
      "weights":[
        1.0,
        0.5
      ]
    }
  ],

  "animations":[
    {
      "samplers":[
        {
          "input":4,
          "interpolation":"LINEAR",
          "output":5
        }
      ],
      "channels":[
        {
          "sampler":0,
          "target":{
            "node":0,
            "path":"weights"
          }
        }
      ]
    }
  ],

  "buffers":[
    {
      "uri":"data:application/gltf-buffer;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAA/AAAAPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIC/AACAPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIA/AACAPwAAAAA=",
      "byteLength":116
    },
    {
      "uri":"data:application/gltf-buffer;base64,AAAAAAAAgD8AAABAAABAQAAAgEAAAAAAAAAAAAAAAAAAAIA/AACAPwAAgD8AAIA/AAAAAAAAAAAAAAAA",
      "byteLength":60
    }
  ],
  "bufferViews":[
    {
      "buffer":0,
      "byteOffset":0,
      "byteLength":6,
      "target":34963
    },
    {
      "buffer":0,
      "byteOffset":8,
      "byteLength":108,
      "byteStride":12,
      "target":34962
    },
    {
      "buffer":1,
      "byteOffset":0,
      "byteLength":20
    },
    {
      "buffer":1,
      "byteOffset":20,
      "byteLength":40
    }
  ],
  "accessors":[
    {
      "bufferView":0,
      "byteOffset":0,
      "componentType":5123,
      "count":3,
      "type":"SCALAR",
      "max":[
        2
      ],
      "min":[
        0
      ]
    },
    {
      "bufferView":1,
      "byteOffset":0,
      "componentType":5126,
      "count":3,
      "type":"VEC3",
      "max":[
        1.0,
        0.5,
        0.0
      ],
      "min":[
        0.0,
        0.0,
        0.0
      ]
    },
    {
      "bufferView":1,
      "byteOffset":36,
      "componentType":5126,
      "count":3,
      "type":"VEC3",
      "max":[
        0.0,
        1.0,
        0.0
      ],
      "min":[
        -1.0,
        0.0,
        0.0
      ]
    },
    {
      "bufferView":1,
      "byteOffset":72,
      "componentType":5126,
      "count":3,
      "type":"VEC3",
      "max":[
        1.0,
        1.0,
        0.0
      ],
      "min":[
        0.0,
        0.0,
        0.0
      ]
    },
    {
      "bufferView":2,
      "byteOffset":0,
      "componentType":5126,
      "count":5,
      "type":"SCALAR",
      "max":[
        4.0
      ],
      "min":[
        0.0
      ]
    },
    {
      "bufferView":3,
      "byteOffset":0,
      "componentType":5126,
      "count":10,
      "type":"SCALAR",
      "max":[
        1.0
      ],
      "min":[
        0.0
      ]
    }
  ],

  "asset":{
    "version":"2.0"
  }
}

The asset contains an animation that interpolates between the different morph targets for a single triangle. A
screenshot of this asset is shown in Image 21a.

Image 21a: A triangle with two morph targets.

Most of the elements of this asset have already been explained in the previous sections: It contains a scene
with a single node and a single mesh. There are two buffer objects, one storing the geometry data and one
storing the data for the animation, and several bufferView and accessor objects that provide access to this
data.

The new elements that have been added in order to define the morph targets are contained in the mesh and the
animation:
   "meshes":[
    {
      "primitives":[
        {
          "attributes":{
            "POSITION":1
          },
          "targets":[
            {
              "POSITION":2
            },
            {
              "POSITION":3
            }
          ],
          "indices":0
        }
      ],
      "weights":[
        1.0,
        0.5
      ]
    }
  ],

The mesh.primitive contains an array of morph targets. Each morph target is a dictionary that maps attribute
names to accessor objects. In the example, there are two morph targets, both mapping the "POSITION"
attribute to accessors that contain the displacements of the vertex positions. The mesh also contains an array of
weights that defines the contribution of each morph target to the final, rendered mesh. These weights are also the
channel.target of the animation that is contained in the asset:

   "animations":[
    {
      "samplers":[
        {
          "input":4,
          "interpolation":"LINEAR",
          "output":5
        }
      ],
      "channels":[
        {
          "sampler":0,
          "target":{
            "node":0,
            "path":"weights"
          }
        }
      ]
    }
  ],
This means that the animation will modify the weights of the mesh that is referred to by the target.node. The
result of applying the animation to these weights, and the computation of the final, rendered mesh will be
explained in more detail in the next section about Morph Targets.


Morph Targets
The example in the previous section contains a mesh that consists of a single triangle with two morph targets:

{
  "meshes":[
    {
      "primitives":[
        {
          "attributes":{
            "POSITION":1
          },
          "targets":[
            {
              "POSITION":2
            },
            {
              "POSITION":3
            }
          ],
          "indices":0
        }
      ],
      "weights":[
        1.0,
        0.5
      ]
    }
  ],

The actual base geometry of the mesh, namely the triangle geometry, is defined by the mesh.primitive
attribute called "POSITION". The morph targets of the mesh.primitive are dictionaries that map the attribute
name "POSITION" to accessor objects that contain the displacements for each vertex. Image 22a shows the
initial triangle geometry in black, the displacement for the first morph target in red, and the displacement for
the second morph target in green.

Image 22a: The initial triangle and morph target displacements.

The weights of the mesh determine how these morph target displacements are added to the initial geometry in
order to obtain the current state of the geometry. The pseudocode for computing the rendered vertex positions for
a mesh primitive is as follows:

renderedPrimitive.POSITION = primitive.POSITION +
  weights[0] * primitive.targets[0].POSITION +
  weights[1] * primitive.targets[1].POSITION;

This means that the current state of the mesh primitive is computed by taking the initial mesh primitive geometry
and adding a linear combination of the morph target displacements, where the weights are the factors for the
linear combination.
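The pseudocode above works the same way for any number of morph targets. The following Python sketch shows that computation; the function name morph_positions and the triangle data are hypothetical, chosen only for illustration, and are not decoded from the example asset.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def morph_positions(base: Sequence[Vec3],
                    targets: Sequence[Sequence[Vec3]],
                    weights: Sequence[float]) -> List[Vec3]:
    """Return base + sum(weights[t] * targets[t]) for every vertex."""
    if len(targets) != len(weights):
        raise ValueError("need exactly one weight per morph target")
    result = []
    for i, (x, y, z) in enumerate(base):
        # Add the weighted displacement of each morph target to this vertex.
        for target, w in zip(targets, weights):
            dx, dy, dz = target[i]
            x, y, z = x + w * dx, y + w * dy, z + w * dz
        result.append((x, y, z))
    return result


# Hypothetical data: one triangle and two morph targets that only move the third vertex.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.0)]
target0 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
target1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

print(morph_positions(base, [target0, target1], [0.5, 0.5]))
# -> [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
```

With all weights set to 0.0, the function returns the base geometry unchanged.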

The asset additionally contains an animation that affects the weights for the morph targets. The following table
shows the key frames of the animated weights:

Time (s)   Weights
0.0        0.0, 0.0
1.0        0.0, 1.0
2.0        1.0, 1.0
3.0        1.0, 0.0
4.0        0.0, 0.0

Throughout the animation, the weights are interpolated linearly and applied to the morph target displacements. At
each point in time, the rendered state of the mesh primitive is updated accordingly. The following is an example of
the state that is computed at 1.25 seconds.

Image 22b: An intermediate state of the morph target animation.
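The key frame table can be sampled with a few lines of Python. This sketch interpolates linearly between the previous and the next key frame. As an assumption not stated in the text, it holds the first or last value outside of the key frame range. The function name sample_weights is hypothetical. At 1.25 seconds it yields the weights (0.25, 1.0), which would then serve as the factors in the morph target computation shown earlier.

```python
from bisect import bisect_right
from typing import List, Sequence

# Key frames from the table above: times in seconds, and one weight per morph target.
TIMES = [0.0, 1.0, 2.0, 3.0, 4.0]
WEIGHTS = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]


def sample_weights(t: float, times: Sequence[float],
                   values: Sequence[Sequence[float]]) -> List[float]:
    """Linearly interpolate the key frame values at time t."""
    if t <= times[0]:          # before the first key frame: hold the first value
        return list(values[0])
    if t >= times[-1]:         # after the last key frame: hold the last value
        return list(values[-1])
    i = bisect_right(times, t) - 1                        # index of the previous key frame
    alpha = (t - times[i]) / (times[i + 1] - times[i])    # 0.0 at previous, 1.0 at next
    return [a + alpha * (b - a) for a, b in zip(values[i], values[i + 1])]


print(sample_weights(1.25, TIMES, WEIGHTS))  # -> [0.25, 1.0]
```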


A Simple Skin
glTF supports vertex skinning, which allows the geometry (vertices) of a mesh to be deformed based on the pose
of a skeleton. This is essential in order to give animated geometry, for example of virtual characters, a realistic
appearance. The core element for the definition of vertex skinning in a glTF asset is the skin, but vertex skinning in
general involves several interdependencies between the elements of a glTF asset that have been presented so far.

The following is a glTF asset that shows basic vertex skinning for a simple geometry. The elements of this asset
will be summarized quickly in this section, referring to the previous sections where appropriate, and pointing out
the new elements that have been added for the vertex skinning functionality. The details and background
information for vertex skinning will be given in the next section.
{
  "scenes" : [ {
    "nodes" : [ 0 ]
  } ],

  "nodes" : [ {
    "skin" : 0,
    "mesh" : 0,
    "children" : [ 1 ]
  }, {
    "children" : [ 2 ],
    "translation" : [ 0.0, 1.0, 0.0 ]
  }, {
    "rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
  } ],

  "meshes" : [ {
    "primitives" : [ {
      "attributes" : {
        "POSITION" : 1,
        "JOINTS_0" : 2,
        "WEIGHTS_0" : 3
      },
      "indices" : 0
    } ]
  } ],

  "skins" : [ {
    "inverseBindMatrices" : 4,
    "joints" : [ 1, 2 ]
  } ],

  "animations" : [ {
    "channels" : [ {
      "sampler" : 0,
      "target" : {
        "node" : 2,
        "path" : "rotation"
      }
    } ],
    "samplers" : [ {
      "input" : 5,
      "interpolation" : "LINEAR",
      "output" : 6
    } ]
  } ],

  "buffers" : [ {
    "uri" : "data:application/gltf-buffer;base64,AAABAAMAAAADAAIAAgADAAUAAgAFAAQABAAFAAcABAAHAAYABgAHAAkABgAJAAgAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAAD8AAAAAAACAPwAAAD8AAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAAAAAwD8AAAAAAACAPwAAwD8AAAAAAAAAAAAAAEAAAAAAAACAPwAAAEAAAAAA",
    "byteLength" : 168
  }, {
    "uri" : "data:application/gltf-buffer;base64,AAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAABAPwAAgD4AAAAAAAAAAAAAQD8AAIA+AAAAAAAAAAAAAAA/AAAAPwAAAAAAAAAAAAAAPwAAAD8AAAAAAAAAAAAAgD4AAEA/AAAAAAAAAAAAAIA+AABAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAA=",
    "byteLength" : 320
  }, {
    "uri" : "data:application/gltf-buffer;base64,AACAPwAAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAvwAAgL8AAAAAAACAPwAAgD8AAAAAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAL8AAIC/AAAAAAAAgD8=",
    "byteLength" : 128
  }, {
    "uri" : "data:application/gltf-buffer;base64,AAAAAAAAAD8AAIA/AADAPwAAAEAAACBAAABAQAAAYEAAAIBAAACQQAAAoEAAALBAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAkxjEPkSLbD8AAAAAAAAAAPT9ND/0/TQ/AAAAAAAAAAD0/TQ/9P00PwAAAAAAAAAAkxjEPkSLbD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAkxjEvkSLbD8AAAAAAAAAAPT9NL/0/TQ/AAAAAAAAAAD0/TS/9P00PwAAAAAAAAAAkxjEvkSLbD8AAAAAAAAAAAAAAAAAAIA/",
    "byteLength" : 240
  } ],

  "bufferViews" : [ {
    "buffer" : 0,
    "byteOffset" : 0,
    "byteLength" : 48,
    "target" : 34963
  }, {
    "buffer" : 0,
    "byteOffset" : 48,
    "byteLength" : 120,
    "target" : 34962
  }, {
    "buffer" : 1,
    "byteOffset" : 0,
    "byteLength" : 320,
    "byteStride" : 16
  }, {
    "buffer" : 2,
    "byteOffset" : 0,
    "byteLength" : 128
  }, {
    "buffer" : 3,
    "byteOffset" : 0,
    "byteLength" : 240
  } ],

  "accessors" : [ {
    "bufferView" : 0,
    "byteOffset" : 0,
    "componentType" : 5123,
    "count" : 24,
    "type" : "SCALAR",
    "max" : [ 9 ],
    "min" : [ 0 ]
  }, {
    "bufferView" : 1,
    "byteOffset" : 0,
    "componentType" : 5126,
    "count" : 10,
    "type" : "VEC3",
    "max" : [ 1.0, 2.0, 0.0 ],
    "min" : [ 0.0, 0.0, 0.0 ]
  }, {
    "bufferView" : 2,
    "byteOffset" : 0,
    "componentType" : 5123,
    "count" : 10,
    "type" : "VEC4",
    "max" : [ 0, 1, 0, 0 ],
    "min" : [ 0, 1, 0, 0 ]
  }, {
    "bufferView" : 2,
    "byteOffset" : 160,
    "componentType" : 5126,
    "count" : 10,
    "type" : "VEC4",
    "max" : [ 1.0, 1.0, 0.0, 0.0 ],
    "min" : [ 0.0, 0.0, 0.0, 0.0 ]
  }, {
    "bufferView" : 3,
    "byteOffset" : 0,
    "componentType" : 5126,
    "count" : 2,
    "type" : "MAT4",
    "max" : [ 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -0.5, -1.0, 0.0, 1.0 ],
    "min" : [ 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -0.5, -1.0, 0.0, 1.0 ]
  }, {
    "bufferView" : 4,
    "byteOffset" : 0,
    "componentType" : 5126,
    "count" : 12,
    "type" : "SCALAR",
    "max" : [ 5.5 ],
    "min" : [ 0.0 ]
  }, {
    "bufferView" : 4,
    "byteOffset" : 48,
    "componentType" : 5126,
    "count" : 12,
    "type" : "VEC4",
    "max" : [ 0.0, 0.0, 0.707, 1.0 ],
    "min" : [ 0.0, 0.0, -0.707, 0.707 ]
  } ],

  "asset" : {
    "version" : "2.0"
  }
}

The result of rendering this asset is shown in Image 19a.

Image 19a: A scene with simple vertex skinning.

Elements of the simple skin example
The elements of the given example are briefly summarized here:

- The scenes and nodes elements have been explained in the Scenes and Nodes section. For the vertex
  skinning, new nodes have been added: the nodes at index 1 and 2 define a new node hierarchy for the
  skeleton. These nodes can be considered the joints between the "bones" that will eventually cause the
  deformation of the mesh.
- The new top-level dictionary skins contains a single skin in the given example. The properties of this skin
  object will be explained later.
- The concept of animations has been explained in the Animations section. In the given example, the
  animation refers to the skeleton nodes so that the effect of the vertex skinning is actually visible during the
  animation.
- The Meshes section already explained the contents of the meshes and mesh.primitive objects. In this
  example, new mesh primitive attributes have been added, which are required for vertex skinning, namely the
  "JOINTS_0" and "WEIGHTS_0" attributes.
- There are several new buffers, bufferViews, and accessors. Their basic properties have been
  described in the Buffers, BufferViews, and Accessors section. In the given example, they contain the
  additional data required for vertex skinning.

Details about how these elements are interconnected to achieve the vertex skinning will be explained in the Skins
section.


Skins
The process of vertex skinning is a bit complex. It brings together nearly all elements that are contained in a glTF
asset. This section will explain the basics of vertex skinning, based on the example in the Simple Skin section.

The geometry data
The geometry of the vertex skinning example is an indexed triangle mesh, consisting of 8 triangles and 10
vertices. They form a rectangle in the xy-plane, with the lower left point at the origin (0,0,0), and the upper right
point at (1,2,0). So the positions of the vertices are

0.0, 0.0, 0.0,
1.0, 0.0, 0.0,
0.0, 0.5, 0.0,
1.0, 0.5, 0.0,
0.0, 1.0, 0.0,
1.0, 1.0, 0.0,
0.0, 1.5, 0.0,
1.0, 1.5, 0.0,
0.0, 2.0, 0.0,
1.0, 2.0, 0.0

and the indices of the triangles are

0, 1, 3,
0, 3, 2,
2, 3, 5,
2, 5, 4,
4, 5, 7,
4, 7, 6,
6, 7, 9,
6, 9, 8,

The raw data is stored in the first buffer. The indices and vertex positions are defined by the bufferView
objects at index 0 and 1, and the corresponding accessor objects at index 0 and 1 offer typed access to these
buffer views. Image 20a shows this geometry with outline rendering to better show the structure.

Image 20a: The geometry for the skinning example, with outline rendering, in its initial configuration.
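To make the chain of buffer, bufferView, and accessor more concrete, the following Python sketch decodes the first buffer of the asset with the standard base64 and struct modules. The string is the base64 part of that buffer's data URI, split into pieces for readability. The offsets, counts, and component types are taken from bufferView 0 and 1 and accessor 0 and 1. The sketch reads the data as little-endian, which is the byte order glTF requires for buffer data.

```python
import base64
import struct

# Base64 part of the first buffer's "uri" (after "data:application/gltf-buffer;base64,").
BUFFER0_B64 = (
    "AAABAAMAAAADAAIAAgADAAUAAgAFAAQABAAFAAcABAAHAAYABgAHAAkABgAJAAgA"  # 24 indices
    "AAAAAAAAAAAAAAAA" "AACAPwAAAAAAAAAA"  # vertices 0 and 1
    "AAAAAAAAAD8AAAAA" "AACAPwAAAD8AAAAA"  # vertices 2 and 3
    "AAAAAAAAgD8AAAAA" "AACAPwAAgD8AAAAA"  # vertices 4 and 5
    "AAAAAAAAwD8AAAAA" "AACAPwAAwD8AAAAA"  # vertices 6 and 7
    "AAAAAAAAAEAAAAAA" "AACAPwAAAEAAAAAA"  # vertices 8 and 9
)

data = base64.b64decode(BUFFER0_B64)
assert len(data) == 168  # matches the "byteLength" of the buffer

# bufferView 0 (byteOffset 0, byteLength 48) and accessor 0:
# 24 SCALAR elements with componentType 5123 (UNSIGNED_SHORT).
indices = list(struct.unpack_from("<24H", data, 0))

# bufferView 1 (byteOffset 48, byteLength 120) and accessor 1:
# 10 VEC3 elements with componentType 5126 (FLOAT).
floats = struct.unpack_from("<30f", data, 48)
positions = [floats[i:i + 3] for i in range(0, len(floats), 3)]

print(indices[:6])    # -> [0, 1, 3, 0, 3, 2]
print(positions[9])   # -> (1.0, 2.0, 0.0)
```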

This geometry data is contained in the mesh primitive of the only mesh, which is attached to the main node of
the scene. The mesh primitive contains additional attributes, namely the "JOINTS_0" and "WEIGHTS_0"
attributes. The purpose of these attributes will be explained below.

The skeleton structure
In the given example, there are two nodes that define the skeleton. They are referred to as "skeleton nodes", or
"joint nodes", because they can be imagined as the joints between the bones of the skeleton. The skin refers
to these nodes by listing their indices in its joints property.

   "nodes" : [
   ...
   {
    "children" : [ 2 ],
    "translation" : [ 0.0, 1.0, 0.0 ]
   },
   {
    "rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
   }
  ],
The first joint node has a translation property, defining a translation of 1.0 along the y-axis. The second
joint node has a rotation property that initially describes a rotation of 0 degrees (thus, no rotation at all).
This rotation will later be changed by the animation to let the skeleton bend left and right and show the effect of
the vertex skinning.
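The rotation of a node is given as a unit quaternion in the order (x, y, z, w), so [0.0, 0.0, 0.0, 1.0] describes no rotation. As a sketch, the standard quaternion-to-matrix formula can be used to check this. The helper name quat_to_matrix is hypothetical, and the second call uses a rotation of 45 degrees around the z-axis, similar to the intermediate animation state that is discussed below.

```python
import math
from typing import List, Sequence


def quat_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """Convert a unit quaternion (x, y, z, w) into a 3x3 rotation matrix (row-major)."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


# Initial rotation of the second joint node: the identity quaternion.
print(quat_to_matrix([0.0, 0.0, 0.0, 1.0]))
# -> [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A rotation of 45 degrees around the z-axis: (0, 0, sin(45°/2), cos(45°/2)).
half_angle = math.radians(45.0) / 2
m = quat_to_matrix([0.0, 0.0, math.sin(half_angle), math.cos(half_angle)])
print([round(v, 4) for v in m[0]])  # -> [0.7071, -0.7071, 0.0]
```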

The skin
The skin is the core element of the vertex skinning. In the example, there is a single skin:

   "skins" : [
   {
    "inverseBindMatrices" : 4,
    "joints" : [ 1, 2 ]
   }
  ],

The skin contains an array called joints, which lists the indices of the nodes that define the skeleton hierarchy.

Additionally, the skin contains a reference to an accessor in the property inverseBindMatrices. This accessor
provides one matrix for each joint. Each of these matrices transforms the geometry into the space of the
respective joint. This means that each matrix is the inverse of the global transform of the respective joint, in its
initial configuration. In the given example, this inverse of the initial global transform is the same for both joint
nodes:

1.0   0.0   0.0    0.0
0.0   1.0   0.0   -1.0
0.0   0.0   1.0    0.0
0.0   0.0   0.0    1.0

This matrix translates the mesh by -1.0 along the y-axis, as shown in Image 20b.

Image 20b: The transformation of the geometry with the inverse bind matrix of joint 1.

This transformation may look counterintuitive at first glance. But the goal of this transformation is to "undo" the
transformation that is done by the initial global transform of the respective joint node, so that the influence of the
joint on the mesh vertices can be computed based on the actual global transform of the joint.
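For this example, the relationship between the joint nodes and the inverse bind matrix can be checked with a small Python sketch. It assumes row-major 4×4 matrices with the translation in the last column. It only inverts matrices that consist of a rotation and a translation (no scale), which is sufficient for these joint nodes. All helper names are hypothetical.

```python
from typing import List

Mat4 = List[List[float]]  # row-major 4x4 matrix, translation in the last column


def mat_mul(a: Mat4, b: Mat4) -> Mat4:
    """Return the matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rigid_inverse(m: Mat4) -> Mat4:
    """Invert a rotation R plus translation t: the inverse is [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]  # transposed 3x3 rotation part
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(m: Mat4, p: List[float]) -> List[float]:
    x, y, z = p
    return [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3)]


IDENTITY = translation(0.0, 0.0, 0.0)

# Initial configuration: node 1 is translated by (0, 1, 0). Its child, node 2, has the
# identity rotation [0, 0, 0, 1], so both joints have the same initial global transform.
global_joint0 = translation(0.0, 1.0, 0.0)
global_joint1 = mat_mul(global_joint0, IDENTITY)

inverse_bind = rigid_inverse(global_joint0)
# The upper right corner of the rectangle is moved down by 1.0, as in Image 20b.
print(transform_point(inverse_bind, [1.0, 2.0, 0.0]))  # -> [1.0, 1.0, 0.0]
```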

Vertex skinning implementation
Users of existing rendering libraries will hardly ever have to manually process the vertex skinning data contained
in a glTF asset: the actual skinning computations usually take place in the vertex shader, which is a low-level
implementation detail of the respective library. However, knowing how the vertex skinning data is supposed to be
processed may help to create proper, valid models with vertex skinning. So this section will give a short
summary of how the vertex skinning is applied, using some pseudocode and examples in GLSL.

The joint matrices
The vertex positions of a skinned mesh are eventually computed by the vertex shader. During these
computations, the vertex shader has to take into account the current pose of the skeleton in order to compute the
proper vertex position. This information is passed to the vertex shader as an array of matrices, namely as the
joint matrices: a uniform variable that contains one 4×4 matrix for each joint of the skeleton. In the shader,
these matrices are combined to compute the actual skinning matrix for each vertex:
...
uniform mat4 u_jointMat[2];

...
void main(void)
{
    mat4 skinMat =
        a_weight.x * u_jointMat[int(a_joint.x)] +
        a_weight.y * u_jointMat[int(a_joint.y)] +
        a_weight.z * u_jointMat[int(a_joint.z)] +
        a_weight.w * u_jointMat[int(a_joint.w)];
    ...
}

The joint matrix for each joint has to perform the following transformations to the vertices:

- The vertices have to be prepared to be transformed with the current global transform of the joint node.
  Therefore, they are transformed with the inverseBindMatrix of the joint node. This is the inverse of the
  global transform of the joint node in its original state, when no animations have been applied yet.
- The vertices have to be transformed with the current global transform of the joint node. Together with the
  transformation from the inverseBindMatrix, this will cause the vertices to be transformed only based on
  the current transform of the node, in the coordinate space of the current joint node.
- The vertices have to be transformed with the inverse of the global transform of the node that the mesh is
  attached to, because this transform is already done using the model-view matrix, and thus has to be
  cancelled out from the skinning computation.

So the pseudocode for computing the joint matrix of joint j may look as follows:

jointMatrix(j) =
  globalTransformOfNodeThatTheMeshIsAttachedTo^-1 *
  globalTransformOfJointNode(j) *
  inverseBindMatrixForJoint(j);
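The pseudocode can be turned into a small Python sketch for joint 1 of the example, in the intermediate animation state that is described in the next paragraphs (a rotation of 45 degrees around the z-axis). Matrices are assumed to be row-major with the translation in the last column. The inverse bind matrix is the one shown earlier in this section, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Mat4 = List[List[float]]  # row-major 4x4 matrix, translation in the last column


def mat_mul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(degrees: float) -> Mat4:
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def transform_point(m: Mat4, p: Sequence[float]) -> List[float]:
    x, y, z = p
    return [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3)]


def joint_matrix(mesh_node_global_inverse: Mat4, joint_global: Mat4,
                 inverse_bind: Mat4) -> Mat4:
    """globalTransformOfNodeThatTheMeshIsAttachedTo^-1 * globalTransformOfJointNode(j)
    * inverseBindMatrixForJoint(j)"""
    return mat_mul(mat_mul(mesh_node_global_inverse, joint_global), inverse_bind)


# Node 0 carries the mesh and has no transform, so the inverse of its global transform is the identity.
mesh_inverse = translation(0.0, 0.0, 0.0)
# Joint 1 is node 2: the child of node 1 (translation (0, 1, 0)), rotated by 45 degrees about z.
joint1_global = mat_mul(translation(0.0, 1.0, 0.0), rotation_z(45.0))
# The inverse bind matrix shown earlier in this section.
joint1_ibm = translation(0.0, -1.0, 0.0)

m1 = joint_matrix(mesh_inverse, joint1_global, joint1_ibm)

# The top left vertex (0, 2, 0) is rotated around the joint position (0, 1, 0).
print([round(v, 4) for v in transform_point(m1, (0.0, 2.0, 0.0))])  # -> [-0.7071, 1.7071, 0.0]
```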

Note: Vertex skinning in other contexts often involves a matrix that is called "Bind Shape Matrix". This matrix is
supposed to transform the geometry of the skinned mesh into the coordinate space of the joints. In glTF, this
matrix is omitted, and it is assumed that this transform is either premultiplied with the mesh data, or
postmultiplied to the inverse bind matrices.

Image 20c shows the transformations that are done to the geometry in the Simple Skin example, using the joint
matrix of joint 1. The image shows the transformation for an intermediate state of the animation, namely, when
the rotation of the joint node has already been modified by the animation to describe a rotation of 45 degrees
around the z-axis.

Image 20c: The transformation of the geometry done for joint 1.

The last panel of Image 20c shows what the geometry would look like if it were only transformed with the joint
matrix of joint 1. This state of the geometry is never really visible: the actual geometry that is computed in the
vertex shader combines the geometries as they are created from the different joint matrices, based on the
joints and weights that are explained below.

The skinning joints and weights
As mentioned above, the mesh primitive contains new attributes that are required for the vertex skinning.
Particularly, these are the "JOINTS_0" and the "WEIGHTS_0" attributes. Each attribute refers to an accessor
that provides one data element for each vertex of the mesh.

The "JOINTS_0" attribute refers to an accessor that contains the indices of the joints that should have an
influence on the vertex during the skinning process. For simplicity and efficiency, these indices are usually stored
as 4D vectors, limiting the number of joints that may influence a vertex to 4. In the given example, the joints
information is very simple:

Vertex 0:  0, 1, 0, 0,
Vertex 1:  0, 1, 0, 0,
Vertex 2:  0, 1, 0, 0,
Vertex 3:  0, 1, 0, 0,
Vertex 4:  0, 1, 0, 0,
Vertex 5:  0, 1, 0, 0,
Vertex 6:  0, 1, 0, 0,
Vertex 7:  0, 1, 0, 0,
Vertex 8:  0, 1, 0, 0,
Vertex 9:  0, 1, 0, 0,

This means that every vertex should be influenced by joint 0 and joint 1. (The last 2 components of each vector
are ignored here. If there were more joints, then one entry of this accessor could, for example, contain

3, 1, 8, 4,

meaning that the corresponding vertex should be influenced by the joints 3, 1, 8, and 4.)

The "WEIGHTS_0" attribute refers to an accessor that provides information about how strongly each joint should
influence each vertex. In the given example, the weights are as follows:

Vertex 0:  1.00,  0.00,  0.0, 0.0,
Vertex 1:  1.00,  0.00,  0.0, 0.0,
Vertex 2:  0.75,  0.25,  0.0, 0.0,
Vertex 3:  0.75,  0.25,  0.0, 0.0,
Vertex 4:  0.50,  0.50,  0.0, 0.0,
Vertex 5:  0.50,  0.50,  0.0, 0.0,
Vertex 6:  0.25,  0.75,  0.0, 0.0,
Vertex 7:  0.25,  0.75,  0.0, 0.0,
Vertex 8:  0.00,  1.00,  0.0, 0.0,
Vertex 9:  0.00,  1.00,  0.0, 0.0,

Again, the last two components of each entry are not relevant, because there are only two joints.

Combining the "JOINTS_0" and "WEIGHTS_0" attributes yields exact information about the influence that each
joint has on each vertex. For example, the given data means that vertex 6 should be influenced 25% by joint 0
and 75% by joint 1.
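In code, combining the two attributes amounts to pairing the entries of both vectors element by element. A minimal Python sketch for vertex 6, with a hypothetical helper named influences that skips entries whose weight is zero:

```python
from typing import List, Sequence, Tuple

# JOINTS_0 and WEIGHTS_0 entries of vertex 6, copied from the tables above.
JOINTS_V6 = (0, 1, 0, 0)
WEIGHTS_V6 = (0.25, 0.75, 0.0, 0.0)


def influences(joints: Sequence[int], weights: Sequence[float]) -> List[Tuple[int, float]]:
    """Pair each joint index with its weight, dropping entries whose weight is zero."""
    return [(j, w) for j, w in zip(joints, weights) if w != 0.0]


print(influences(JOINTS_V6, WEIGHTS_V6))  # -> [(0, 0.25), (1, 0.75)]
print(sum(WEIGHTS_V6))                    # -> 1.0
```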

In the vertex shader, this information is used to create a linear combination of the joint matrices. This matrix is
called the skin matrix of the respective vertex. To compute it, the data of the "JOINTS_0" and "WEIGHTS_0"
attributes are passed to the shader. In this example, they are given as the a_joint and a_weight attribute
variables, respectively:

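// In GLSL ES 1.00 (WebGL 1), attributes cannot have integer types, so a_joint holds
// the "JOINTS_0" indices as floats and they are converted with int() below.
attribute vec3 a_position;
attribute vec4 a_joint;
attribute vec4 a_weight;

uniform mat4 u_jointMat[2];
uniform mat4 u_modelViewMatrix;
uniform mat4 u_projectionMatrix;

void main(void)
{
    mat4 skinMat =
        a_weight.x * u_jointMat[int(a_joint.x)] +
        a_weight.y * u_jointMat[int(a_joint.y)] +
        a_weight.z * u_jointMat[int(a_joint.z)] +
        a_weight.w * u_jointMat[int(a_joint.w)];
    vec4 pos = u_modelViewMatrix * skinMat * vec4(a_position,1.0);
    gl_Position = u_projectionMatrix * pos;
}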

The skin matrix is then used to transform the original position of the vertex before it is transformed with the
model-view matrix. The result of this transformation can be imagined as a weighted transformation of the vertices
with the respective joint matrices, as shown in Image 20d.

Image 20d: Computation of the skin matrix.

The result of applying this skin matrix to the vertices for the given example is shown in Image 20e.

Image 20e: The geometry for the skinning example, with outline rendering, during the animation.
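The shader computation can also be reproduced on the CPU, which may help when checking the skinning data of a model. This Python sketch computes the skin matrix for vertex 6 in the state in which joint 1 is rotated by 45 degrees around the z-axis, and applies it to the vertex. It assumes row-major 4×4 matrices, the inverse bind matrix shown earlier in this section for both joints, and a mesh node without a transform. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

Mat4 = List[List[float]]  # row-major 4x4 matrix, translation in the last column


def mat_mul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def translation(tx: float, ty: float, tz: float) -> Mat4:
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(degrees: float) -> Mat4:
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def skin_matrix(joint_mats: Sequence[Mat4], joints: Sequence[int],
                weights: Sequence[float]) -> Mat4:
    """CPU version of skinMat: the sum of weights[i] * joint_mats[joints[i]]."""
    result = [[0.0] * 4 for _ in range(4)]
    for j, w in zip(joints, weights):
        for row in range(4):
            for col in range(4):
                result[row][col] += w * joint_mats[j][row][col]
    return result


def transform_point(m: Mat4, p: Sequence[float]) -> List[float]:
    x, y, z = p
    return [m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3)]


# Joint matrices while joint 1 (node 2) is rotated by 45 degrees around the z-axis.
ibm = translation(0.0, -1.0, 0.0)
joint_mats = [
    mat_mul(translation(0.0, 1.0, 0.0), ibm),                             # joint 0: node 1
    mat_mul(mat_mul(translation(0.0, 1.0, 0.0), rotation_z(45.0)), ibm),  # joint 1: node 2
]

# Vertex 6 at (0.0, 1.5, 0.0) with JOINTS_0 = (0, 1, 0, 0) and WEIGHTS_0 = (0.25, 0.75, 0.0, 0.0)
skin = skin_matrix(joint_mats, (0, 1, 0, 0), (0.25, 0.75, 0.0, 0.0))
print([round(v, 4) for v in transform_point(skin, (0.0, 1.5, 0.0))])  # -> [-0.2652, 1.3902, 0.0]
```

Since the weights of each vertex add up to 1 in this example, blending the joint matrices first and then transforming the vertex gives the same result as transforming the vertex with each joint matrix and blending the resulting positions.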

Previous: Simple Skin | Table of Contents

Webgl Tutorial PDF
100% (1)
Webgl Tutorial PDF
134 pages
Child Pornography Online PDF
0% (1)
Child Pornography Online PDF
273 pages
Vertex 3
100% (1)
Vertex 3
338 pages
Vulkan in C++ (By Nvidia)
100% (1)
Vulkan in C++ (By Nvidia)
32 pages
Research Advances Toward Real-Time Path Tracing GTC 2022
No ratings yet
Research Advances Toward Real-Time Path Tracing GTC 2022
90 pages
Topology - Poles and Loops
100% (4)
Topology - Poles and Loops
55 pages
Jarvis Commands
No ratings yet
Jarvis Commands
6 pages
Texturing 3D Models - INTL
100% (1)
Texturing 3D Models - INTL
113 pages
Blockchain Technology Notes
100% (1)
Blockchain Technology Notes
44 pages
The Universal Treatise of Global Economic Common Sense
No ratings yet
The Universal Treatise of Global Economic Common Sense
727 pages
Maurine Saidi Attachment Report Comfy Hotel Eldoret
No ratings yet
Maurine Saidi Attachment Report Comfy Hotel Eldoret
13 pages
8 MeshLab Mesh Processing1 PDF
No ratings yet
8 MeshLab Mesh Processing1 PDF
25 pages
Multi-Core Programming Digital Edition (06!29!06)
No ratings yet
Multi-Core Programming Digital Edition (06!29!06)
362 pages
Intermec cn51 Windows Embedded Handheld 6 5 Users Manual 775642
No ratings yet
Intermec cn51 Windows Embedded Handheld 6 5 Users Manual 775642
156 pages
05 Introduction To WebGL Programming Full
No ratings yet
05 Introduction To WebGL Programming Full
103 pages
Forticlient Ems 7.4.3 Release Notes
No ratings yet
Forticlient Ems 7.4.3 Release Notes
19 pages
GLTF Spec 2.0 PDF
No ratings yet
GLTF Spec 2.0 PDF
95 pages
Physically Based Rendering Encyclopedia
50% (2)
Physically Based Rendering Encyclopedia
19 pages
HTC-Access Control Operation
100% (1)
HTC-Access Control Operation
74 pages
Lighting and Rendering
No ratings yet
Lighting and Rendering
24 pages
UMG509 Brouchre
No ratings yet
UMG509 Brouchre
8 pages
Design Drawing Techniques PDF
No ratings yet
Design Drawing Techniques PDF
144 pages
MeshLab and Arc3D
No ratings yet
MeshLab and Arc3D
22 pages
Openiam TechnicalArchitecture v3 A
No ratings yet
Openiam TechnicalArchitecture v3 A
13 pages
Ite6101 Long Quiz 3
No ratings yet
Ite6101 Long Quiz 3
8 pages
Real-Time Ray Tracing - Unreal Engine Documentation
No ratings yet
Real-Time Ray Tracing - Unreal Engine Documentation
9 pages
Shaders in Unity3D
100% (1)
Shaders in Unity3D
56 pages
Ontology of The Digital Image
No ratings yet
Ontology of The Digital Image
14 pages
Chapter 6
No ratings yet
Chapter 6
58 pages
Media Culture and Society
No ratings yet
Media Culture and Society
10 pages
Wifi Repeater Setup
No ratings yet
Wifi Repeater Setup
1 page
Chat GPT Beat Saber Copy Past
No ratings yet
Chat GPT Beat Saber Copy Past
78 pages
Pelco NET5516 Installation Manual
No ratings yet
Pelco NET5516 Installation Manual
64 pages
Blender Material Nodes
No ratings yet
Blender Material Nodes
5 pages
Gigabyte b450m Ds3h Wifi v2 Y1 Rev1.0 PDF
No ratings yet
Gigabyte b450m Ds3h Wifi v2 Y1 Rev1.0 PDF
37 pages
Webgl Tutorial
50% (2)
Webgl Tutorial
31 pages
3) MVC Architecture
No ratings yet
3) MVC Architecture
24 pages
Jitter PDF
No ratings yet
Jitter PDF
334 pages
DEF CON 32 - Packet Hacking Village - Brandon Colley - Winning The Game of Active Directory
No ratings yet
DEF CON 32 - Packet Hacking Village - Brandon Colley - Winning The Game of Active Directory
81 pages
3disciple - Magazine - December 2019 Issue - LND PDF
No ratings yet
3disciple - Magazine - December 2019 Issue - LND PDF
58 pages
Open Frameworks
No ratings yet
Open Frameworks
59 pages
(Ebook - PDF - EnG) Maya Nurbs Modeling 2 - Tutorial
No ratings yet
(Ebook - PDF - EnG) Maya Nurbs Modeling 2 - Tutorial
166 pages
DirectX Tutorial
No ratings yet
DirectX Tutorial
128 pages
An3078 stm32 Inapplication Programming Over The Ic Bus Stmicroelectronics
No ratings yet
An3078 stm32 Inapplication Programming Over The Ic Bus Stmicroelectronics
21 pages
PBR
No ratings yet
PBR
104 pages
Table of Contents - Best of Game Programming Gems
No ratings yet
Table of Contents - Best of Game Programming Gems
5 pages
AOD Texel Density 2020 HD
No ratings yet
AOD Texel Density 2020 HD
90 pages
D BRDF
No ratings yet
D BRDF
10 pages
Local Market Service FYP Proposal
No ratings yet
Local Market Service FYP Proposal
7 pages
GLTF Overview
No ratings yet
GLTF Overview
12 pages
Gmail - FWD - SSN College of Engineering - Re-Allotment of Branch - Ordered
No ratings yet
Gmail - FWD - SSN College of Engineering - Re-Allotment of Branch - Ordered
2 pages
SIGGRAPH 2015 Remedy Notes PDF
No ratings yet
SIGGRAPH 2015 Remedy Notes PDF
164 pages
Lecture4 - Guest Lecture Shaders
No ratings yet
Lecture4 - Guest Lecture Shaders
72 pages
Unity Asset Shader FlatKit Manual
No ratings yet
Unity Asset Shader FlatKit Manual
17 pages
Introduction To Physical Simulation
No ratings yet
Introduction To Physical Simulation
43 pages
Mastering Unity 2D Game Development: Chapter No. 7 " Encountering Enemies and Running Away"
No ratings yet
Mastering Unity 2D Game Development: Chapter No. 7 " Encountering Enemies and Running Away"
39 pages
Screenshot 2025-01-16 at 9.43.42 PM
No ratings yet
Screenshot 2025-01-16 at 9.43.42 PM
24 pages
GLTF (Derivative Short Form of Graphics Library Transmission Format or GL Transmission Format) Is A
No ratings yet
GLTF (Derivative Short Form of Graphics Library Transmission Format or GL Transmission Format) Is A
6 pages
VRmNet Eyesi Indirect Brochure
No ratings yet
VRmNet Eyesi Indirect Brochure
12 pages
CAPTCHA
No ratings yet
CAPTCHA
10 pages
Blender Hotkeys
No ratings yet
Blender Hotkeys
12 pages
The History of 3D Animation: By: Bashir D-K
No ratings yet
The History of 3D Animation: By: Bashir D-K
16 pages
Downloading Unreal Engineand Installing Datasmith
No ratings yet
Downloading Unreal Engineand Installing Datasmith
12 pages
Presentation 2501 AB2501 Presentation
No ratings yet
Presentation 2501 AB2501 Presentation
66 pages
Developing A Videogame Using Unreal Engine Based On A Four Stages Methodology
No ratings yet
Developing A Videogame Using Unreal Engine Based On A Four Stages Methodology
4 pages
By Penny de Byl Holistic Game Development With Unity An All in One Guide To Implementing Game Mechanics Art Design and Programming 1st Edition 10 16 11 by Penny de Byl B00htka838 PDF
No ratings yet
By Penny de Byl Holistic Game Development With Unity An All in One Guide To Implementing Game Mechanics Art Design and Programming 1st Edition 10 16 11 by Penny de Byl B00htka838 PDF
5 pages
GuiltyGearXrd's Art Style: The X Factor Between 2d and 3d
No ratings yet
GuiltyGearXrd's Art Style: The X Factor Between 2d and 3d
34 pages
Teori Dan Implementasinya Dalam Dunia Bisnis Dan Pemasaran
No ratings yet
Teori Dan Implementasinya Dalam Dunia Bisnis Dan Pemasaran
23 pages
Physically Based Rendering Encyclopedia
No ratings yet
Physically Based Rendering Encyclopedia
19 pages
Tutorial Unreal Engine 4 para Arquitetura
No ratings yet
Tutorial Unreal Engine 4 para Arquitetura
18 pages
Houdini Keyboard Shortcut List v1p1
No ratings yet
Houdini Keyboard Shortcut List v1p1
8 pages
3D Geometry Concepts For Artists
No ratings yet
3D Geometry Concepts For Artists
9 pages
Game Design Document Template
No ratings yet
Game Design Document Template
14 pages
ToonShader in Unity
No ratings yet
ToonShader in Unity
13 pages
Finite State Machine
No ratings yet
Finite State Machine
75 pages
Game Development in Unity
No ratings yet
Game Development in Unity
2 pages
Lightmap Tutorial For 3dsmax and Unity 916
No ratings yet
Lightmap Tutorial For 3dsmax and Unity 916
5 pages
MCQ of Unit 6-Servlets-1
No ratings yet
MCQ of Unit 6-Servlets-1
5 pages
Allen-Bradley PLC - Unprotected Remote Access Using RSLinx and RSLogix Software - Rev1.0
No ratings yet
Allen-Bradley PLC - Unprotected Remote Access Using RSLinx and RSLogix Software - Rev1.0
13 pages
Addon For Openframeworks, Kinect V2 and Mac - Blog - Andrew McWilliams
No ratings yet
Addon For Openframeworks, Kinect V2 and Mac - Blog - Andrew McWilliams
4 pages
Lua Game Development Cookbook - Sample Chapter
No ratings yet
Lua Game Development Cookbook - Sample Chapter
56 pages
BlackBerry 10 - Wikipedia
No ratings yet
BlackBerry 10 - Wikipedia
14 pages
Abhishek Soni Full Stack
No ratings yet
Abhishek Soni Full Stack
1 page
How The GRC Provisioning Framework Works - SAP Blogs
No ratings yet
How The GRC Provisioning Framework Works - SAP Blogs
7 pages
Things To Remember NSTP Exam
No ratings yet
Things To Remember NSTP Exam
3 pages
Design and Analysis of A Hybrid Security
No ratings yet
Design and Analysis of A Hybrid Security
5 pages
Real-Time Rendering An Introduction PDF
No ratings yet
Real-Time Rendering An Introduction PDF
12 pages
Template - User Story
No ratings yet
Template - User Story
10 pages
AI - Fixing Pathfinding Once and For All
No ratings yet
AI - Fixing Pathfinding Once and For All
36 pages
Shader Editor Doc
No ratings yet
Shader Editor Doc
13 pages

GLTF Tutorials - Wei Zhi PDF

Uploaded by

GLTF Tutorials - Wei Zhi PDF

Uploaded by

glTF Tutorial

The scene is the entry point for the description of the scene that is stored in the glTF. It refers to the

This binary data is just a raw block of memory that is read from the URI of the buffer , with no inherent meaning

The scene and nodes structure

The example consists of a single mesh, and has a single mesh.primitive object. The mesh primitive has an

The actual geometry data of the mesh primitive is given by the attributes and the indices . These both refer

The buffer , bufferView , and accessor concepts

The first accessor describes the indices of the geometry data. It refers to the bufferView with index 0, which is

As described above, a mesh.primitive may now refer to these accessors, using their indices:

When this mesh.primitive has to be rendered, the renderer can resolve the underlying buffer views and buffers

The asset description

The asset property may contain additional metadata that is described in the asset specification.

Each of the nodes that are given in the scene can be traversed, recursively visiting all their children, to process

The transform of a node can also be given using the translation , rotation , and scale properties of a node,

The translation just contains the translation in x­, y­, and z­direction. For example, from a translation of

The rotation is given as a quaternion. The mathematical background of quaternions is beyond the scope

The scale contains the scaling factors along the x­, y­, and z­axes. The corresponding matrix can be

Parts of the data of a buffer may have to be passed to the renderer as vertex attributes, or as indices, or the

The first bufferView refers to the first 6 bytes of the buffer data. The second one refers to 36 bytes of the

Each bufferView additionally contains a target property. This property may later be used by the renderer to

At this point, the buffer data has been divided into multiple parts, and each part is described by one

The componentType specifies the type of the components of these data elements. This is a GL constant that

The first accessor refers to the bufferView with index 0, which defines the part of the buffer data that

The second accessor refers to the bufferView with index 1, which defines the part of the buffer data that

The byteOffset of an accessor must be divisible by the size of its componentType.

In the example above, the byteOffset of the bufferView with index 1 (which refers to the vertex attributes)
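The alignment rule can be checked mechanically. In this sketch, the helper names is_aligned and align_up are hypothetical; the combined offset bufferView.byteOffset + accessor.byteOffset is checked as well, as the glTF 2.0 specification requires, and align_up shows how a writer could compute where float data must start after a 6-byte block of unsigned short indices.

```python
# Byte size of each glTF componentType (a GL constant).
COMPONENT_SIZES = {5120: 1, 5121: 1, 5122: 2, 5123: 2, 5125: 4, 5126: 4}


def is_aligned(accessor, buffer_view):
    """True if the accessor's offsets are multiples of its component size."""
    size = COMPONENT_SIZES[accessor["componentType"]]
    accessor_offset = accessor.get("byteOffset", 0)
    buffer_offset = buffer_view.get("byteOffset", 0) + accessor_offset
    return accessor_offset % size == 0 and buffer_offset % size == 0


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# 6 bytes of unsigned short indices end at offset 6; float data (4-byte components)
# therefore has to start at the next multiple of 4.
print(align_up(6, 4))  # 8
print(is_aligned({"componentType": 5126, "byteOffset": 0}, {"byteOffset": 8}))  # True
print(is_aligned({"componentType": 5126, "byteOffset": 0}, {"byteOffset": 6}))  # False
```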

Image 5c illustrates how the raw data of a buffer is structured using bufferView objects and is augmented

In the example, the original geometry data is stored in the bufferView with index 1. It describes a rectangular

When the transform is given by the TRS properties, an animation can be used to describe how the

The rotation property of the node

A new buffer containing the raw animation data;

The buffer and the bufferView for the raw animation data

There is also a new bufferView, which here simply refers to the new buffer with index 1, which contains the

The accessor objects for the animation data

| times accessor | rotations accessor | Meaning |
|---|---|---|
| 0.0 | (0.0, 0.0, 0.0, 1.0) | At 0.0 seconds, the triangle has a rotation of 0 degrees |

The samplers, which describe the sources of animation data;

In the given example, there is one sampler. Each sampler defines an input and an output property. They both

The following is another example of an animation. This time, the animation contains two channels. One

previousTime = The largest element from the times accessor that is smaller than the currentTime

nextTime = The smallest element from the times accessor that is larger than the currentTime

nextTranslation = The element from the translations accessor that corresponds to the nextTime

So when the current time is 1.2, then the translation of the node is (16.0, 2.0, -0.5).
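The keyframe lookup and interpolation described above can be sketched in Python. The keyframe values below are made up for illustration, chosen so that the result reproduces the (16.0, 2.0, -0.5) quoted in the text; the function names are hypothetical, and clamping to the first or last value outside the keyframe range is an assumption of this sketch.

```python
import bisect


def lerp(a, b, t):
    """Component-wise linear interpolation between two equal-length tuples."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def sample_linear(times, values, current_time):
    """Interpolate keyframe values at current_time (times must be sorted ascending).

    Outside the keyframe range, the first or last value is returned (clamping).
    """
    if current_time <= times[0]:
        return values[0]
    if current_time >= times[-1]:
        return values[-1]
    # bisect_right finds the first keyframe time larger than current_time: nextTime.
    next_index = bisect.bisect_right(times, current_time)
    previous_index = next_index - 1
    previous_time, next_time = times[previous_index], times[next_index]
    interpolation_value = (current_time - previous_time) / (next_time - previous_time)
    return lerp(values[previous_index], values[next_index], interpolation_value)


# Hypothetical keyframes: at 0.8 s and 1.6 s the node has these translations.
times = [0.8, 1.6]
translations = [(14.0, 3.0, -2.0), (18.0, 1.0, 1.0)]
print(sample_linear(times, translations, 1.2))  # about (16.0, 2.0, -0.5), up to rounding
```

Binary search keeps the lookup logarithmic in the number of keyframes, which matters for long animations sampled every frame.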

For scalar and vector types, use a linear interpolation (generally called lerp in mathematics libraries). Here's a
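Rotations stored as quaternions are a special case: interpolating the four components linearly does not preserve unit length, and the glTF 2.0 specification calls for spherical linear interpolation (slerp) on LINEAR rotation channels. A minimal slerp sketch, assuming unit quaternions in (x, y, z, w) order; the 0.9995 threshold for falling back to a normalized lerp is an arbitrary choice of this sketch.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q describe the same rotation; flip one to interpolate along the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly parallel: sin(theta) is close to 0, so use a normalized lerp instead.
        blended = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        length = math.sqrt(sum(c * c for c in blended))
        return tuple(c / length for c in blended)
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


# Halfway between no rotation and 90 degrees about z is 45 degrees about z.
identity = (0.0, 0.0, 0.0, 1.0)
quarter_turn_z = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
print(slerp(identity, quarter_turn_z, 0.5))  # about (0, 0, sin(pi/8), cos(pi/8))
```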

To calculate the value for currentTime, you will need to fetch from the animation channel:

The output tangent direction of previousTime keyframe
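Once the previous value, its output tangent, the next keyframe's input tangent, and the next value have been fetched (in glTF, the output accessor of a CUBICSPLINE sampler stores an in-tangent, a value, and an out-tangent for every keyframe), the cubic Hermite formula combines them. A sketch, assuming t is the normalized interpolation value (currentTime - previousTime) / (nextTime - previousTime) and that both tangents are scaled by the keyframe interval delta_time, as the glTF 2.0 specification prescribes; the function name is hypothetical.

```python
def cubic_spline(previous_value, previous_out_tangent, next_in_tangent, next_value, delta_time, t):
    """Cubic Hermite spline between two keyframes, as used by CUBICSPLINE samplers.

    t is the normalized interpolation value in [0, 1];
    delta_time is nextTime - previousTime and scales both tangents.
    """
    t2 = t * t
    t3 = t2 * t
    # Hermite basis functions.
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return tuple(
        h00 * p0 + h10 * delta_time * m0 + h01 * p1 + h11 * delta_time * m1
        for p0, m0, m1, p1 in zip(previous_value, previous_out_tangent, next_in_tangent, next_value)
    )


# Keyframes at 0 s and 2 s with values 0 and 2 and slope 1 per second reproduce a straight line.
print(cubic_spline((0.0,), (1.0,), (1.0,), (2.0,), delta_time=2.0, t=0.5))  # (1.0,)
```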

In addition to the "POSITION" attribute, it has a "NORMAL" attribute. This refers to the accessor object that

The mesh property of each node refers to the mesh that is attached to the node, using the index of the mesh.

In the given example, there are two entries in the attributes dictionary. The entries refer to the

The second node has a translation property. As shown in the Scenes and Nodes section, this will be used to

The actual definition of the material here only consists of the pbrMetallicRoughness object, which defines the

A glTF asset may define multiple texture objects, which can be used as the textures of geometric objects

There are three top-level arrays for the definition of textures in the glTF JSON. The textures, samplers, and

The texture itself uses indices to refer to one sampler and one image. The most important element here is

The actual image that the texture consists of is stored as a PNG file called "testTexture.png" (see Image

The baseColorTexture contains, in its index property, the index of the texture that will be applied to the object surface. The
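Following the chain of indices from a material to the image file can be sketched as below, assuming the parsed JSON is a dict. The helper resolve_base_color_texture is hypothetical, it assumes the image is referenced by a uri rather than embedded through a bufferView, and the sampler values in the usage example are illustrative.

```python
def resolve_base_color_texture(gltf, material_index):
    """Follow material -> texture -> (image, sampler) for the base color texture.

    Returns (image_uri, sampler_dict), or None if the material has no base color texture.
    """
    material = gltf["materials"][material_index]
    texture_info = material.get("pbrMetallicRoughness", {}).get("baseColorTexture")
    if texture_info is None:
        return None
    texture = gltf["textures"][texture_info["index"]]
    image = gltf["images"][texture["source"]]
    # A texture without a sampler uses default sampling (repeat wrapping, automatic filtering).
    sampler = gltf["samplers"][texture["sampler"]] if "sampler" in texture else {}
    return image.get("uri"), sampler


example = {
    "materials": [{"pbrMetallicRoughness": {"baseColorTexture": {"index": 0}}}],
    "textures": [{"sampler": 0, "source": 0}],
    "images": [{"uri": "testTexture.png"}],
    # 9729 = LINEAR, 9987 = LINEAR_MIPMAP_LINEAR, 33648 = MIRRORED_REPEAT
    "samplers": [{"magFilter": 9729, "minFilter": 9987, "wrapS": 33648, "wrapT": 33648}],
}
print(resolve_base_color_texture(example, 0))
```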

When a camera object has been defined, it may be attached to a node. This is accomplished by assigning the

The type of the camera is given as a string, which can be "perspective" or "orthographic". Depending on

The camera.perspective object contains an aspectRatio property that defines the aspect ratio of the viewport.

Both camera types additionally contain znear and zfar properties, which are the coordinates of the near and
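For a perspective camera, the glTF 2.0 specification gives a projection matrix built from yfov (in radians), aspectRatio, znear, and zfar, with an infinite variant when zfar is omitted. The sketch below returns it as a row-major nested list; the function names are hypothetical, and the caller is assumed to supply the viewport's aspect ratio when the camera does not define one. The small ndc_depth helper checks that the near and far planes land at depths -1 and +1.

```python
import math


def perspective_matrix(yfov, aspect_ratio, znear, zfar=None):
    """Row-major 4x4 projection matrix of a glTF perspective camera.

    yfov is the vertical field of view in radians. With zfar=None the
    infinite projection from the glTF 2.0 specification is returned.
    """
    f = 1.0 / math.tan(0.5 * yfov)
    if zfar is None:
        depth_row = [0.0, 0.0, -1.0, -2.0 * znear]
    else:
        depth_row = [0.0, 0.0, (zfar + znear) / (znear - zfar), 2.0 * zfar * znear / (znear - zfar)]
    return [
        [f / aspect_ratio, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        depth_row,
        [0.0, 0.0, -1.0, 0.0],
    ]


def ndc_depth(matrix, view_z):
    """Project the view-space point (0, 0, view_z) and return its normalized depth."""
    clip_z = matrix[2][2] * view_z + matrix[2][3]
    clip_w = matrix[3][2] * view_z + matrix[3][3]
    return clip_z / clip_w


# The camera looks down -z: the near plane maps to depth -1, the far plane to +1.
p = perspective_matrix(math.radians(60.0), 16 / 9, znear=0.1, zfar=100.0)
print(ndc_depth(p, -0.1), ndc_depth(p, -100.0))  # approximately -1.0 and 1.0
```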

The new elements that have been added in order to define the morph targets are contained in the mesh and the

The mesh.primitive contains an array of morph targets . Each morph target is a dictionary that maps attribute

The weights of the mesh determine how these morph target displacements are added to the initial geometry in
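The morphing itself is a weighted sum: each target's displacement is scaled by its weight and added to the base attribute. A minimal sketch for one attribute (such as POSITION), assuming the accessors have already been read into lists of tuples; apply_morph_targets and the example data are hypothetical.

```python
def apply_morph_targets(base_values, targets, weights):
    """Return base + sum(weight_i * displacement_i) for every vertex of one attribute.

    base_values: list of tuples (one per vertex), e.g. the POSITION data.
    targets: one list of displacement tuples per morph target, same length as base_values.
    weights: one weight per morph target.
    """
    result = []
    for vertex, base in enumerate(base_values):
        morphed = list(base)
        for displacements, weight in zip(targets, weights):
            for component, delta in enumerate(displacements[vertex]):
                morphed[component] += weight * delta
        result.append(tuple(morphed))
    return result


base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = [
    [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)],  # target 0 moves both vertices up
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],  # target 1 moves only the first vertex right
]
print(apply_morph_targets(base, targets, [0.5, 1.0]))  # [(1.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
```

With all weights at 0.0 the base geometry is returned unchanged, so animating the weights blends smoothly in and out of each target.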

The asset additionally contains an animation that affects the weights for the morph targets. The following table

The scenes and nodes elements have been explained in the Scenes and Nodes section. For the vertex

The raw data is stored in the first buffer . The indices and vertex positions are defined by the bufferView

The skin contains an array called joints , which lists the indices of the nodes that define the skeleton hierarchy.

The "JOINTS_0" attribute refers to an accessor that contains the indices of the joints that should have an

The "WEIGHTS_0" attribute refers to an accessor that provides information about how strongly each joint should

Combining the "JOINTS_0" and "WEIGHTS_0" attributes yields exact information about the influence that each
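A common way to apply this information is linear blend skinning: each joint's joint matrix (the global transform of the joint node multiplied by the matching inverse bind matrix) is applied to the vertex, and the results are blended with the vertex's weights. A minimal sketch with plain row-major 4x4 lists; the helper names are hypothetical, the global joint transforms are assumed to be precomputed, and the JOINTS_0 values are taken as indices into skin.joints, as in glTF 2.0.

```python
def mat_mul(a, b):
    """Product of two row-major 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def mat_vec(m, v):
    """Row-major 4x4 matrix times a homogeneous 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_vertex(position, joints, weights, joint_global_matrices, inverse_bind_matrices):
    """Linear blend skinning of one vertex.

    joints, weights: the JOINTS_0 / WEIGHTS_0 entries of the vertex.
    joint_global_matrices[j]: global transform of the node skin.joints[j].
    inverse_bind_matrices[j]: matching entry of skin.inverseBindMatrices.
    """
    v = [position[0], position[1], position[2], 1.0]
    skinned = [0.0, 0.0, 0.0]
    for joint, weight in zip(joints, weights):
        if weight == 0.0:
            continue  # unused influence slots carry weight 0
        joint_matrix = mat_mul(joint_global_matrices[joint], inverse_bind_matrices[joint])
        p = mat_vec(joint_matrix, v)
        for c in range(3):
            skinned[c] += weight * p[c]
    return tuple(skinned)


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
moved_up = [row[:] for row in identity]
moved_up[1][3] = 2.0  # joint 1 has been translated by 2 along y
print(skin_vertex((1.0, 0.0, 0.0), (0, 1, 0, 0), (0.5, 0.5, 0.0, 0.0),
                  [identity, moved_up], [identity, identity]))  # (1.0, 1.0, 0.0)
```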
