Advanced Graphics Game Programming
URL: http://www.gamasutra.com/features/20030514/fosner_01.shtml
One of the largest problems with getting shaders into a game seems to be the learning curve
associated with shaders. Simply stated, shaders are not something that your lead graphics
programmer can implement over the weekend. There are two main issues with getting shaders
implemented in your game:
1. Understanding what shaders can do and how they replace the existing graphics pipeline.
2. Getting the supporting code implemented into your game so that you can use shaders as a
resource.
In this article we're going to continue the series of Gamasutra articles about shaders by
examining how to make shaders work. The actual integration of shader support is the stuff for a
future article. (Note: You don't need a high-end video card to try your hand at writing shaders.
All you need is the DirectX 9.0 SDK installed. With that you can select the reference device
(REF). While this software driver will be slow, it'll still give you the same results as a DirectX
9-capable video card.) RenderMonkey works on any hardware that supports shaders, not just
ATI's hardware.
If you have already read Wolfgang Engel's article, Mark Kilgard's and Randy Fernando's Cg
article or you've perused the DirectX 9 SDK documentation, then you've got a fairly good idea
of the capabilities of the High-Level Shader Language (HLSL) that's supported by DirectX 9.
HLSL, Cg, and the forthcoming OpenGL shading language are all attempts to make it as easy to
write shaders as possible. You no longer have to worry (as much) about allocating registers,
using scratch variables, or learning a new form of assembly language. Instead, once you've set
up your stream data format and associated your constant input registers with more user-
friendly labels, using shaders in a program is no more difficult than using a texture.
Rather than go through the tedious setup on how to use shaders in your program, I'll refer you
to the DirectX 9 documentation. Instead I'm going to focus on a tool ATI created called
RenderMonkey. While RenderMonkey currently works with DirectX's high- and low-level shader
languages, ATI and 3Dlabs are working to implement support for OpenGL 2.0's shader language
in RenderMonkey, which we should see in the next few months. The advantage of a tool like
RenderMonkey is that it lets you focus on writing shaders, not worrying about infrastructure. It
has a nice hierarchical structure that lets you set up a default rendering environment and make
changes at lower levels as necessary. Perhaps the biggest potential advantage of using
RenderMonkey is that the RenderMonkey files are XML files. Thus by adding a RenderMonkey
XML importer to your code or an exporter plug-in to RenderMonkey you can use RenderMonkey
files in your rendering loop to set effects for individual passes. This gives RenderMonkey an
advantage over DirectX's FX files because you can use RenderMonkey as an effects editor.
RenderMonkey even supports an "artist's mode" where only selected items in a pass are
editable.
Using HLSL
While HLSL is very C-like in its semantics, there is the challenge of relating the input and output
of the shaders with what is provided and expected by the pipeline. While shaders can have
constants set prior to their execution, when a primitive is rendered (i.e. when some form of a
DrawPrimitive call is made) then the input for each vertex shader is the vertex values
provided in the selected vertex streams. After the vertex shaders run, the pipeline rasterizes
each primitive into individual pixels and uses the (typically) interpolated per-vertex values as
input to the pixel shader, which then calculates the resulting color(s) as its output.
This is shown in Figure 1, where the path from application space, through vertex processing
then finally to a rendered pixel is shown. The application space shows where shaders and
constants are set in blue text. The blue boxes show where vertex and pixel shaders live in the
pipeline.
Figure 1. How shaders fit into the graphics pipe
The inputs to the vertex shader function contain the things you'd expect like position, normals,
colors, etc. HLSL can also use things like blend weights and indices (used for things like
skinning), and tangents and binormals (used for various shading effects). The following tables
show the inputs and outputs for vertex and pixel shaders. The [n] notation indicates an optional
index. For vertex shaders, the input semantics are POSITION[n], BLENDWEIGHT[n],
BLENDINDICES[n], NORMAL[n], PSIZE[n], COLOR[n], TEXCOORD[n], TANGENT[n],
BINORMAL[n], and TESSFACTOR[n]; the output semantics are POSITION, PSIZE, FOG,
COLOR[n], and TEXCOORD[n]. For pixel shaders, the input semantics are COLOR[n] and
TEXCOORD[n], and the output semantics are COLOR[n] and DEPTH.
The output of vertex shaders hasn't changed from the DirectX 8.1 days. You can have up to two
output colors, eight output texture coordinates, the transformed vertex position, and a fog and
point size value.
The output from the vertex shader is used to calculate the input for the pixel shaders. Note
there is nothing preventing you from placing any kind of data into the vertex shader's color or
texture coordinate output registers and using them for some other calculations in the pixel
shader. Just keep in mind that the output registers might be clamped and range limited,
particularly on hardware that doesn't support 2.0 shaders.
DirectX 8 pixel shaders supported only a single color register to specify the final color of a pixel.
DirectX 9 has support for multiple render targets (for example, the back buffer and a texture
surface simultaneously) and multi-element textures (typically used to generate intermediate
textures used in a later pass). However you'll need to check the CAPS bits to see what's
supported by your particular hardware. For more information, check the DirectX 9
documentation. While RenderMonkey supports rendering to a texture on one pass and reading
it in another, I'm going to keep the pixel shader simple in the following examples.
Aside from the semantics of the input and output mapping, HLSL gives you a great deal of
freedom to create shader code. In fact, HLSL looks a lot like a version of "C" written for
graphics. (Which is why NVIDIA calls their "C" like shader language Cg, as in "C-for-Graphics").
If you're familiar with C (or pretty much any procedural programming language) you can pick
up HLSL pretty quickly. What is a bit intimidating if you're not expecting it is the graphics traits
of the language itself. Not only are there the expected variable types of boolean, integer and
float, but there's also native support for vectors, matrices, and texture samplers, as well as
swizzles and masks for floats, that allow you to selectively read, write, or replicate individual
elements of vectors and matrices.
This is due to the single-instruction, multiple-data (SIMD) nature of the graphics hardware. An
operation such as

a = b * c;

(where a, b, and c are all of type vector) results in an element-by-element multiplication, since
type vector is an array of four floats. This is the same as:

a.x = b.x * c.x;
a.y = b.y * c.y;
a.z = b.z * c.z;
a.w = b.w * c.w;

where I've used the element selection swizzle and write masks to show the individual
operations. Since the hardware is designed to operate on vectors, performing an operation on a
vector is just as expensive as performing one on a single float. A ps_1_x pixel shader can
actually perform one operation on the red-green-blue elements of a vector while simultaneously
performing a different operation on the alpha element.
In addition to graphics oriented data types there is also a collection of intrinsic functions that
are oriented to graphics, such as dot product, cross product, vector length and normalization
functions, etc. The language also supports things like multiplication of vectors by matrices and
the like. Talking about it is one thing, but it's much easier to comprehend when you have an
example in front of you, so let's start programming.
When you first open RenderMonkey, you'll be greeted with a blank workspace. The first thing to
do is create an Effect Group. To do this, right-click on the Effect Workspace item in the
RenderMonkey Workspace view and select Add Effect Group. This will add a basic Effect Group
that will contain editable effects elements. If your hardware has the capabilities that the default
group requires (currently a RADEON 8500, GeForceFX, or better), then you'll see a red teapot. If you're
running on older hardware (like a GeForce3) then you'll have to edit the pixel shader version in
the default effect from ps 1.4 to ps 1.1.
RenderMonkey automatically creates a vertex stream mapping for the positional data of the
model, places the view/projection matrix in a shader constant for you, and creates the high
level vertex and pixel shaders for you. The default vertex shader is shown below:
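(What follows is a sketch of the default shader rather than an exact copy; view_proj_matrix is
the constant name RenderMonkey conventionally assigns, but treat the details as approximate.)

float4x4 view_proj_matrix;

struct VS_OUTPUT
{
    float4 Pos : POSITION;
};

VS_OUTPUT main(float4 inPos : POSITION)
{
    VS_OUTPUT Out;

    // Transform the object-space position into clip space.
    Out.Pos = mul(view_proj_matrix, inPos);

    return Out;
}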
As you can see, RenderMonkey provides a minimal shader as the default.
The default vertex shader transforms the incoming vertex position by the view/projection
matrix while the default pixel shader (not shown) sets the outgoing pixel color to red. You can
edit the shader code in the lower window until you get the shader you want. To see what the
shader looks like, click on the Commit Changes button on the main toolbar (or press F7) to
internally save and compile the shader. If the shader has any errors, there will be an
informative message displayed in the output pane at the bottom of the RenderMonkey window.
If the shader compiled successfully, then you'll immediately see the shader results in the
preview window.
And that's about all you need to know to edit shaders in RenderMonkey! The interface is very
intuitive - just about everything can be activated or edited by double-clicking. You can insert
nodes to add textures, set render state, or add additional passes with just a few clicks. The
documentation for RenderMonkey comes with the RenderMonkey download and is also available
at http://www.ati.com/developer/sdk/radeonSDK/html/Tools/RenderMonkey.html, along with a
number of documents on using RenderMonkey.
Finally, you'll need to know some internal variables that are available to RenderMonkey, shown
in Figure 2. If you add the RenderMonkey names (case sensitive) as variables they'll be
connected to the internal RenderMonkey variables. The time-based values are vectors, but all
elements are the same value. You can use these to vary values programmatically instead of
connecting a variable to a slider.
If you've been writing low-level shader code, you probably haven't been thinking about writing
modular code. It's tough to think modularly when you don't have any support in the language
for any type of control statements. And surprisingly, there's still no actual support for modular
code. A shader written in HLSL still compiles to a monolithic assembly shader. However, the
HLSL compiler hides a lot of the details and lets you write what looks like a modular
shader. I mention this because it's easy to get lulled into thinking that you're working with a
mature language, not one that's less than a year old. You should be aware of these limitations.
There's no support (yet) for recursion. All functions are inlined. Function parameters are passed
by value. Statements are always evaluated entirely - there's no short-circuited evaluation as in
a C program.
Even with those limitations, it's surprisingly easy to write modular code. In Wolfgang Engel's
article, he discussed the lighting equation for computing the intensity of the light at a surface as
the contribution of the ambient light, the diffuse light and the specular light.
I've made a slight change by adding in a term for the light color and intensity, which multiplies
the contributions from the diffuse and specular terms, and by using I for intensity and C for
color. In that notation, the equation is roughly:

Itotal = Iamb*Camb + Ilight*Clight*(Idiff*Cdiff + Ispec*Cspec)

Note that the color values are RGBA vectors, so there are actually four color elements
that will get computed by this equation. HLSL will automatically do the vector multiplication for
us. Wolfgang also created a HLSL shader for this basic lighting equation, so if you're new to
HLSL you might want to review what he wrote, since I'm going to build on his example.
Let's rewrite the basic shader, setting things up so that we can modularize our lighting
functions. If I add a color element to the output structure (calling it Color1), we can edit the
main function to add in the vertex normal as a parameter from the input stream and write the
output color. Insert two scalar variables, Iamb for ambient intensity and Camb for ambient color
(correspond the above equation) in the RenderMonkey workspace. This will allow us to
manipulate these variables from RenderMonkey's variable interface. RenderMonkey has a very
nice interface that supports vectors, scalars, and colors quite intuitively. To implement the
lighting equation we'll need to compute the lighting vector and the view vector, so I added
these calculations for later use. The ambient lighting values and light properties (position and
color) need to be provided to RenderMonkey by assigning them to variables. The basic vertex
shader computing the output color from the product of the ambient intensity and the ambient
color looks like this.
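(A sketch rather than the original listing; the lightPos and eyePos variable names are my own.)

float4x4 view_proj_matrix;
float4 lightPos;   // light position, object space
float4 eyePos;     // eye position, object space
float Iamb;        // ambient intensity
vector Camb;       // ambient color

struct VS_OUTPUT
{
    float4 Pos    : POSITION;
    float4 Color1 : COLOR0;
};

VS_OUTPUT main(float4 inPos : POSITION, float3 inNorm : NORMAL)
{
    VS_OUTPUT Out;

    Out.Pos = mul(view_proj_matrix, inPos);

    // Light and view vectors, computed for later use. The .xyz
    // swizzles keep the w component out of the calculations.
    float3 vLight = normalize(lightPos.xyz - inPos.xyz);
    float3 vView  = normalize(eyePos.xyz - inPos.xyz);

    // Ambient contribution: intensity times color.
    Out.Color1 = Iamb * Camb;

    return Out;
}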
Note that vector is an HLSL native type for an array of four floats; it's the same as writing
float4. Also note the use of swizzles when calculating the normalized vectors; this leaves the
vector's w parameter out of the calculation. I also modified the default pixel shader to simply
pass along the color created in the vertex shader as shown below. This simple pixel shader
simply returns the (interpolated) color provided by the vertex shader.
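(Again, a sketch rather than the original listing.)

float4 main(float4 inColor : COLOR0) : COLOR
{
    // Return the color interpolated from the vertex shader output.
    return inColor;
}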
Functions in HLSL
So let's start off by making the ambient calculation a function just to see how it's done in HLSL.
Making the ambient calculation a function is pretty simple.
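(A sketch, using the Iamb and Camb variables declared earlier.)

static inline vector Ambient()
{
    // Ambient contribution: intensity times color.
    return Iamb * Camb;
}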
The static inline attributes are optional at this point, but I've placed them there to emphasize
that currently all functions are inlined, so creating and using a function like this adds no
overhead to the shader. This Ambient() function just computes the ambient color and returns
it.
Creating the Diffuse function requires that we pass in the lighting vector and the normal vector.
In addition to the argument type description you'd expect to see in a C program, HLSL allows
you to specify if a value is strictly input, output or both through the in, out and inout
attributes. A parameter that is specified as out or inout will be copied back to the calling code,
allowing functions another way to return values. If not specified, in is assumed. Since this
diffuse equation is an implementation of what's called a Lambertian diffuse, I've named it as
such. The LambertianDiffuse() function looks like this.
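(A sketch; Idiff and Cdiff are assumed diffuse intensity and color variables, declared like Iamb
and Camb.)

static inline vector LambertianDiffuse(
    in float3 vNormal,  // normalized vertex normal
    in float3 vLight)   // normalized light vector
{
    // N dot L, clamped to zero for surfaces facing away from the light.
    return Idiff * Cdiff * max(0, dot(vNormal, vLight));
}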
Note the use of the HLSL intrinsic dot product function. The specular equation is taken from
Phong's lighting equation and requires calculation of the reflection vector. The reflection vector
is calculated from the normalized normal and light vectors as R = 2(N·L)N - L.
The dot product of the reflection vector and the view vector is raised to a power that is
inversely proportional to the roughness of the surface. This is a more intuitive value than letting
a user specify a specular power value. To limit the specular contribution to only the times when
the angle between these vectors is less than 90 degrees, we limit the dot product to only
positive values. The specular color contribution becomes;
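(A sketch; Ispec and Cspec are assumed specular intensity and color variables, and roughness
is the RenderMonkey-editable parameter mentioned below.)

static inline vector PhongSpecular(
    in float3 vNormal,  // normalized vertex normal
    in float3 vLight,   // normalized light vector
    in float3 vView)    // normalized view vector
{
    // R = 2(N.L)N - L
    float3 vReflect = 2 * dot(vNormal, vLight) * vNormal - vLight;

    // The specular power is inversely proportional to the roughness.
    return Ispec * Cspec *
           pow(saturate(dot(vReflect, vView)), 1 / roughness);
}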
Note the use of the intrinsic saturate function to limit the range from the dot product to [0,1].
Roughness is added to the RenderMonkey Effect Workspace and added in the shader editor as a
parameter.
Using these functions we can now implement our main shader function as follows:
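(A sketch, reusing the declarations from the earlier listings.)

VS_OUTPUT main(float4 inPos : POSITION, float3 inNorm : NORMAL)
{
    VS_OUTPUT Out;

    Out.Pos = mul(view_proj_matrix, inPos);

    float3 vNormal = normalize(inNorm);
    float3 vLight  = normalize(lightPos.xyz - inPos.xyz);
    float3 vView   = normalize(eyePos.xyz - inPos.xyz);

    // Sum the three lighting contributions.
    Out.Color1 = Ambient() +
                 LambertianDiffuse(vNormal, vLight) +
                 PhongSpecular(vNormal, vLight, vView);

    return Out;
}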
The three functions that we added are either placed above the main function or below, in which
case you'd need to add a function prototype. As you can see, it's fairly easy to write functional
modules in HLSL code.
The real utility of this comes when we create modules that can replace other modules. For
example, suppose that you wanted to duplicate the original functionality of the fixed-function-
pipeline, which implemented a particular type of specular called Blinn-Phong. This particular
specular lighting equation is similar to Phong's but uses something called the half-angle vector
instead of the reflection vector. An implementation of it looks like this:
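(A sketch along the same lines as the Phong version.)

static inline vector BlinnPhongSpecular(
    in float3 vNormal,  // normalized vertex normal
    in float3 vLight,   // normalized light vector
    in float3 vView)    // normalized view vector
{
    // Half-angle vector between the light and view directions.
    float3 vHalf = normalize(vLight + vView);

    return Ispec * Cspec *
           pow(saturate(dot(vHalf, vNormal)), 1 / roughness);
}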
To change our shader to use Blinn-Phong, all we need to do is change the function we call in
main. The color computation would look like this;
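(Continuing with the sketch functions above.)

Out.Color1 = Ambient() +
             LambertianDiffuse(vNormal, vLight) +
             BlinnPhongSpecular(vNormal, vLight, vView);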
Since all of these functions are inlined, any unused code is optimized out from the shader. As
long as there's no reference to a function from main or any of the functions that are called from
main, then we can pick which implementation we want in our shader code simply by selecting
the functions we want, and we don't have to worry about unused code since it's not included in
the compiled shader.
As we get more real-time programmability it becomes easier to implement features that have
been in the artist's domain for years. Suppose your art lead creates some really cool scenes
that look great in Maya™ or 3DS Max™, but they don't look right in your engine because its
Lambertian diffuse makes everything look like plastic. Why can't you just render with the same
shading options that Maya has? Well, now you can! If your artist really has to have gentler
diffuse tones provided by Oren-Nayar diffuse shading, then you can now implement it.
One of the problems of the standard Lambertian model is that it considers the reflecting surface
as a smooth diffuse surface. Surfaces that are really rough, like stone, dirt, and sandpaper
exhibit much more of a backscattering effect, particularly when the light source and the view
direction are in the same direction.
The classic example is of a full moon shown in Figure 3. If you look at the picture of the moon,
it's pretty obvious that this doesn't follow the Lambertian distribution - if it did the edges of the
moon would be in near darkness. In fact the edges look as bright as the center of the moon.
This is because the moon's surface is rough: it's a jumble of dust and rock with diffuse
reflecting surfaces at all angles. The quantity of reflecting surfaces is therefore uniform
regardless of the surface's orientation, so no matter how the surface is oriented toward the
viewer, the amount of light reflecting off of any point on it is nearly the same.
Figure 3. On rough surfaces like that of the moon, the amount of
light reflecting off of any point on the surface is nearly the same.
In an effort to better model rough surfaces, Oren and Nayar came up with a generalized version
of a Lambertian diffuse shading model that tries to account for the roughness of the surface.
They took a theoretical model and simplified it to the terms that had the most significant
impact. The Oren-Nayar diffuse shading model looks (roughly) like this:

I = (ρ/π) * E0 * cos(θi) * (A + B * max(0, cos(φi - φr)) * sin(α) * tan(β))

Now this may look daunting, but it can be simplified to something we can appreciate if we
replace the original notation with the notation we've already been using. ρ is a surface
reflectivity property, which we can replace with our surface color. E0 is a light input energy
term, which we can replace with our light color. And the θi term is just our familiar angle
between the vertex normal and the light direction. Making these exchanges gives us:

I = Cdiff * Clight * (N·L) * (A + B * max(0, cos(φi - φr)) * sin(α) * tan(β))

which looks a little easier to compute. There are still some parameters to explain. With σ
representing the surface roughness, A = 1 - 0.5σ²/(σ² + 0.33) and B = 0.45σ²/(σ² + 0.09),
while α = max(θi, θr) and β = min(θi, θr), where θr is the angle between the normal and the
view direction, and φi - φr is the azimuthal angle between the light and view directions
projected onto the surface's tangent plane.
Note that if the roughness value is zero, the model is the same as the Lambertian diffuse
model. Since this model gives a closer visual representation of rough surfaces such as sand,
plaster, dirt, and unglazed clay than Lambertian shading, it's become a popular shading model in
most 3D graphics modeling packages. With HLSL, it's fairly easy to write your own version of an
Oren-Nayar diffuse shader. The shader code below is based upon a RenderMan shader written
by Larry Gritz. Using this function will probably make the entire shader so long that it requires
hardware that supports 2.0 shaders, or the reference rasterizer.
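(The listing here is my own sketch of such a function rather than Gritz's code; it implements
the simplified equation above, with the roughness variable standing in for σ.)

static inline vector OrenNayarDiffuse(
    in float3 vNormal,  // normalized vertex normal
    in float3 vLight,   // normalized light vector
    in float3 vView)    // normalized view vector
{
    float sigma2 = roughness * roughness;
    float A = 1 - 0.5 * sigma2 / (sigma2 + 0.33);
    float B = 0.45 * sigma2 / (sigma2 + 0.09);

    float NdotL = dot(vNormal, vLight);
    float NdotV = dot(vNormal, vView);
    float thetaI = acos(NdotL);
    float thetaR = acos(NdotV);
    float alpha = max(thetaI, thetaR);
    float beta  = min(thetaI, thetaR);

    // cos(phi_i - phi_r): the light and view vectors projected
    // into the plane perpendicular to the normal.
    float3 lightProj = normalize(vLight - NdotL * vNormal);
    float3 viewProj  = normalize(vView  - NdotV * vNormal);
    float cosPhiDiff = max(0, dot(lightProj, viewProj));

    return Idiff * Cdiff * max(0, NdotL) *
           (A + B * cosPhiDiff * sin(alpha) * tan(beta));
}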
I hope that you're getting the idea that it's pretty easy to write snippets of code for specific
purposes and place them in a library. When I was writing my book on shaders, I focused on
building a variety of shader subroutines rather than just a collection of stand-alone shaders.
As you can see, this approach is very powerful and allows you to pick and
choose the pieces that make up the shader to customize the overall effect you want to realize.
Like C, HLSL supports the #include preprocessor directive, but only when compiling from a file
- currently RenderMonkey doesn't implement #include. The filename specified can be either an
absolute or relative path. If it's a relative path then it's assumed to be relative to the directory
of the file issuing the #include. Unlike C, there's no environmental variable support, so the
angle bracket include notation isn't supported, just the include file name in quotation marks.
It's easy to see that when function overloading gets implemented, it's going to be simple to
quickly write shader code that's easy to customize. For now you can use the preprocessor and
some #ifdef / #else / #endif directives to #define your own shading equations.
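For example, here's a minimal sketch of selecting a diffuse model at compile time (the DIFFUSE
macro is my own):

// Select a diffuse model at compile time.
#define USE_OREN_NAYAR

#ifdef USE_OREN_NAYAR
    #define DIFFUSE(n, l, v) OrenNayarDiffuse((n), (l), (v))
#else
    #define DIFFUSE(n, l, v) LambertianDiffuse((n), (l))
#endif

// In main:
//   Out.Color1 = Ambient() + DIFFUSE(vNormal, vLight, vView) + ...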
There's no reason to be stuck with the lighting equation that we've been working with. Shaders
give you the ability to create whatever shading effect you want, and I encourage you to try
your own lighting equations, either by implementing academic models such as Oren-Nayar or
by inventing your own. Cel shading is a simple example of non-photo-realistic (NPR)
rendering, but there are many, many artistic styles that are starting to show up in computer
graphics; just check out the SIGGRAPH proceedings since 1999. You can also look to the real
world for inspiration. There's a beautiful example of this type of shading done by ATI to
demonstrate the RADEON 9700. In order to duplicate the deep, color-shifting hues seen on
metallic paint jobs on cars, ATI created a demo that has (among other effects) a dual-specular
highlight term. This creates a highlight of one color surrounded by a highlight of a different
color as seen in a closeup of the car's side mirror in Figure 4.
The metallic flakes are from a noise map and the environment mapping finishes off the effect.
As shading hardware becomes more powerful and commonplace you'll start to see more and
more creative shading show up in games and then in mainstream applications. The next release
of the Windows OS is rumored to be designed to natively support graphics hardware
acceleration for the entire desktop, and programmable shading is going to be a big part of that.
The prices of DirectX 9 (and OpenGL 2.0) capable hardware are continually dropping. So if your
current project doesn't incorporate shaders, if you haven't investigated HLSL, or if the low-level
shader languages intimidated you, I hope this article has shown you that not only is writing
HLSL easy, but that with tools like RenderMonkey you can be writing shaders within minutes.
Article Reviewers
The author would like to thank the following individuals for reviewing this article prior to
publication: Wolfgang Engel, Randy Fernando, Tadej Fius, Muhammad Haggag, Callan McInally,
Jason Mitchell, Harald Nowak, Guimo Rodriguez, and Natasha Tatarchuk.
Resources
RenderMonkey
The executable and documentation for RenderMonkey can be found at
www.ati.com/developer/sdk/radeonSDK/html/Tools/RenderMonkey.html
Cg
While not HLSL, it's pretty close. You can learn more about it at
http://developer.nvidia.com/Cg, or http://www.cgshaders.org/.
DirectX 9
The Microsoft DirectX 9 documentation is pretty sparse on HLSL, but it's there for you to puzzle
out.
Shader Books
For DirectX shaders there's ShaderX by Engel and Real-Time Shader Programming by Fosner.
There are two additional ShaderX2 books coming out soon as well. Cg is covered by The Cg
Tutorial by Fernando and Kilgard. Real-Time Shading by Olano et al. is more about current
shader research, but it's a useful source of information if you're interested in delving further
into the state of the art.
Illumination Texts
Unfortunately most graphics texts gloss over all but the simplest shading models. Most of the
older ones can be found in Computer Graphics by Foley, van Dam, et al., with the newer ones
in Principles of Digital Image Synthesis, by Glassner. Quite a few of the original papers can be
found online as well. The RenderMan Companion by Upstill and Advanced RenderMan by
Apodaca and Gritz are really useful sources of inspiration.
Animation With Cg
By Randima Fernando and Mark J. Kilgard
Gamasutra
March 25, 2003
URL: http://www.gamasutra.com/features/20030325/fernando_01.shtml
What is Cg? The Cg language makes it possible for you to control the shape, appearance, and
motion of objects drawn using programmable graphics hardware. It marries programmatic
control of these attributes with the incredible speed and capabilities of today's graphics
processors. Never before have computer graphics practitioners, whether artists or
programmers, had so much control over the real-time images they generate.
Cg stands for "C for graphics." The C programming language is a popular, general purpose
language invented in the 1970s. Because of its popularity and clean design, C provided the
basis for several subsequent programming languages. For example, C++ and Java base their
syntax and structure largely on C. The Cg language bases itself on C as well. If you are familiar
with C or one of the many languages derived from C, then Cg will be easy to learn.
On the other hand, if you are not familiar with C or even programming languages in general but
you enjoy computer graphics and want to learn something new, read on anyway. Cg programs
tend to be short and understandable.
Cg is different from C, C++, and Java because it is very specialized. No one will ever write a
spreadsheet or word processor in Cg. Instead, Cg targets the ability to programmatically control
the shape, appearance, and motion of objects rendered using graphics hardware. Broadly, this
type of language is called a shading language. However, Cg can do more than just shading. For
example, Cg programs can perform physical simulation, compositing, and other nonshading
tasks.
Think of a Cg program as a detailed recipe for how to render an object by using programmable
graphics hardware. For example, you can write a Cg program to make a surface appear bumpy
or to animate a virtual character. Later you will learn more about the history of shading
languages and where Cg fits into this history.
Cg's Data-Flow Model
In addition to being specialized for graphics, Cg and other shading languages are different from
conventional programming languages because they are based on a data-flow computational
model. In such a model, computation occurs in response to data that flows through a sequence
of processing steps.
Cg programs operate on vertices and fragments (think "pixels" for now if you do not know what
a fragment is) that are processed when rendering an image. Think of a Cg program as a black
box into which vertices or fragments flow on one side, are somehow transformed, and then flow
out on the other side. However, the box is not really a black box because you get to determine,
by means of the Cg programs you write, exactly what happens inside.
Every time a vertex is processed or the rasterizer generates a fragment while rendering a 3D
scene, your corresponding vertex or fragment Cg program executes.
Most recent personal computers (and all recent game consoles) contain a graphics processing
unit (GPU) that is dedicated to graphics tasks such as transforming and rasterizing 3D models.
Your Cg programs actually execute within the GPU of your computer.
Whether or not a personal computer or game console has a GPU, there must be a CPU that runs
the operating system and application programs. CPUs are, by design, general purpose. CPUs
execute applications (for example, word processors and accounting packages) written in
general-purpose languages, such as C++ or Java.
Because of the GPU's specialized design, it is much faster at graphics tasks, such as rendering
3D scenes, than a general-purpose CPU would be. New GPUs process tens of millions of vertices
per second and rasterize hundreds of millions or even billions of fragments per second. Future
GPUs will be even speedier. This is overwhelmingly faster than the rate at which a CPU could
process a similar number of vertices and fragments. However, the GPU cannot execute the
same arbitrary, general-purpose programs that a CPU can.
Moreover, this graphical processing happens in addition to the considerable amount of effort
required of the CPU to update the animation for each new image. The reality is that we need
both the CPU and the GPU's specialized graphics-oriented capabilities. Both are required to
render scenes at the interactive rates and quality standards that users of 3D applications and
games demand. This means a developer can write a 3D application or game in C++ and then
use Cg to make the most of the GPU's additional graphics horsepower.
Cg enables a specialized style of parallel processing. While your CPU executes a conventional
application, that application also orchestrates the parallel processing of vertices and fragments
on the GPU, by programs written in Cg.
If a real-time shading language is such a good idea, why didn't someone invent Cg sooner? The
answer has to do with the evolution of computer graphics hardware. Prior to 2001, most
computer graphics hardware (certainly the kind of inexpensive graphics hardware in PCs and
game consoles) was hard-wired to the specific tasks of vertex and fragment processing. By
"hard-wired," we mean that the algorithms were fixed within the hardware, as opposed to being
programmable in a way that is accessible to graphics applications. Even though these hard-
wired graphics algorithms could be configured by graphics applications in a variety of ways, the
applications could not reprogram the hardware to do tasks unanticipated by the designers of
the hardware. Fortunately, this situation has changed.
Graphics hardware design has advanced, and vertex and fragment processing units in recent
GPUs are truly programmable. Before the advent of programmable graphics hardware, there
was no point in providing a programming language for it. Now that such hardware is available,
there is a clear need to make it easier to program this hardware. Cg makes it much easier to
program GPUs in the same manner that C made it much easier to program CPUs.
Before Cg existed, addressing the programmable capabilities of the GPU was possible only
through low-level assembly language. The cryptic instruction syntax and manual hardware
register manipulation required by assembly languages (such as DirectX 8 vertex and pixel
shaders and some OpenGL extensions) made it a painful task for most developers. As GPU
technology made longer and more complex assembly language programs possible, the need for
a high-level language became clear. The extensive low-level programming that had been
required to achieve optimal performance could now be delegated to a compiler, which optimizes
the code output and handles tedious instruction scheduling. Figure 1-1 is a small portion of a
complex assembly language fragment program used to represent skin. Clearly, it is hard to
comprehend, particularly with the specific references to hardware registers.
In contrast, well-commented Cg code is more portable, more legible, easier to debug, and
easier to reuse. Cg gives you the advantages of a high-level language such as C while delivering
the performance of low-level assembly code.
Other Aspects of Cg
Cg is a language for programming "in the small." That makes it much simpler than a modern
general-purpose language such as C++. Because Cg specializes in transforming vertices and
fragments, it does not currently include many of the complex features required for massive
software engineering tasks. Unlike C++ and Java, Cg does not support classes and other
features used in object-oriented programming. Current Cg implementations do not provide
pointers or even memory allocation (though future implementations may, and keywords are
appropriately reserved). Cg has absolutely no support for file input/output operations. By and
large, these restrictions are not permanent limitations in the language, but rather are indicative
of the capabilities of today's highest performance GPUs. As technology advances to permit more
general programmability on the GPU, you can expect Cg to grow appropriately. Because Cg is
closely based on C, future updates to Cg are likely to adopt language features from C and C++.
. . .
DEFINE LUMINANCE = {0.299, 0.587, 0.114, 0.0};
TEX H0, f[TEX0], TEX4, 2D;
TEX H1, f[TEX2], TEX5, CUBE;
DP3X H1.xyz, H1, LUMINANCE;
MULX H0.w, H0.w, LUMINANCE.w;
MULX H1.w, H1.x, H1.x;
MOVH H2, f[TEX3].wxyz;
MULX H1.w, H1.x, H1.w;
DP3X H0.xyz, H2.xzyw, H0;
MULX H0.xyz, H0, H1.w;
TEX H1, f[TEX0], TEX1, 2D;
TEX H3, f[TEX0], TEX3, 2D;
MULX H0.xyz, H0, H3;
MADX H1.w, H1.w, 0.5, 0.5;
MULX H1.xyz, H1, {0.15, 0.15, 1.0, 0.0};
MOVX H0.w, H1.w;
TEX H1, H1, TEX7, CUBE;
TEX H3, f[TEX3], TEX2, 1D;
MULX H3.w, H0.w, H2.w;
MULX H3.xyz, H3, H3.w;
. . .
Figure 1-1. A Snippet of Assembly Language Code
Cg provides arrays and structures. It has all the flow-control constructs of a modern language:
loops, conditionals, and function calls.
Cg natively supports vectors and matrices because these data types and related math
operations are fundamental to graphics and most graphics hardware directly supports vector
data types. Cg has a library of functions, called the Standard Library, that is well suited for the
kind of operations required for graphics. For example, the Cg Standard Library includes a reflect
function for computing reflection vectors. Cg programs execute in relative isolation. This means
that the processing of a particular vertex or fragment has no effect on other vertices or
fragments processed at the same time. There are no side effects to the execution of a Cg
program. This lack of interdependency among vertices and fragments makes Cg programs
extremely well suited for hardware execution by highly pipelined and parallel hardware.
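As a small illustration, here's the kind of vector-oriented Standard Library code this encourages
(a sketch; the variable names are mine):

// Reflect the incident eye-to-surface direction about the normal.
float3 N = normalize(normal);
float3 I = normalize(position - eyePosition);
float3 R = reflect(I, N);   // Standard Library reflection function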
When you write a program in a language designed for modern CPUs using a modern operating
system, you expect that a more-or-less arbitrary program, as long as it is correct, will compile
and execute properly. This is because CPUs, by design, execute general-purpose programs for
which the overall system has more than sufficient resources.
However, GPUs are specialized rather than general-purpose, and the feature set of GPUs is still
evolving. Not everything you can write in Cg can be compiled to execute on a given GPU. Cg
includes the concept of hardware "profiles," one of which you specify when you compile a Cg
program. Each profile corresponds to a particular combination of GPU architecture and graphics
API. Your program not only must be correct, but it also must limit itself to the restrictions
imposed by the particular profile used to compile your Cg program. For example, a given
fragment profile may limit you to no more than four texture accesses per fragment.
As GPUs evolve, additional profiles will be supported by Cg that correspond to more capable
GPU architectures. In the future, profiles will be less important as GPUs become more full-
featured. But for now Cg programmers will need to limit programs to ensure that they can
compile and execute on existing GPUs. In general, future profiles will be supersets of current
profiles, so that programs written for today's profiles will compile without change using future
profiles.
This situation may sound limiting, but in practice the Cg programs shown in this book work on
tens of millions of GPUs and produce compelling rendering effects. Another reason for limiting
program size and scope is that the smaller and more efficient your Cg programs are, the faster
they will run. Real-time graphics is often about balancing increased scene complexity,
animation rates, and improved shading. So it's always good to maximize rendering efficiency
through judicious Cg programming.
Keep in mind that the restrictions imposed by profiles are really limitations of current GPUs, not
Cg. The Cg language is powerful enough to express shading techniques that are not yet possible
with all GPUs. With time, GPU functionality will evolve far enough that Cg profiles will be able to
run amazingly complex Cg programs. Cg is a language for both current and future GPUs.
To put Cg into its proper context, you need to understand how GPUs render images. This
section explains how graphics hardware is evolving and then explores the modern graphics
hardware-rendering pipeline.
Computer graphics hardware is advancing at incredible rates. Three forces are driving this rapid
pace of innovation, as shown in Figure 1-2. First, the semiconductor industry has committed
itself to doubling the number of transistors (the basic unit of computer hardware) that fit on a
microchip every 18 months. This constant redoubling of computer power, historically known as
Moore's Law, means cheaper and faster computer hardware, and is the norm for our age.
The second force is the vast amount of computation required to simulate the world around us.
Our eyes consume and our brains comprehend images of our 3D world at an astounding rate
and with startling acuity. We are unlikely ever to reach a point where computer graphics
becomes a substitute for reality. Reality is just too real. Undaunted, computer graphics
practitioners continue to rise to the challenge. Fortunately, generating images is an
embarrassingly parallel problem. What we mean by "embarrassingly parallel" is that graphics
hardware designers can repeatedly split up the problem of creating realistic images into more
chunks of work that are smaller and easier to tackle. Then hardware engineers can arrange, in
parallel, the ever-greater number of transistors available to execute all these various chunks of
work.
Our third force is the sustained desire we all have to be stimulated and entertained visually.
This is the force that "connects" the source of our continued redoubling of computer hardware
resources to the task of approximating visual reality ever more realistically than before.
As Figure 1-2 illustrates, these insights let us confidently predict that computer graphics
hardware is going to get much faster. These innovations whet our collective appetite for more
interactive and compelling 3D experiences. Satisfying this demand is what motivated the
development of the Cg language.
Animation
Movement in Time
Animation is the result of an action that happens over time: for example, an object that
pulsates, a light that fades, or a character that runs. Your application can create these types of
animation using vertex programs written in Cg. The source of the animation is one or more
program parameters that vary with the passing of time in your application.
To create animated rendering, your application must keep track of time at a level above Cg and
even above OpenGL or Direct3D. Applications typically represent time with a global variable
that is regularly incremented as your application's sense of time advances. Applications then
update other variables as a function of time.
You could compute animation updates on the CPU and pass the animated data to the GPU.
However, a more efficient approach is to perform as much of the animation computation as
possible on the GPU with a vertex program, rather than require the CPU to do all the number-
crunching. Offloading animation work from the CPU can help balance the CPU and GPU
resources and free up the CPU for more involved computations, such as collision detection,
artificial intelligence, and game play.
A Pulsating Object
In this first example, you will learn how to make an object deform periodically so that it appears
to bulge. The goal is to take a time parameter as input and then modify the vertex positions of
the object geometry based on the time. More specifically, you need to displace the surface
position in the direction of the surface normal, as shown in Figure 6-1.
By varying the magnitude of the displacement over time, you create a bulging or pulsing effect.
Figure 6-2 shows renderings of this effect as it is applied to a character. The pulsating
animation takes place within a vertex program.
Example 6-1 shows the complete source code for the C6E1v_bulge vertex program, which is
intended to be used with the C2E2f_passthrough fragment program from Chapter 2. Only the
vertex position and normal are really needed for the bulging effect. However, lighting makes
the effect look more interesting, so we have included material and light information as well. A
helper function called computeLighting calculates just the diffuse and specular lighting (the
specular material is assumed to be white for simplicity).
Figure 6-1. Making an Object Bulge
float3 computeLighting(float3 lightPosition, float3 lightColor,
                       float3 Kd, float shininess,
                       float3 P, float3 N, float3 eyePosition)
{
  // Compute the diffuse lighting
  float3 L = normalize(lightPosition - P);
  float diffuseLight = max(dot(N, L), 0);
  float3 diffuseResult = Kd * lightColor * diffuseLight;
  // Compute the specular lighting (the specular material is
  // assumed to be white)
  float3 V = normalize(eyePosition - P);
  float3 H = normalize(L + V);
  float specularLight = pow(max(dot(N, H), 0), shininess);
  if (diffuseLight <= 0) specularLight = 0;
  return diffuseResult + lightColor * specularLight;
}

void C6E1v_bulge(float4 position : POSITION, float3 normal : NORMAL,
                 out float4 oPosition : POSITION, out float4 color : COLOR,
                 uniform float frequency, uniform float scaleFactor,
                 uniform float4x4 modelViewProj, uniform float time,
                 uniform float3 lightPosition, uniform float3 lightColor,
                 uniform float3 Kd, uniform float shininess,
                 uniform float3 eyePosition)
{
  float displacement = scaleFactor * 0.5 *
      sin(position.y * frequency * time) + 1;
  float4 displacementDirection = float4(normal.x, normal.y,
                                        normal.z, 0);
  float4 newPosition = position +
      displacement * displacementDirection;
  oPosition = mul(modelViewProj, newPosition);
  color.xyz = computeLighting(lightPosition, lightColor,
      Kd, shininess, newPosition.xyz, normal, eyePosition);
  color.w = 1;
}
Example 6-1. The C6E1v_bulge Vertex Program
The idea here is to calculate a quantity called displacement that moves the vertex position up or
down in the direction of the surface normal. To animate the program's effect, displacement has
to change over time. You can choose any function you like for this. For example, you could pick
something like this:

float displacement = time;

Of course, this behavior doesn't make a lot of sense, because displacement would always
increase, causing the object to get larger and larger endlessly over time. Instead, we want a
pulsating effect in which the object oscillates between bulging larger and returning to its normal
shape. The sine function provides such a smoothly oscillating behavior.
A useful property of the sine function is that its result is always between -1 and 1. In some
cases, such as in this example, you don't want any negative numbers, so you can scale and
bias the results into a more convenient range, such as from 0 to 1:

float displacement = 0.5 * (sin(time) + 1);
Did you know that the sin function is just as efficient as addition or multiplication in the CineFX
architecture? In fact, the cos function, which calculates the cosine function, is equally fast. Take
advantage of these features to add visual complexity to your programs without slowing down
their execution.
To allow finer control of your program, you can add a uniform parameter that controls the
frequency of the sine wave. Folding this uniform parameter, frequency, into the displacement
equation gives:
float displacement = 0.5 * (sin(frequency * time) + 1);
You may also want to control the amplitude of the bulging, so it's useful to have a uniform
parameter for that as well. Throwing that factor into the mix, here's what we get:

float displacement = scaleFactor * 0.5 * (sin(frequency * time) + 1);
As it is now, this equation produces the same amount of protrusion all over the model. You
might use it to show a character catching his breath after a long chase. To do this, you would
apply the program to the character's chest. Alternatively, you could provide additional uniform
parameters to indicate how rapidly the character is breathing, so that over time, the breathing
could return to normal. These animation effects are inexpensive to implement in a game, and
they help to immerse players in the game's universe.
But what if you want the magnitude of bulging to vary at different locations on the model? To
do this, you have to add a dependency on a per-vertex varying parameter. One idea might be
to pass in scaleFactor as a varying parameter, rather than as a uniform parameter. Here we
show you an even easier way to add some variation to the pulsing, based on the vertex
position:

float displacement = scaleFactor * 0.5 *
    sin(position.y * frequency * time) + 1;
This code uses the y coordinate of the position to vary the bulging, but you could use a
combination of coordinates, if you prefer. It all depends on the type of effect you are after.
In our example, the displacement scales the object-space surface normal. Then, by adding the
result to the object-space vertex position, you get a displaced object-space vertex position:

float4 displacementDirection = float4(normal.x, normal.y, normal.z, 0);
float4 newPosition = position + displacement * displacementDirection;
The preceding example demonstrates an important point. Take another look at this line of code
from Example 6-1:

float displacement = scaleFactor * 0.5 *
    sin(position.y * frequency * time) + 1;

and compare it with the earlier, position-independent version:

float displacement = scaleFactor * 0.5 * (sin(frequency * time) + 1);

If you were to use the second equation for the displacement, all the terms would be the same
for each vertex, because they all depend only on uniform parameters. This means that you would be
computing this displacement on the GPU for each vertex, when in fact you could simply
calculate the displacement on the CPU just once for the entire mesh and pass the displacement
as a uniform parameter. However, when the vertex position is part of the displacement
equation, the sine function must be evaluated for each vertex. And as you might expect, if the
value of the displacement varies for every vertex like this, such a per-vertex computation can
be performed far more efficiently on the GPU than on the CPU.
If a computed value is a constant value for an entire object, optimize your program by
precomputing that value on a per-object basis with the CPU. Then pass the precomputed value
to your Cg program as a uniform parameter. This approach is more efficient than recomputing
the value for every fragment or vertex processed.
Particle Systems
Sometimes, instead of animating vertices in a mesh, you want to treat each vertex as a small
object, or particle. A collection of particles that behave according to specific rules is known as a
particle system. This example implements a simple particle system in a vertex program. For
now, focus on how the system works; don't worry about its simplistic appearance. At the end of
this section, we will mention one easy method to enhance your particle system's appearance.
Figure 6-3 shows the particle system example progressing in time.
The example particle system behaves according to a simple vector kinematic equation from
physics. The equation gives the x, y, and z positions of each particle for any time. The basic
equation from which you will start is shown in Equation 6-1:

pFinal = pInitial + vInitial * t + 0.5 * acceleration * t²

where pFinal is the particle's final position, pInitial is the particle's initial position, vInitial is
the particle's initial velocity, acceleration is the particle's constant acceleration, and t is the
time elapsed since the particle was created.
The equation models the trajectory of a particle set in initial motion and under the influence of
gravity, but not otherwise interacting with other particles. This equation gives the position of a
particle for any value of time, assuming that you provide its initial position, initial velocity, and
constant acceleration, such as gravity.
Initial Conditions
The application must supply the initial position and initial velocity of each particle as varying
parameters. These two parameter values are known as initial conditions because they describe
the particle at the beginning of the simulation.
In this particular simulation, the acceleration due to gravity is the same for every particle.
Therefore, gravity is a uniform parameter.
To make the simulation more accurate, you could factor in effects such as drag and even spin;
we leave that as an exercise for you.
Vectorized Computations
Modern GPUs have powerful vector-processing capabilities, particularly for addition and
multiplication; they are well suited for processing vectors with up to four components.
Therefore, it is often just as efficient to work with such vector quantities as it is to work with
scalar (single-component) quantities.
Equation 6-1 is a vector equation because the initial position, initial velocity, constant
acceleration, and computed position are all three-component vectors. By implementing the
particle system equation as a vector expression when writing the Cg vertex program, you help
the compiler translate your program to a form that executes efficiently on your GPU.
Vectorize your calculations whenever possible, to take full advantage of the GPU's powerful
vector-processing capabilities.
Table 6-1 lists the variables used by the vertex program presented in the next section. Each
variable is a parameter to the vertex program, except for the relative time (t) and final position
(pFinal), which are calculated inside the vertex program. Note that the y component of the
acceleration is negative, because gravity acts downward, in the negative y direction. The
constant 9.8 meters per second squared is the acceleration of gravity on Earth. The initial
position, initial velocity, and uniform acceleration are object-space vectors.
void C6E2v_particle(float4 pInitial : POSITION,
                    float4 vInitial : TEXCOORD0,
                    float  tInitial : TEXCOORD1,
                out float4 oPosition : POSITION,
                out float4 color     : COLOR,
                out float  pointSize : PSIZE,
                uniform float    globalTime,
                uniform float4   acceleration,
                uniform float4x4 modelViewProj)
{
  float t = globalTime - tInitial;  // the particle's age
  float4 pFinal = pInitial +
                  vInitial * t +
                  0.5 * acceleration * t * t;
  oPosition = mul(modelViewProj, pFinal);
  color = float4(t, t, t, 1);      // brighten with age
  pointSize = -8.0 * t * t +
               8.0 * t +
               0.1 * pFinal.y + 1;
}
Example 6-2. The C6E2v_particle Vertex Program
Example 6-2 shows the source code for the C6E2v_particle vertex program. This program is
meant to work in conjunction with the C2E2f_passthrough fragment program.
In this program, the application keeps track of a "global time" and passes it to the vertex
program as the uniform parameter globalTime. The global time starts at zero when the
application initializes and is continuously incremented. As each particle is created, the particle's
time of creation is passed to the vertex program as the varying parameter tInitial. To find
out how long a particle has been active, you simply have to subtract tInitial from
globalTime:

float t = globalTime - tInitial;
Now you can plug t into Equation 6-1 to find the particle's current position:
float4 pFinal = pInitial +
                vInitial * t +
                0.5 * acceleration * t * t;
This position is in object space, so it needs to be transformed into clip space, as usual:

oPosition = mul(modelViewProj, pFinal);

The program also derives the particle's color from its age:

color = float4(t, t, t, 1);

This is a simple idea, but it produces an interesting visual variation. The color increases with
time linearly. Note that colors saturate to pure white (1, 1, 1, 1). You can try your own
alternatives, such as varying the color based on the particle's position, or varying the color
based on a combination of position and time.
C6E2v_particle uses a new vertex program output semantic called PSIZE. When you render a
point to the screen, an output parameter with this semantic specifies the width (and height) of
the point in pixels. This gives your vertex program programmatic control of the point size used
by the rasterizer.
The point size of each particle varies as time passes. The particles start out small, increase in
size, and then gradually shrink. This variation adds to the fireworks-like effect. As an extra
touch, we added a slight dependence on the particles' height, so that they get a little larger on
their way up. To accomplish all this, we use the following function for the point size:
pointSize = -8.0 * t * t +
             8.0 * t +
             0.1 * pFinal.y + 1;
This function is nothing special; we merely created the formula to achieve the effect that we
wanted. In other words, the formula does not have any real physical meaning, aside from
attempting to mimic the effect we had in mind.
Although the C6E2v_particle program produces interesting particle motion, the particles
themselves do not look very appealing; they are just solid-colored squares of different sizes.
However, you can improve the particle appearance by using point sprites. With point sprites,
the hardware takes each rendered point and, instead of drawing it as a single vertex, draws it
as a square made up of four vertices, as shown in Figure 6-5. Point sprites are automatically
assigned texture coordinates for each corner vertex. This allows you to alter the appearance of
the particles from a square to any texture image you want.
Figure 6-5. Converting Points to Point Sprites
By rendering the points as point sprites, you can use the assigned texture coordinates to
sample a texture that supplies the shape and appearance of each point vertex, instead of
simply rendering each point vertex as a square point. Point sprites can create the impression of
added geometric complexity without actually drawing extra triangles. Figure 6-6 shows a more
visually interesting example of a particle system, using point sprites. Both OpenGL and Direct3D
have standard interfaces for rendering point sprites.
Key-Frame Interpolation
3D games often use a sequence of key frames to represent an animated human or creature in
various poses. For example, a creature may have animation sequences for standing, running,
kneeling, ducking, attacking, and dying. Artists call each particular pose that they create for a
given 3D model a key frame.
Key-Framing Background
The term key frame comes from cartoon animation. To produce a cartoon, an artist first quickly
sketches a rough sequence of frames for animating a character. Rather than draw every frame
required for the final animation, the artist draws only the important, or "key," frames. Later,
the artist goes back and fills in the missing frames. These in-between frames are then easier to
draw, because the prior and subsequent key frames serve as before-and-after references.
Computer animators use a similar technique. A 3D artist makes a key frame for each pose of an
animated character. Even a standing character may require a sequence of key frames that show
the character shifting weight from one foot to the other. Every key frame for a model must use
the exact same number of vertices, and every key frame must share the same vertex
connectivity. A vertex used in a given key frame corresponds to the same point on the model in
every other key frame of the model. The entire animation sequence maintains this
correspondence. However, the position of a particular vertex may change from frame to frame,
due to the model's animation.
Given such a key-framed model, a game animates the model by picking two key frames and
then blending together each corresponding pair of vertex positions. The blend is a weighted
average in which the sum of the weights equals 100 percent. Figure 6-7 shows an alien
character with several key frames. The figure includes two key frames, marked A and B, to be
blended into an intermediate pose by a Cg program.
An application can use a Cg vertex program to blend the two vertices together. This blending
may include further operations to illuminate the blended character appropriately for more
realism. Usually, an application specifies a single position for each vertex, but for key-frame
blending, each vertex has two positions, which are combined with a uniform weighting factor.
Key-frame interpolation assumes that the number and order of vertices are the same in all the
key frames for a given model. This assumption ensures that the vertex program is always
blending the correct pairs of vertices. The following Cg code fragment blends key-frame
positions:

float3 position = (1 - weight) * keyFrameA + weight * keyFrameB;
The keyFrameA and keyFrameB variables contain the (x, y, z) positions of the vertex being
processed at key frames A and B, respectively. Note that weight and (1 - weight) sum to 1.
If weight is 0.53, the Cg program adds 47 percent (1.0 - 0.53) of the position of key frame A to
53 percent of the position of key frame B. Figure 6-8 shows an example of this type of
animation.
To maintain the appearance of continuous animation, the key-frame weight increases with each
rendered frame until it reaches 1.0 (100 percent). At this point, the existing key frame B
becomes the new key frame A, the weight resets to 0, and the game selects a new key frame B
to continue the animation. Animation sequences, such as walking, may loop repeatedly over the
set of key frames that define the character's walking motion. The game can switch from a
walking animation to a running animation just by changing over to the sequence of key frames
that define the running animation.
It is up to the game engine to use the key frames to generate convincing animated motion.
Many existing 3D games use this style of key-frame animation. When an application uses a Cg
vertex program to perform the key-frame blending operations, the CPU can spend time
improving the gameplay rather than continuously blending key frames. By using a Cg vertex
program, the GPU takes over the task of key-frame blending.
Interpolation Approaches
There are many types of interpolation. Two common forms for key-frame interpolation are
linear interpolation and quadratic interpolation.
Linear Interpolation
With linear interpolation, the transition between positions happens at a constant rate. Equation
6-2 shows the definition of linear interpolation:

intermediatePosition = positionA * (1 - f) + positionB * f

As f varies from 0 to 1 in this equation, the intermediate position varies between positionA and
positionB. When f is equal to 0, the intermediate position is exactly positionA, the starting
position. When f is equal to 1, the intermediate position is positionB, the ending position. Once
again, you can use Cg's lerp function to accomplish the interpolation.
Using lerp, the interpolation between two positions can be written concisely as:

intermediatePosition = lerp(positionA, positionB, f);
Quadratic Interpolation
Linear interpolation is good for many situations, but sometimes you want the rate of transition
to change over time. For example, you might want the transition from positionA to positionB to
start out slowly and get faster as time passes. For this, you might use quadratic interpolation,
as in the following code fragment:
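// One possibility: squaring the blend factor makes the transition start
// slowly and speed up as f approaches 1.
intermediatePosition = lerp(positionA, positionB, f * f);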
Other functions that you might use are step functions, spline functions, and exponential
functions. Figure 6-9 shows several common types of interpolation functions.
Example 6-3 shows the C6E3v_keyFrame vertex program. This program performs the object-
space blending of two positions, each from a different key frame. The lerp Standard Library
function linearly interpolates the two positions, and then the program transforms the blended
position into clip space. The program passes through a texture coordinate set and a color.
As indicated by the input semantics for positionA and positionB, the application is
responsible for configuring key frame A's position as the conventional position (POSITION) and
key frame B's position as texture coordinate set 1 (TEXCOORD1).
The application is also responsible for determining the key-frame blending factor via the
uniform parameter keyFrameBlend. The value of keyFrameBlend should transition from 0 to 1.
Once 1 is reached, the application chooses another key frame in the animation sequence, the
old key frame B position input is then configured as the key frame A position input, and the new
key-frame position data feeds the key frame B position input.
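The listing itself is not reproduced here; the following Cg sketch is consistent with the description and input semantics above (the out-parameter names are assumptions):

void C6E3v_keyFrame(float3 positionA : POSITION,
                    float3 positionB : TEXCOORD1,
                    float4 color     : COLOR,
                    float2 texCoord  : TEXCOORD0,

                out float4 oPosition : POSITION,
                out float4 oColor    : COLOR,
                out float2 oTexCoord : TEXCOORD0,

            uniform float    keyFrameBlend,
            uniform float4x4 modelViewProj)
{
  // Object-space blend of the two key-frame positions
  float3 position = lerp(positionA, positionB, keyFrameBlend);

  // Transform the blended position into clip space
  oPosition = mul(modelViewProj, float4(position, 1));

  // Pass through the color and one texture coordinate set
  oColor    = color;
  oTexCoord = texCoord;
}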
You often want to light a key-framed model. This involves not merely blending two positions
(the vertex in two different key frames), but also blending the two corresponding surface
normals. Then you can perform the lighting computations with the blended normal. Blending two
normals may change the length of the resulting normal, so you must normalize the blended
normal prior to lighting.
Example 6-4 shows the C6E4v_litKeyFrame vertex program, which adds per-vertex lighting to the C6E3v_keyFrame example. In the updated example, each key frame also supplies its own corresponding per-vertex surface normal.

struct Light {
  float3 eyePosition;    // In object space
  float3 lightPosition;  // In object space
  float4 lightColor;
  float  specularExponent;
  float  ambient;
};

float4 computeLighting(Light light,
                       float3 position,
                       float3 normal)
{
  // Per-vertex diffuse and specular lighting, all in object space
  float3 lightDirection = light.lightPosition - position;
  float3 lightDirNorm   = normalize(lightDirection);
  float3 eyeDirection   = light.eyePosition - position;
  float3 eyeDirNorm     = normalize(eyeDirection);
  float3 halfAngle      = normalize(lightDirNorm + eyeDirNorm);
  float  diffuse  = max(0, dot(lightDirNorm, normal));
  float  specular = pow(max(0, dot(halfAngle, normal)),
                        light.specularExponent);
  return light.lightColor * (light.ambient + diffuse + specular);
}

void C6E4v_litKeyFrame(float3 positionA : POSITION,
                       float3 normalA   : NORMAL,
                       float3 positionB : TEXCOORD1,
                       float3 normalB   : TEXCOORD2,
                       float2 texCoord  : TEXCOORD0,

                   out float4 oPosition : POSITION,
                   out float2 oTexCoord : TEXCOORD0,
                   out float4 color     : COLOR,

               uniform float    keyFrameBlend,
               uniform Light    light,
               uniform float4x4 modelViewProj)
{
  // Blend the positions and normals of the two key frames
  float3 position    = lerp(positionA, positionB, keyFrameBlend);
  float3 blendNormal = lerp(normalA, normalB, keyFrameBlend);

  // Blending can shorten the normal, so renormalize it before lighting
  float3 normal = normalize(blendNormal);

  oPosition = mul(modelViewProj, float4(position, 1));
  oTexCoord = texCoord;
  color     = computeLighting(light, position, normal);
}

The computeLighting internal function computes a conventional lighting model using object-space lighting.
Vertex Skinning
In vertex skinning, a model is represented by a single default pose and a set of transformation matrices. One or more of these matrices control each vertex in the default pose's polygonal mesh. Each matrix is assigned a weighting factor (from 0 to 100 percent), which indicates how much that matrix affects each vertex. Only a small number of matrices usually control each vertex, meaning that only these few matrices have positive and significant weighting factors for a given vertex. We call this small set of matrices the bone set for each vertex. We assume that the weighting factors for all the matrices in a vertex's bone set always sum to 100 percent.
When rendering this type of model, you first transform every vertex by each matrix in the
vertex's bone set, then weight the results of each matrix transform according to the matrix's
corresponding weighting factor, and finally sum the results. This new position is the skinned
vertex position.
When all the matrices are identity matrices (no rotation, no translation), the mesh is in the
default pose. 3D artists often pick a default pose in which the character is standing and facing
forward, with legs apart and arms outstretched.
By controlling the matrices, you can create novel poses. For example, a vertex on a character's
forearm close to the elbow might use 67 percent of the forearm matrix, 21 percent of the elbow
matrix, and 12 percent of the upper arm matrix. The animator who creates a model for vertex
skinning must appropriately localize each matrix so that, for example, the matrix that controls
the left shoulder has no effect on vertices near the ankle. Often, the number of matrices
affecting any given vertex is limited to no more than four. For the 3D artist, once all the
weights and matrices are assigned to the model's default pose, constructing a new pose is a
matter of manipulating the matrices appropriately, rather than attempting to position each
individual vertex. Posing and animating the model is much simpler when it is authored for
vertex skinning.
For a character model, the most significant matrices represent the way rigid bones in the
character's body move and rotate; hence, the vertex-skinning matrices are called bones. The
vertices represent points on the skin. Vertex skinning simulates how bones, represented as
matrices, tug and reposition various points, represented as vertices, on the character's skin.
Lighting
For correct lighting, you can compute the same sort of transformed and weighted average used
for positions, except that you transform normals by the inverse transpose of each matrix rather
than by the matrix itself. Weighted normals may no longer be unit length, so normalization is
required.
Assuming that the bone matrices are merely rotations and translations simplifies the
transformation of the normals for lighting, because the inverse transpose of a matrix without
scaling or projection is the matrix itself.
With the key frame approach, every pose requires a distinct set of vertex positions and
normals. This becomes unwieldy if huge numbers of poses are required.
However, with vertex skinning, each pose requires just the default pose (shared by all poses) and the matrix values for the given pose. There are generally substantially fewer matrices per
character than vertices, so representing a pose as a set of bone matrices is more compact than
representing the pose with a key frame. With vertex skinning, you can also create novel poses
dynamically, either by blending existing bone matrices from different poses or by controlling
matrices directly. For example, if you know what matrices control an arm, you can wave the
arm by controlling those matrices.
In addition to requiring the matrices for each pose, the model's default pose needs each vertex
to have a default position, a default normal, some number of matrix indices to identify which
subset of matrices control the vertex, and the same number of weighting factors, corresponding
to each respective matrix.
This data for the default pose is constant for all other poses. Generating a new pose requires
only new matrices, not any changes to the default pose data. If the GPU can perform all the
vertex-skinning computations, this means that the CPU needs to update only the bone matrices
for each new pose, but not otherwise manipulate or access the default pose data.
Vertex skinning is quite amenable to storing and replaying motion-capture sequences. You can
represent each motion-capture frame as a set of bone matrices that you can then apply to
different models that share the same default pose and matrix associations. Inverse kinematics
solvers can also generate bone matrices procedurally. An inverse kinematics solver attempts to
find an incremental sequence of bone matrices that transition from one given pose to another
given pose in a realistic, natural manner.
The C6E5v_skin4m vertex program in Example 6-5 implements vertex skinning, assuming that
no more than four bone matrices affect each vertex (a common assumption).
An array of 24 bone matrices, each a 3x4 matrix, represents each pose. The entire array is a
uniform parameter to the program. The program assumes that each bone matrix consists of a
translation and a rotation (no scaling or projection).
The per-vertex matrixIndex input vector provides a set of four bone-matrix indices for
accessing the boneMatrix array. The per-vertex weight input vector provides the four
weighting factors for each respective bone matrix. The program assumes that the weighting
factors for each vertex sum to 100 percent.
For performance reasons, the program treats boneMatrix as an array of float4 vectors rather
than an array of float3x4 matrices. The matrixIndex array contains floating-point values
instead of integers, and so the addressing of a single array of vectors is more efficient than
accessing an array of matrices. The implication of this is that the indices in the matrixIndex
vector should be three times the actual matrix index. So, the program assumes 0 is the first
matrix in the array, 3 is the second matrix, and so on. The indices are fixed for each vertex, so
you improve performance by moving this "multiply by 3" outside the vertex program.
A for loop, looping four times, transforms the default pose position and normal by each bone
matrix. Each result is weighted and summed.
The program computes both the weighted position and normal for the pose. The same
computeLighting internal function from Example 6-4 computes per-vertex object-space
lighting with the weighted position and normal.
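Example 6-5's listing is not reproduced here. The Cg sketch below is consistent with the description above; the input/output semantic bindings and the 72-element array layout (24 matrices of three float4 rows each) are assumptions:

void C6E5v_skin4m(float3 position    : POSITION,
                  float3 normal      : NORMAL,
                  float2 texCoord    : TEXCOORD0,
                  float4 weight      : TEXCOORD1,
                  float4 matrixIndex : TEXCOORD2,

              out float4 oPosition : POSITION,
              out float2 oTexCoord : TEXCOORD0,
              out float4 color     : COLOR,

          uniform Light    light,
          uniform float4   boneMatrix[72],  // 24 bones, 3 rows each
          uniform float4x4 modelViewProj)
{
  float3 netPosition = 0;
  float3 netNormal   = 0;

  for (int i = 0; i < 4; i++) {
    // Indices arrive pre-multiplied by 3 (one 3x4 matrix = 3 rows)
    float index = matrixIndex[i];
    float3x4 model = float3x4(boneMatrix[index + 0],
                              boneMatrix[index + 1],
                              boneMatrix[index + 2]);

    // Transform the default-pose position by this bone
    float3 bonePosition = mul(model, float4(position, 1));

    // The 3x3 part transforms the normal (rotation only; no scaling assumed)
    float3x3 rotate = float3x3(model[0].xyz,
                               model[1].xyz,
                               model[2].xyz);
    float3 boneNormal = mul(rotate, normal);

    // Weight each bone's result and sum
    netPosition += weight[i] * bonePosition;
    netNormal   += weight[i] * boneNormal;
  }

  netNormal = normalize(netNormal);

  oPosition = mul(modelViewProj, float4(netPosition, 1));
  oTexCoord = texCoord;
  color     = computeLighting(light, netPosition, netNormal);
}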
Although this example is rather limited, you could generalize it to handle more bone matrices, general bone matrices (for example, allowing scaling), more matrices influencing each vertex, and a better lighting model.
Further Reading
Cg builds on a host of concepts in computer language design, computer hardware design, and
computer graphics. Doing justice to all these contributions in the context of this tutorial is not
always practical. What we attempt in the "Further Reading" section is to offer you pointers to
learn more about the contributions that underlie the topics in each chapter.
There are plenty of books on C. The C Programming Language, Second Edition (Prentice Hall, 1988), by Brian Kernighan and Dennis Ritchie, is a classic; the authors invented the C language. Cg includes concepts from both C and C++. There now may actually be more books
about C++ than about C. The classic C++ book is The C++ Programming Language, Third
Edition (Addison-Wesley, 2000), by Bjarne Stroustrup, who invented the language.
To learn more about the RenderMan Shading Language, read The RenderMan Companion: A
Programmer's Guide to Realistic Computer Graphics (Addison-Wesley, 1989), by Steve Upstill.
Pat Hanrahan and Jim Lawson published a SIGGRAPH paper about RenderMan called "A
Language for Shading and Lighting Calculations" (ACM Press) in 1990.
Robert Cook's 1984 SIGGRAPH paper titled "Shade Trees" (ACM Press) motivated the
development of RenderMan.
The development of programmable graphics hardware and its associated languages has been an
active and fruitful research area for almost a decade. Anselmo Lastra, Steven Molnar, Marc
Olano, and Yulan Wang at UNC published an early research paper in 1995 titled "Real-Time
Programmable Shading" (ACM Press). Researchers at UNC also published several papers about
their programmable PixelFlow graphics architecture. Marc Olano and Anselmo Lastra published
a SIGGRAPH paper titled "A Shading Language on Graphics Hardware: The PixelFlow Shading
System" (ACM Press) in 1998.
Kekoa Proudfoot, Bill Mark, Svetoslav Tzvetkov, and Pat Hanrahan published a SIGGRAPH paper
in 2001 titled "A Real-Time Procedural Shading System for Programmable Graphics
Hardware" (ACM Press) that describes a GPU-oriented shading language developed at Stanford.
Real-Time Rendering, Second Edition (A. K. Peters, 2002), written by Eric Haines and Tomas
Akenine-Möller, is an excellent resource for further information about graphics hardware and
interactive techniques.
If you are interested in the physics behind the particle system you created, you can learn more
by reviewing kinematics in any high school or college physics textbook.
Jeff Lander wrote a series of articles in 1998 and 1999 for Game Developer Magazine about
various animation techniques. You can find these articles on the http://www.darwin3d.com website. For particle
systems, read "The Ocean Spray in Your Face." For vertex skinning, check out "Skin Them
Bones: Game Programming for the Web Generation."
The original volume of Game Programming Gems (Charles River Media, 2000), edited by Mark
DeLoura, contains several gems related to key-frame animation and vertex skinning. Check out
these articles: "Interpolated 3D Keyframe Animation," by Herbert Marselas; "A Fast and Simple
Skinning Technique," by Torgeir Hagland; and "Filling the Gaps-Advanced Animation Using
Stitching and Skinning," by Ryan Woodland.
John Vince's book 3-D Computer Animation (Addison-Wesley, 1992) covers many of the
techniques described in this chapter, as well as others, such as free-form deformation (FFD).
DirectX 8 added point sprites to Direct3D. OpenGL implementations from multiple hardware
vendors support the NV_point_sprite extension. The specification for this OpenGL extension is
available at the http://www.opengl.org/ website.
URL: http://www.gamasutra.com/gdc2003/features/20030307/leonard_01.htm
The term "senses" in game development is a useful metaphor for understanding, designing, and
discussing that part of the AI that gathers information about items of interest in the simulated
environment of the game. Non-player characters visually presented as humans, animals, or
creatures with eyes and ears in a realistic three-dimensional space lend themselves well to the
metaphor.
This engineering metaphor is best not applied too literally. In spite of the seemingly physical
nature of the AIs in the game world, the analogy of game AI senses is not a physiological or
neurological one. The line between "sense" and "knowledge" in a game is a blurry one. Sense
incorporates the idea of awareness of another entity in the game, includes elements of value
and knowledge, and can have game-relevant logic wired directly in.
A game sensory system must be designed in a way that is subservient to the game design and
efficient in implementation. The senses need only be as sophisticated as is needed to be
entertaining and robust. The result of their work must be perceivable and understandable by
the player. Few game designs require AIs with a sense of taste, touch, or smell; thus senses
primarily are concerned with vision or hearing. Used wisely, senses can be an invaluable tool to
make simple state machines more interesting by providing them with a broad range of
environmental input.
This paper describes an approach to designing and implementing a high-fidelity sensory system
for a stealth-oriented first-person AI system. The techniques described are derived from
experience constructing the AI for Thief: The Dark Project, as well as familiarity with the code of
Half-Life. Initially, the basic concepts of AI senses are laid out using Half-Life as a motivating
example. The paper then examines the more stringent sensory requirements of a stealth game
design. Finally, the sensory system built for Thief is described.
Half-Life is not a game that centers on stealth and senses. With a strong tactical combat
element, however, it does require a reasonable sensory system. This makes it a perfect case to
explore the basics of AI sensory systems. AIs in Half-Life have sight and hearing, a system for
managing information about sensed entities, and present interesting examples of leveraging
basic senses into appealing behaviors.
In a simple sensory system, AIs periodically "look" at and "listen" to the world. Unlike real
vision and hearing where stimuli arrive at the senses whether desired or not, these are active
events. The AI examines the world based on its interest, and decides according to a set of rules whether it sees or hears another entity in the game. These probes are designed to emulate real
senses while limiting the amount of work done. A greater amount of resources is dedicated to
the things that are important for the game mechanics.
For example, in Half-Life the core sensory logic that is run periodically is:
Begin look
--Gather a list of entities within a specified distance
--For each entity found...
----If I want to look for them and
----If they are in my viewcone and
----If I can raycast from my eyes to their eyes then...
------If they are the player and
------If I have been told to not see the player until they see me and
------If they do not see me
--------End look
------Else
--------Set various signals depending on my relationship with the seen
--------entity
End look
Begin listen
--For each sound being played...
----If the sound is carrying to my ears...
------Add the sound to a list of heard sounds
------If the sound is a real sound...
--------Set a signal indicating heard something
------If the sound is a "smell" pseudo-sound
--------Set a signal indicating smelled something
End listen
The first concept illustrated by this pseudo-code is that the senses are closely tied to the
properties of the AI, its relationship with the subject, and the relevance of the AI to the player's
experience. This is in part motivated by optimization concerns, but made available by game
mechanics. In the Half-Life game design an AI that is not near the player is not relevant and
need not sense the world. Even when near the player, the AI needs only to look at things that
are known to produce reactions of fear or hatred later.
The logic also demonstrates the basic construction of vision as a view distance, a view cone,
line-of-sight, and eye position (Figure 1). Each AI has a length-limited two-dimensional field of
view within which it will cast rays to interesting objects. Unblocked ray casts indicate visibility.
Figure 1
There are two important things to note. First, the operations of sensing are ordered from least
expensive to most expensive. Second, for player satisfaction, vision is a game of peek-a-boo. In a first-person game, the player's sense of body is weak, and a player who is seen by an opponent they cannot see often feels cheated.
Most interesting is the snippet that restrains the AI's ability to see the player until seen by the
player, which is purely for coordinating the player's entertainment. This is an example of how
higher-level game goals can be simply and elegantly achieved by simple techniques in lower
level systems.
The logic for hearing is much simpler than vision. The basic element of a hearing component is
the definition and tuning of what it means for a sound to carry to the AI audibly. In the case of
Half-Life, hearing is a straightforward heuristic of the volume of the sound multiplied by a
"hearing sensitivity" yielding a distance within which the AI hears the sound. More interesting is
the demonstration of the utility of hearing as a catchall for general world information gathering.
In this example, the AI "hears" pseudo-sounds, fictional smells emanating from nearby corpses.
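In rough C++ terms, the hearing heuristic just described might look like the following sketch (all types and names here are invented for illustration, not Half-Life's actual code):

#include <cmath>

struct Vec3 { float x, y, z; };

static float DistSquared(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// A sound event; volume is a tunable loudness scalar.
struct Sound { Vec3 origin; float volume; };

// Does the sound carry to the AI's ears?
// volume * sensitivity yields the audible radius.
bool CanHear(const Vec3& earPos, const Sound& snd, float hearingSensitivity) {
    float radius = snd.volume * hearingSensitivity;
    return DistSquared(snd.origin, earPos) <= radius * radius;
}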
Thief: The Dark Project and its successors present a lightly scripted game world where the
central game mechanic, stealth, challenges the traditional form of the first-person 3D game.
The Thief player moves slowly, avoids conflict, is penalized for killing people, and is entirely
mortal. The gameplay centers on the ebb and flow of the AI's sensory knowledge of the player as they
move through the game space. The player is expected to move through areas populated with
stationary, pacing, and patrolling AIs without being detected, creeping among shadows and
careful not to make alerting sounds. Though the game AI's senses are built on the same core
concepts as those of Half-Life, the mechanics of skulking, evading, and surprising require a
more sophisticated sensory system.
The primary requirement was creating a highly tunable sensory system that operated within a
wide spectrum of states. On the surface, stealth gameplay is about fictional themes of hiding,
evasion, surprise, quiet, light and dark. One of the things that makes that kind of experience
fun is broadening out the gray zone of safety and danger that in most first-person games is
razor thin. It's about getting the player's heart pounding by holding them on the cusp of either
state, then letting loose once the zone is crossed. This demanded "broad-spectrum" senses that
didn't tend to polarize rapidly to the extremes of "player sensed" and "player not sensed."
A secondary requirement was that the sense system be active much more frequently and
operating on more objects than is typical of a first-person shooter. During the course of the
game, the player can alter the state of the world in ways that the AIs are supposed to take
notice of, even when the player is not around. These things, like body hiding, require reliable
sensing. Together with the first requirement, these created an interesting challenge when
weighed against the perennial requirement for game developers: performance.
Finally, it was necessary that both players and designers understand the inputs and outputs of
the sensory system, and that the outputs match learned expectations based on the inputs. This
suggested a solution with a limited number of player-perceivable inputs, and discrete valued
results.
At heart, the sensory system described here is very similar to that found in Half-Life. It is a
viewcone and raycast based vision system and simple hearing system with hooks to support
optimization, game mechanics, and pseudo-sensory data. Like the Half-Life example, most of
the sense gathering is decoupled from the decision process that acts on that information. This
system expands some of these core ideas, and introduces a few new ones.
The design of the system and the flow of data through it are derived from its definition as an
information gathering system that is customizable and tunable, but stable and intelligible in its
output.
Awareness is stored in sense links that associate either a given AI to another entity in the
game, or to a position in space. These relations store game relevant details of the sensing
(time, location, line-of-sight, etc.), as well as cached values used to reduce calculations from
think cycle to think cycle. Sense links are, in effect, the primary memory of the AI. Through
verbalization and observation sense links can be propagated among peer AIs, with controls in
place to constrain knowledge cascades across a level. They may also be manipulated by game
logic after base processing.
Each object of interest in the game has an intrinsic visibility value independent of any viewer.
Depending on the state of the game and the nature of the object the level of detail of this value
and the frequency of update are scaled in order to keep the amount of processor time spent
deriving the value within budgets.
Visibility is defined as the lighting, movement, and exposure (size, separation from other
objects) of the entity. The meaning of these is closely tied to the game requirements. For
example, the lighting of the player is biased towards the lighting near the floor below the
player, as this provides the player with an objective, perceivable way to anticipate their own
safety. These values, and the aggregate visibility they sum to, are stored as analog values in the 0..1 range.
Viewcones
Rather than having a single two-dimensional field of view, the Thief senses implement a set of
ordered three-dimensional viewcones described as an XY angle, a Z angle, a length, a set of
parameters describing both general acuity and sensitivity to types of stimuli (e.g., motion
versus light), and relevance given the alertness of the AI. The viewcones are oriented according
to the direction an AI's head is facing.
At any time, for a given object being sensed, only the first viewcone the object is in is considered in sense calculations. For simplicity and gameplay tunability, each viewcone is
presumed to produce a constant output regardless of where in the viewcone the subject is
positioned.
For example, the AI represented in Figure 4 has five viewcones. An object at point A will be
evaluated using viewcone number 3. The viewcone used for calculating the vision sense
awareness for an entity at either point B or point C is viewcone number 1, where identical
visibility values for an object will yield the same result.
Figure 4. Viewcones, top view
When probing interesting objects in the world, the senses first determine which viewcone, if
any, applies to the subject. The intrinsic visibility is then passed through a "look" heuristic along
with the viewcone to output a discrete awareness value.
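A minimal sketch of that first step, selecting the applicable viewcone, might look like this in C++ (the types, the head-space convention, and all names are invented for illustration):

#include <cmath>

struct Vec3 { float x, y, z; };

struct Viewcone {
    float xyHalfAngle;  // radians, horizontal half-angle
    float zHalfAngle;   // radians, vertical half-angle
    float length;       // how far the cone reaches
};

// Target is expressed in the AI's head space (x forward, y left, z up).
// Returns the index of the first cone containing the target, or -1.
int FirstConeContaining(const Viewcone* cones, int count, const Vec3& target) {
    float dist = std::sqrt(target.x * target.x + target.y * target.y +
                           target.z * target.z);
    float xyAngle = std::atan2(std::fabs(target.y), target.x);
    float zAngle  = std::atan2(std::fabs(target.z), target.x);
    for (int i = 0; i < count; i++) {
        if (dist <= cones[i].length &&
            xyAngle <= cones[i].xyHalfAngle &&
            zAngle  <= cones[i].zHalfAngle)
            return i;  // only the first matching cone is used
    }
    return -1;  // not in any viewcone
}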
The motivation for multiple viewcones is to enable the expression of such things as direct
vision, peripheral vision, or a distinction between objects directly forward and on the same Z
plane as opposed to forward but above and below. Cone number 5 in the diagram above is a
good example of leveraging the low-level to express a high level concept. This "false vision"
cone is configured to look backwards and configured to be sensitive to motion, giving the AI a
"spidey-sense" of being followed too closely even if the player is silent.
Information Pipeline
The sense management system is designed as a series of components each taking a limited and
well-defined set of data and outputting an even more limited value. Each stage is intended to
be independently scalable in terms of the processing demands based on relevance to game
play. In terms of performance, these multiple scalable layers can be made to be extremely
efficient.
Figure 5. Information Pipeline
The core sensory system implements heuristics for accepting visibility, sound events, current
awareness links, designer and programmer configuration data, and current AI state, and
outputting a single awareness value for each object of interest. These heuristics are considered
a black box tuned by the AI programmer continually as the game develops.
Vision is implemented by filtering the visibility value of an object through the appropriate
viewcone, modifying the result based on the properties of the individual AI. In mundane cases a
simple raycast for line-of-sight is used. In more interesting cases, like the player, multiple
raycasts occur to include the spatial relation of the AI to the subject in the weighing of the
subject's exposure.
Thief has a sophisticated sound system wherein sounds, both rendered and not rendered, are tagged with semantic data and propagated through the 3D geometry of the world. When a sound "arrives" at an AI, it arrives from the direction it should in the real world, tagged with attenuated awareness values, possibly carrying information from other AIs if it is a spoken
concept. These sounds join other awareness inducing things (like the Half-Life smell example)
as awareness relations to positions in space.
Awareness Pulses
Once the look and listen operations are complete, their awareness results are passed to a
method responsible for receiving periodic pulses from the raw senses, and resolving them into a
single awareness relationship, storing all the details in the associated sense link. Unlike the
analog data used in the pipeline to this point, the data in this process is entirely discrete. The
result of this process is to create, update, or expire sense links with the correct awareness
value.
This is a three-step process. First, the sound and vision input values are compared, one
declared dominant, and that becomes the value for awareness. The accessory data each
produces is then distilled together into a summary of the sense event.
Second, if the awareness pulse is an increase from previous readings, it is passed through a
time-based filter that controls whether the actual awareness will increase. The time delay is a
property only of the current state, not the goal state. This is how reaction delays and player
forgiveness factors are implemented. Once the time threshold is passed, the awareness
advances to the goal state without passing through intermediate states.
Finally, if the new pulse value is below current readings, a capacitor is used to allow awareness
to degrade gradually and smoothly. Awareness decreases across some amount of time, passing
through all the intermediate states. This softens the behavior of the AI once the object of
interest is no longer actively sensed, but is not the mechanism by which the core AI's alertness
is controlled.
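A hypothetical sketch of this pulse-resolution logic, with invented names and a made-up discrete awareness ladder, might read like this in C++:

#include <algorithm>

// Invented discrete awareness scale for illustration.
enum Awareness { kNone = 0, kLow, kMedium, kHigh };

struct SenseLink {
    Awareness current = kNone;
    float riseTimer = 0.0f;   // time-based filter for increases
    float decayTimer = 0.0f;  // capacitor for gradual decreases
};

// Resolve one pulse: vision and sound compete, and the larger dominates.
void ResolvePulse(SenseLink& link, Awareness vision, Awareness sound,
                  float dt, float riseDelay, float decayPeriod) {
    Awareness pulse = std::max(vision, sound);
    if (pulse > link.current) {
        // Increases are gated by a delay tied to the *current* state; once
        // the threshold passes, jump straight to the goal state.
        link.riseTimer += dt;
        if (link.riseTimer >= riseDelay) {
            link.current = pulse;
            link.riseTimer = 0.0f;
        }
    } else if (pulse < link.current) {
        // Decreases bleed off gradually, stepping through each state.
        link.decayTimer += dt;
        if (link.decayTimer >= decayPeriod) {
            link.current = static_cast<Awareness>(link.current - 1);
            link.decayTimer = 0.0f;
        }
        link.riseTimer = 0.0f;
    } else {
        link.riseTimer = 0.0f;
        link.decayTimer = 0.0f;
    }
}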
If an object of interest is no longer generating pulses, the senses incorporate a degree of free
knowledge which is scaled based on the state of the AI. This mechanism produces the
appearance of deduction on the part of the AI when an object has left the field of view without
overtly demonstrating cheating to the player.
Conclusion
The system described here was designed for a single-player software rendered game. Because
of this, all authoritative information about game entities was available to it. Unfortunately, in a
game engine with a client/server architecture and a hardware-only renderer, this may not be
true. Determining the lit-ness field of an object's visibility may not be straightforward. Thus
incorporating such a system as described here is something to do deliberately and with care, as
it will place information demands on other systems.
Furthermore, although efficient in what it does, it is designed for a game that in many ways
centers around the system's output. In Thief it consumes a non-trivial amount of the AI's CPU
budget. This will take time away from pathing, tactical analysis, and other decision processes.
However, there are benefits to be had for any game to invest in their sensing code. By
gathering and filtering more information about the environment and serving it up in a well-
defined manner, senses can be leveraged to produce engaging AI behaviors without
significantly increasing the complexity of the decision state machines. A robust sense system
also provides a clean hook for expressing "pre-conscious" behaviors by controlling and
manipulating the core knowledge inputs. Finally, a multi-state sense system provides the player
with an AI opponent or ally that exhibits varied and subtle reactions and behaviors without
adding complexity to the core decision machines.
Further Exploration
Because of the highly data-driven nature of the Dark Engine on which Thief was built, most of
the concepts presented in this paper and all of the configuration details may be explored first-
hand using a copy of the tools available at http://www.thief-thecircle.com/
URL: http://www.gamasutra.com/features/20030211/lally_01.htm
At first, we were thrilled. As character animators, we couldn't have asked for a better project.
There were two heroes, dozens of enemies, scores of NPCs, and more than 100 character-
driven cutscenes. Enthusiasm and artistic latitude made it all ours for the taking.
But staying true to our shared vision of Ratchet & Clank meant that our digital actors needed to
become more than mere cycling automatons. We regarded each character as an intermediary
through which we could reach out to players and draw them deeper into our universe. This
meant our characters needed to blend physically into their environments, emotionally into their
situations, and expressively into our narrative. It was on these principles that we based both
our objectives and our standard of success.
Our team acknowledged that a rift existed between the level of complexity we desired and the
time we had scheduled to implement it. In order to surmount this obstacle, we developed
several methods for using Maya, our artistic skills, and our time more effectively.
This article will discuss these methods both in terms of their functionality and their
implementation. To this end, it will provide technical details on our testing practices, our MEL
shortcuts, and our real-time animation procedures.
Furthermore, it will explain how each of these methods saved us valuable production time,
enabling us to achieve our artistic goals.
For the most part, our prototypes had extremely simple skeletons: all geometric components were assigned to a single joint.
Timing has a major effect on both the readability of an animation and on gameplay. From a
distance, a poorly timed idle can look muddy. An attack animation can be too slow to make an
enemy a worthy opponent, or too fast to be registered. Emphasis or a lack thereof on just a few
frames can make or break any animation, especially within the short cycles of the real-time
universe we were creating. We discovered that by testing and fine-tuning our timings in the
prototype stage, we could often avoid reworking polished animations on final characters.
Ultimately we found that our previsualization process was beneficial not just to animators but to
our design and programming staff as well. It gave our programmers a head start on coding
gameplay, while designers could test, tune, and ask for changes at a very early stage, allowing
room for refinements.
Prototyping saved animators time and energy that otherwise would have been spent
painstakingly modifying or redoing final multi-pass animations. It provided a relatively simple
means for evaluating character behaviors with respect to their timing, specifications, and
interactivity. Moreover, it provided our animators with a practice run, complete with feedback,
before moving on to a high-resolution character (Figure 2).
Two such scripts (examined later in this article) allowed our team to take advantage of driven key functionality that otherwise would have been too cumbersome to animate or too tedious to rig by hand. Another tool enabled our artists, regardless of technical experience, to fit characters with IK systems automatically.

Most of our bipedal characters had leg setups like the one pictured in Figure 3. As seen in the hierarchy (Figure 4), our legs had standard hip, knee, and ankle joints, a heel joint, and two to three bones in the feet. (For clarity purposes, please note that we referred to our foot bones as "toes.")

Figure 3. This leg setup was used for most bipedal characters, saving tedious hand-setups of IK systems for individual characters.

Our IK-rig consisted of three to four RP (Rotate Plane) IK-handles. These connected hip-to-ankle, ankle-to-toe, toe-to-toe and/or toe-to-null. All were configured into a hierarchy (Figure 5) that specified relationships between the IK-handles, a set of locators, and several NURBS constraint objects.

Figure 4. Standard hierarchy for a character's leg, as shown in the Hypergraph.

Using the IK Setup Tool (Figure 6) was a three-step process. First, an artist checked their characters' leg joint names against the tool's presets, making any necessary changes. Next, a scale factor for the constraint objects was entered, based loosely on a character's size. The artist then hit nine buttons in sequence. These buttons would auto-attach the IK handles and instantly build the constraint hierarchy.
MEL is a quirky and often inconsistent language. A good portion of the time we spent
developing our IK Setup Tool was used to track down the proper commands for the tasks we
needed to execute. Still, we managed to uncover the MEL commands we needed to actuate the
core tasks of each of our nine tool buttons.
The first button's purpose was to place IK handles on a character's
legs. It read the names of the bones from the top
text fields by using the textFieldGrp command in its query (-q) mode.
These string variables were then passed to the ikHandle command,
which in turn created the IK handles.
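A hypothetical MEL fragment in the spirit of that first button (the UI control names here are invented) might read:

// Query the joint names the artist typed into the tool's text fields
string $hipJoint   = `textFieldGrp -q -text hipNameField`;
string $ankleJoint = `textFieldGrp -q -text ankleNameField`;

// Build a Rotate Plane IK handle between the hip and ankle joints
ikHandle -solver "ikRPsolver" -startJoint $hipJoint -endEffector $ankleJoint;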
Automating this process with MEL both saved us time and eliminated the steps most prone to
human error. Furthermore, by enabling any artist, regardless of their setup experience, to fit a
prototype and/or character with a functioning IK system quickly, we alleviated bottlenecks. This
conservation of both time and human resources saved energy that could then be devoted to
artwork.
The Walk Guide was an elongated cube with many smaller cuboids
attached to it. The smaller cuboids were identical to the polygonal
markers on our characters' ankles and toes, which were grouped to
their feet during setup.
There were several gameplay situations that were not as clean as the test case I just described;
however, the Walk Guide did serve to plant our character's feet properly in most of our worlds.
Once accustomed to the Guide, we animators found that using it benefited both our schedule
and our artwork, as it kept track of the more technical aspects of locomotion for us.
We knew from the start of developing Ratchet & Clank that facial expression would be an
important component not just to our cinematics but to our gameplay animations as well. Once
again, we were faced with the dueling goals of animation depth and scheduling efficiency. We
settled on two methods for making faces: one simple one for our enemies and one more
complex for our heroes. Expressions exaggerated the idles, intensified the attacks, and sealed the deaths of our enemies and heroes alike.
When animating our enemies, we drew on a traditional animation dictum: A viewer of animation
is usually drawn to a character's face, particularly to the eyes. Attention paid to a character's
eyes and mouth was very important to making convincing actions, especially during our quick
gameplay cycles.
Most enemy characters had fairly simple face skeletons. However, these skeletons allowed for a
high degree of manipulation of the eyes and mouth. Each eye had between two and four bones
controlling its brow and lids. Mouths were generally simpler, using only one or two bones. In
most cases, this setup gave us all the flexibility we needed to exaggerate the enemy's features
and thus heighten the emotion of its actions (Figure 9).
Our heroes' faces had a more sophisticated setup, which they shared with the NPCs. Though
NPC faces were manipulated mostly in our cinematics, Ratchet & Clank made heavy use of
expression during gameplay, as well.
Like the enemy setups, hero and NPC faces were manipulated via their face joints. Unlike the
enemies', these joints were animated through a driven key system instead of being transformed
directly. Since they clocked more screen time, hero and NPC faces tended to have a far greater
amount of bones - and hence expressive range - than their enemy counterparts.
Figure 10. With enemy face skeletons, less
was more. Bone detail was reserved for
the eyes and mouth to enable simple,
exaggerated expressions. Here during an
in-game animation the Robot
Paratrooper's face reacts to being knocked
down.
Figure 10 shows some of the range of expression Ratchet and Clank exhibit during gameplay. Ratchet smiles when excited, grimaces when he's hit, grits his teeth during combat, chatters them when he's cold, and drops his jaw when he dies. Clank's expressions change both while he's
strapped to Ratchet's back and when he's played independently.
As I mentioned earlier, hero and NPC expressions were animated by combining preset driven
key attributes via a MEL script slider interface. These presets allowed the animator to combine and create a wide array of facial expressions without having to build them from scratch. Like color primaries, these attributes could be blended together to form new combinations.
We streamlined these setup processes with another MEL script. Like our other MEL tools, this
script automated some of the tedious steps, allowing a setup artist to
spend more time on the art of sculpting facial poses.
The drivers for our facial animations were stored on a model called the Control Box, shown in
Figure 12. This hierarchy of cubes served as a visual outline of facial attributes, and could also
double as a second interface. For efficiency's sake, Ratchet, Clank and all of our NPC characters
had identical Control Boxes, though Ratchet's had many more active drivers.
End of Cycle
Like all character-driven projects, Ratchet & Clank presented our animation team with a unique
set of artistic and technical challenges. Our artistic philosophy was built on the understanding
that our characters were the instruments through which a player would experience our universe.
We knew that in meeting these challenges, our puppets would transcend mere game space and
become the entities that our players would identify with, vilify, and even personify.
However, this philosophy needed to be coupled with practical methodology if it was to see our
project to its conclusion. From this necessity grew our testing practices, MEL shortcuts, and
real-time animation procedures. Throughout production, these methods removed many of the
barriers that would otherwise have obstructed the artistic efforts of our animators.
As the Insomniac team cycles into our next project, we continue to refine and expand upon the
systems and procedures we developed during Ratchet & Clank. Though our procedures continue
to evolve, our underlying goals remain unchanged. For in the end, we can only prove a
technology's worth by an audience's response to our characters.
By Fred Marcus
Gamasutra
January 21, 2003
URL: http://www.gamasutra.com/resource_guide/20030121/marcus_01
During my years as game design director at Angel Studios, there wasn’t a day where I didn’t
have to deal with physics in one or several of our products. Impressive physics demos have always been one of Angel Studios' tickets to getting contracts, so we had to have some in our games.
And actually, we had a lot of physics in our games: from driving physics to ragdolls, to collisions
with stacking and more, the entire spectrum was covered. They all helped make our games look
and feel different from the rest of the crowd. That, and our ability to tame physics so our games
stayed playable and fun.
In this article, I will try to explain how to tune physics from a designer’s point of view with
some fundamentals you have to know, and end with some classic traps to avoid. But first, how do you approach physics as a game designer?
A Hands-On Approach
Physics can make a game look and feel different. They can bring shocking realism to impacts and to vehicle controls, and really enhance the critical reactions a player's actions generate.
Physics can help pull a game into reality. The ultimate goal, however, is to enhance that reality.
If you get it right, you will give unique sensations to players and show them things they have
never experienced before. Unfortunately, there is a price to pay for that. Physics can take a lot
of CPU time and it requires a lot of tuning to keep a game fun to play.
Badly tuned physics can be a disaster: A vehicle can become impossible to control, objects can
get in the way of the player, things just don’t feel right and what should have made the game
fun just generates frustration. As a game designer, you are responsible for the fun of the game,
so it is your role to make sure physics are tuned properly and not left solely to a programmer’s
whim.
If you don't play a role designing and tuning the physics, your game might ship with you
unhappy with the tuning, wishing you had learned more sooner (also called regrets) and you
will miss a lot of opportunities to discover fun things you might have done with the physics. You
will have to get your hands dirty and tweak many variables many times before you get some
good results. There is no other way around that. Besides, if you are a game designer, you
already have to tune a lot of variables for your controls and your cameras.
But before you get there though, you need to learn the theory behind physics.
Well, this time, it is your turn to listen and to listen well! It is essential for the designer to
understand the language of physics, to understand its limitations in real case scenarios, to
make sure you get a grasp of what is possible or not. Only then, after hours of questioning and drawing (paper and pencil are your best friends when sitting with a physics programmer), will you start to grasp how you can control this beast and make it behave the way you want it to.
It is ok to ask questions, again and again. It’s time to take your game designer’s ego and throw
it out the door. If you don’t understand something, just ask and ask again, until you get it right
-- Everything has to be crystal clear. You might look dumb, that’s for sure, but you cannot
leave anything unexplained.
Understanding what the physics guy means is crucial if you plan to start to tweak things around
and understand what is happening in your game. It will also allow both of you to work on
solutions that solve these “special cases” occurrences. You know, it’s a game and sometimes
you don’t want physics to do this or that. If you don’t know what you are talking about, physics
wise, you will not be able to convince your programmer to tweak the code to make things work.
After all, he wants it to be real but you want it to be really FUN.
Before you start talking to your programmer (yes, you have to!), here is a simple-to-understand primer on game physics, as well as some basic techniques to help you tune your game for fun.
Center of Gravity
The center of gravity is a crucial parameter in physics. It will determine how your object will
react to forces applied to it. The center of gravity (center of mass is another name for it) is the point where the object balances. If you move the position of the center of gravity of an object, you will change its behavior dramatically: if a tall object has its center of gravity very low, it will be very stable and hard to
roll. If it is placed way up there though, the object will roll very easily. If you want a car to
make nice drifts, or roll a bit into turns, move your center of gravity around. You will get very
different driving feelings when you do so.
Physics in games are about forces and impulses (a force is a continuous impulse). You can't just move a physically based object by giving it a position in 3D space. You have to push it or apply a torque to rotate it. Also, the heavier the object, the stronger the force or impulse you need to apply to move it (unless your object and the ones it is colliding with have low friction...). Some physics engines allow you to specify x, y, and z coordinates and will try to calculate the forces needed to bring your object there, but it will still be an approximation.
To control all these forces and to prevent things from getting too crazy (like an object spinning all over the place, or a car that falls on its side too easily in turns), you have a friend that can help you. And this friend is called damping. This is your Tylenol, your savior -- the magic word. If the physics in your game make things fly out of control, go to your programmer and say the magic word. He will know what you are talking about. You should also ask him about damping when things feel too slow, sluggish, smooth, or controlled. He probably already has damping coded in and did the tuning himself. Get these tuning parameters exposed and play with them.
You can dampen things in translation and in rotation, for each axis. In translation, damping can
make a box slow down progressively once pushed or it can slow down a car going too fast.
Think of it as brakes. It’s up to you if you want the brakes to be very soft and progressive or if
you want to bring objects to a brutal stop. Too much translation damping and the object won’t
move at all though.
In rotation, you can prevent an object from rolling too much. Think of a car taking a turn too fast. If you don't want that car to roll over, you add damping on the z axis. If you put in too much damping though, the car won't rotate at all and will look very unrealistic.
So, damping is great, it helps you control things. The problem is, sometimes, you want an
object to behave differently at different velocities.
Maybe you want a vehicle to roll a lot at low speeds but if you keep it like that, it will definitely
roll over at high speeds, and that's a no-no! Well, you can ask your programmer to give you speed-relative damping! Quadratics are great for that. You get a constant damping value C, valid even if the object is static. On top of that, you have a damping value B for when the object speeds up. And then, at really high speed, a third damping value A kicks in for these special cases. The thing is, damping values can be updated every frame, so it's up to you to see what your game needs and design a system that fits those needs.
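Interpreted as a quadratic in speed (my reading of the A, B, and C values above, not the author's exact formula), the damping term might be computed like this:

// damping = A * speed^2 + B * speed + C, re-evaluated every frame so the
// designer can retune per state. All names are illustrative.
float SpeedRelativeDamping(float speed, float A, float B, float C) {
    return C + B * speed + A * speed * speed;
}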
Of course, too much damping and your game doesn’t need physics anymore! That’s where the
difficulty in balancing all this comes from. If your physics has to be tightly controlled by damping to keep the game playable, you can still get rid of the damping when you need to! For example, during a car crash, remove damping on all axes at once, or progressively, to let the vehicle spin all over the place in the most spectacular way. The difficulty for the designer and the programmer is to
determine and recognize when a vehicle is really crashing.
Once you've learned a bit about how the physics in your game works, you'll soon encounter a
number of common problems that you previously lacked the tools to tackle. This section offers
a list of traps you have to avoid at all cost. They are classic problems, I see them every single
time physics are implemented in a game. Make sure you read this before you start to tune.
A key element in tuning physics is the possibility to tweak each value in real time while the
game plays. You have to ‘feel’ physics, just like you need to feel the controls and the cameras
in your game. Being able to tweak physics values in real time will allow you to increase your
iteration rate dramatically. Ask for this feature. Better: BEG for it!
Variable frame rate will change your physically based behaviors. Even if you are told that they will stay the same once the code has been optimized, don't believe it! It will not happen; things will change and you will have to re-tune them (hopefully you will be well trained by then and it won't take you that long).
Sloppy physics code can take a lot of CPU cycles. It can take so much time that your frame rate will drop from 60 to 30 frames per second or worse. Variable frame rate disturbs a physics simulation, and usually, oversampling is used to solve the problem.
Oversampling physics means that the physics code is updated independently from the display.
If the game is displayed every other frame (30 fps), physics are still updated 60 times per
second. It means that the game runs through the physics code two times before an image is
displayed. It helps keep the physics stable.
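In code, oversampling typically looks like a fixed-timestep loop along these lines (a minimal sketch; StepPhysics and Render are placeholder functions, not a real API):

// Physics steps at a fixed 60 Hz regardless of display rate.
void StepPhysics(float dt);
void Render();

const float kPhysicsStep = 1.0f / 60.0f;
float gAccumulator = 0.0f;

void FrameUpdate(float frameDt) {
    gAccumulator += frameDt;
    // Two steps per 30 fps frame, three per 20 fps frame, and so on;
    // if physics itself is slow, this loop is the chain reaction the
    // text warns about.
    while (gAccumulator >= kPhysicsStep) {
        StepPhysics(kPhysicsStep);
        gAccumulator -= kPhysicsStep;
    }
    Render();
}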
But wait, if the game has to run the physics code two times before it displays something, then,
it eats even more CPU time. And the game can get even slower! And then you might drop to 20
fps and need to update physics three times before you display anything! This chain reaction is a
classic problem and trying to get a good, fast 60fps with no oversampling is the best you can
ask for. Beware of oversampling and low frame rates.
Physics LOD'd
Not everything needs to be physically based in your game. Especially when these things are far
from the player’s point of view. No need to compute suspensions if the vehicle is four pixels on
screen! Make sure your coder has physics LOD built in.
Another crucial thing to remember is that AI does not need to be physically based either. If you
want to have tight control over what the AI does in your game, then don’t make it physically
based. Physics will prevent programmers from moving an object by specifying x, y, and z coordinates. They will need to move objects through impulses and forces, a more 'analogue' and approximate way to do it.
Final Word
Not everyone is convinced that good physics are essential to a game -- "games don’t need it to
be fun” I hear every so often. It is true, absolutely.
But, each new platform brings us closer to realism graphically. Hardcore gamers complain when
the art is average, the AI dumb… and sooner or later they will reach a point where they will not
accept object behaviors that don’t look right. As for game designers, we have the opportunity to
make sure that game physics not only look and feel right, but that they play right -- balancing
realism with effective and fun tuning.
Contact Physics
By Roderick Kennedy
Gamasutra
January 21, 2003
URL: http://www.gamasutra.com/resource_guide/20030121/kennedy_01.shtml
One of my first jobs in game physics was writing the flight models for the fighter sim EF2000.
Back in the mid-'90s, the physics challenges were well suited to the PCs of the time, and
contact physics wasn't part of the picture. A plane can be modelled very accurately as a point
mass in the sky, and the challenge for the physics programmer is to get the right lift
coefficients, drag, and engine model. It's hard to believe now, but combat flight sims were one
of the biggest PC genres in 1996. Microprose's F15 Strike Eagle kicked the whole thing off,
Spectrum Holobyte responded with the classic Falcon. Meanwhile, British upstarts DID
challenged the big boys with TFX, and then EF2000. And a shareware game called Doom was
slowing up development time, taking over office networks at lunch and dinnertime.
Today, flying and shooting is a niche market; today's games are much more "close-in", and it's
ground-based car simulations like World Rally Championship that occupy the hardcore sim niche
that flight sims once did. We now have the challenge of making games feel solid, creating an
illusion of tangible physical presence. With today's advanced graphics you really notice when
the physics are lagging behind. My colleagues at Evolution Studios (many of them DID alumni),
are looking to bring that all-important sense of solidity to new levels as they begin work on
their next WRC title.
In this article I will show how solid contact physics can be implemented, and describe some of
the problems the programmer will encounter. The article should be of help to physics
programmers, users of 3rd party engines, and decision makers who need to evaluate competing
technologies.
A Solid Contact
An example: A car has flipped over, and hits the ground, as in Figure 2. With a single contact
point and no friction, we do the math to calculate its motion. This simplest case has been
covered by other authors, so I'll be brief. The mass of the car is m, and it has a 3×3 moment of
inertia matrix J. We're looking for the force that the ground exerts on the car at the contact.
The vector equations for linear and angular motion are:
(1)   m (d²x/dt²) = m g + f N
      J (dw/dt) = q × (f N)
where x is the car's position, g is the gravitational acceleration; w is the car's angular velocity,
and q is the vector from x to the contact point. We've called the mystery contact force f, and N
is the surface normal, which is also the direction our force will act in. In Matrix form, this is our
"equation of motion":
(2)   [ m I   0 ; 0   J ](6x6) [ d²x/dt² ; dw/dt ](6x1) = [ m g ; 0 ](6x1) + [ N ; q × N ](6x1) f
I've put subscripts to describe the exact number of rows and columns. A 6 by 1 matrix and a 6-
vector are interchangeable. The "constraint equation" should complete the picture by specifying
that objects should not occupy the same space. We require that the car's contact point remains
exactly on the surface. Call this r1 in world space.
(3)   r1 = x + q
(4)   r2 = r1 - ((r1 · N) - d) N, where d gives the surface plane's offset along N
i.e. r2 is just the projection of r1 onto the surface. Our constraint is that r1 remains on the
surface:
(5)   r1 = r2

and that works out as just (r1 · N) - d = 0. The 2nd order constraint is obtained by differentiating twice:
(6)   (dr1/dt) · N = 0, where dr1/dt = dx/dt + w × q
(7)   (d²r1/dt²) · N = 0
(8)   (w × (w × q)) · N + [ Nᵀ  (q × N)ᵀ ] [ d²x/dt² ; dw/dt ] = 0
The left-hand term is a "centripetal acceleration" - all points on a rotating solid accelerate
towards the centre. Note that the 1 by 6 matrix in the middle is the exact transpose of the one
in the equation of motion: this is true in general. I prefer to use a single vector for the
acceleration degrees of freedom, so:
(9)   M y'' = F + H f, where y'' = [ d²x/dt² ; dw/dt ], M is the 6 by 6 mass matrix from (2), F = [ m g ; 0 ], and H = [ N ; q × N ]
(10)  (w × (w × q)) · N + Hᵀ y'' = 0
Now, although I've used y'' to describe the acceleration, we don't actually have a vector y, because angular position can't be properly described with a 3-element vector. But as long as we can obtain the change in w from one frame to the next, we can use quaternions or some other method to describe angular position. Now invert the 6 by 6 mass matrix M, and substitute equation (9) into (10), defining G = Hᵀ M⁻¹ H and lambda = Hᵀ M⁻¹ F + (w × (w × q)) · N:

(11)  G f + lambda = 0
This is our main equation. The solution is just f = -lambda/G. It's good to define lambda so it appears negative in our expression, as lambda is the acceleration that would exist between the contact points without the contact force. The force f acts in the opposite direction. If you calculate f and it turns out to be negative, that means we're pulling, not pushing, and you should deactivate the contact.
Now applying this acceleration over several timesteps will keep the car skidding along the
surface in the correct manner provided the initial velocity between the contact points was zero.
If it wasn't (e.g. when they first collided) we would need to apply an impulse to fix that.
Without going into details, the answer is: i = −v/Γ, where v is the component along N of the
velocity of r1 relative to r2, and i is the impulse to be applied at the point of impact. You can
apply this correction every frame to prevent drift; alternatively, add a heuristic term to λ which
is proportional to v, so that the force will increase when the relative velocity is negative and
decrease when it's positive. Do the same for the position so the contact points line up nicely.
You can now handle a single contact between the car and the ground. Fine for many fast-
moving collisions, but eventually, the car will slow down and another corner will touch the
ground. Now we have to consider multiple-contact solutions.
Figure 3 shows this situation, where a whole edge of the car is touching the ground. In reality
the force will be spread across the whole contacting edge. For our purposes, we can just
consider two forces at the two endpoints.
Our two contact points have normals Na and Nb (these might be just the same vertical vector
but let's keep it general). The forces are fa and fb. The equation of motion is
    M·ÿ = F + ba·fa + bb·fb = F + B·f,  where  B = [ba  bb] is 6×2,
    ba = (Na, qa×Na),  bb = (Nb, qb×Nb),  and  f = (fa, fb)        (12)
and as you might guess from Equation (8), the constraint equation is:
    ca + baᵀ·ÿ = 0        (13)
    cb + bbᵀ·ÿ = 0        (14)
    c + Bᵀ·ÿ = 0,  with  c = (ca, cb)        (15)

- and these two equations are like Equations (9) and (10), except now f is a two-element
vector, as is the contact acceleration a. Re-arranging,

    a = Γ·f + λ = 0,  where  Γ = Bᵀ·M⁻¹·B  and  λ = Bᵀ·M⁻¹·F + c        (16)
Equation (16) is like (11), except of course, that Γ is now a 2×2 matrix. Here's where
contact physics is different from the old-style video game collision. You can't apply the contact
corrections sequentially. You have to find the one solution for fa and fb which satisfies both
constraints. Now in this case, the matrix is only 2×2, and it's guaranteed to have an inverse
(unless contacts a and b are in the same place).
So:
    f = −Γ⁻¹·λ        (17)
Now by putting in two constraints, we've turned our car from a six degree-of-freedom system,
to four. Add another contact, and (usually) you'll take away another degree of freedom. With
three contacts, we've effectively got a whole surface of the car touching the ground, and if the
contact normals are vertical (i.e. we're on flat ground) none of the remaining degrees of
freedom is affected by gravity. So with a bit of friction, our car can come to a halt.
And Γ would be a 3×3 matrix. Once again, as long as the three contacts are not at the same
place, and don't fall in a line, you're guaranteed to have an invertible matrix Γ.
If you look in the literature, you'll find that this method isn't used much. That's because matrix
inversion and other standard linear algebra techniques can't guarantee that all the forces will be
positive. Suppose we require in advance that:

    f ≥ 0  and  a = Γ·f + λ ≥ 0

This means all contact forces are positive or zero, and the accelerations they produce are
positive or zero (no impinging between solids). Now we have what's called a "linear
complementarity problem" (LCP), and an iterative method can give a solution where all forces
are either positive or zero. The common solution method is Lemke's algorithm, which you will
find via [1]. A very good introduction to this approach is found in [2].
Redundancy
The problem is, our car may not have any triangular surfaces! When the next collision occurs,
we'll most likely have four contacts. Not only that, but if the contact normals are all the same,
the Γ matrix will now be singular - it has no unique inverse, and most matrix-inversion routines
will fail. Why? Because four contacts between two surfaces is more than we need to zero out
the relative motion at the contact points. As shown in Figure 4, there's an infinite number of
combinations of forces at the contacts which will produce the same effect.
…etc.
This would work as long as the three contacts we have are spaced well. But if they're all on one
side of the object, it might start to tip over. Then one contact would vanish, and a new one
would appear on the other side. We could end up with an ugly oscillation.
We could detect when more than two points on a surface are touching another surface, then
switch to a single "surface contact" which constrains three d.o.f. However, it would be awkward
to then calculate when to deactivate this special contact, and it would mean introducing a
different set of equations. This method would also not help when redundancy arose from
contacts which are not on the same surface.
It stands to reason that there should be a relationship between the contact points which would
allow us to find all the contact forces.
Redundancy is a different kind of issue if you're iterating to find a solution, because in most
cases, you don't care which of the infinitely many solutions you find - the behaviour will be the
same, provided you've already made sure of not finding negative forces.
It's quite common in contact physics applications to see objects which should move smoothly or
settle down quickly, instead shake, wobble or jump into motion. There are two common causes
for this behaviour.
1. The solution is wrong, or not quite correct. This is common when the iterative scheme for
finding the forces either can't reach a solution, or stops too soon. The short answer is to do
more iterations - but there will be a speed issue.
2. The properties of the physical system can't be well-modelled at the framerate. One
way this happens is if you've put some unrealistic values in your mass matrix (having moments
of inertia too small for the object's size and mass is a frequent mistake). Friction can do this as
well, if your friction coefficients are large enough that they more than reverse the object's
motion between one frame and the next. Physically realistic friction values will do this in many
simulations.
Sometimes, programmers will solve No. 2 by having multiple physics iterations for each game
frame. Please don't do this; there's almost always a better way.
Linked Objects
Figure 5: Modelling ragdolls needs joined-
up thinking.
We've so far covered contacts between single, 6-d.o.f. bodies and a static world. But a lot of
games need linked systems of bodies, for instance using a "ragdoll" model for realistic death
animations (see Figure 5). There are two approaches you can use here:
1. Treat each part of the hierarchy as a separate body. Then define special contacts
between the bodies at the joints. We would have a contact for, say, the hip-joint, which can
have positive or negative forces, limits rotation, and acts in all directions. Using this method,
you can use the techniques outlined above, but you will probably need a good iterative scheme,
as you will be solving for, well, a lot of forces.
2. Treat the whole hierarchy as a single object. The object will have, not 6, but maybe 26
degrees of freedom. Calculating the mass matrix will be a daunting task, but with this method
you won't have to worry about limbs stretching and detaching when too-big an impulse is
applied.
Friction
Friction should be applied between nearly all contacts. Generally, if there is a sideways velocity
between the contacting points r1 and r2, dynamic friction is:
    F_friction = −μdynamic·|Fnormal|·t̂        (18)

where μdynamic is the coefficient of dynamic friction, Fnormal is the contact force N·f, and t̂ is
the unit vector in the "scraping direction". Note how even if the relative velocity is small we can still
get a big force, and that can lead to the stability problem I mentioned earlier.
Static friction is different; it acts to prevent relative motion completely, and can do so provided
the necessary friction force is smaller than μstatic·|Fnormal|. To model static friction, treat it as
two extra forces per contact to be solved for in f. It can also be considered as another
complementarity condition in the solution method - either static friction is smaller than the
maximum, or it is zero.
Generally, μstatic is larger than μdynamic - for example, a car's tyres on tarmac might have a
static friction coefficient of 1.5, and a dynamic friction of 1.2. So you'll get better turning force
if you stay in the static friction zone - or better braking if you don't lock the wheels. You can
see this effect in action in WRC - on tarmac you can keep the vehicle just on the edge of the
static limit, but on gravel or snow you'll be in the dynamic zone most of the time.
Many simulations either ignore static friction, or simulate it by having a larger coefficient when
relative velocity is low. This can lead, along with inaccurate coefficients or bad inertia values, to
a "floaty" behaviour - objects seem to slide too much, as though in slow-motion. Avoid this by
correctly modelling static friction whenever possible, and by ensuring your dynamic friction
coefficients are as close to reality as you can get without causing instability.
Types of Contact
I've so far discussed only one geometric type of contact - of a point and a surface. Many games
only use this type, but to fully model your objects as polyhedra, you'll need edge-to-edge
contacts as well. Then the contact direction N is determined by the cross product of the edge
directions. Most of the derivations are a little more involved, but the same principles apply as
above. You can also implement curved surfaces. For each contact type (sphere-surface, point-
cylinder, I could go on) you'll need an expression for the corresponding column of the B matrix (or three columns if
you're including static friction). Rolling contacts are particularly tricky for all but the simplest
types.
Final Words
A good place to start could be an impulse-based system that only ever has one contact at a
time. Once you're happy with that, try multiple contacts. With only one or two, you will be able
to get away with using the matrix inverse. When you get to having fairly complex systems like
the ragdoll, it's worth trying an iteration scheme.
For anyone serious about game physics, the place to go next is David Baraff's page [1], where
you can download some of the major papers on the subject. Chris Hecker [3] offers a more
game-centric summary and a good overview of the field.
You should now be well on your way to some rock-solid contact physics, though it's a perilous
road. Some programmers have had good results with approximate methods, like the Verlet
particle systems described in [4]. Alternatively, there are several middleware packages which
effectively provide a plug-in solution for dynamics. Unless you find the cost prohibitive or really
want to do something new in physics, these are well worth a look. Game graphics are
racing ahead, and if the physics we use can keep pace, creating new and captivating
experiences for gamers should be well within our grasp.
References
[1] David Baraff's physically based modeling papers, available at http://www.cs.cmu.edu/~baraff/papers/index.html
[2] David Baraff, "Analytical Methods for Dynamic Simulation of Non-penetrating Rigid Bodies", Computer Graphics, Volume 23, Number 3, July 1989.
[3] Chris Hecker, "Behind the Screen" physics columns, Game Developer Magazine, 1996-1997.
[4] Thomas Jakobsen, "Advanced Character Physics", Gamasutra Game Physics Resource Guide, January 21, 2003.
________________________________________________________
By Thomas Jakobsen
Gamasutra
January 21, 2003
URL: http://www.gamasutra.com/resource_guide/20030121/jacobson_01.shtml
This article explains the basic elements of an approach to physically-based modeling which is
well suited for interactive use. It is simple, fast, and quite stable, and in its basic version the
method does not require knowledge of advanced mathematical subjects (although it is based on
a solid mathematical foundation). It allows for simulation of cloth, soft and rigid bodies, and
even articulated or constrained bodies using both forward and inverse kinematics.
The algorithms were developed for IO Interactive’s game Hitman: Codename 47. There, among
other things, the physics system was responsible for the movement of cloth, plants, rigid
bodies, and for making dead human bodies fall in unique ways depending on where they were
hit, fully interacting with the environment (resulting in the press oxymoron “lifelike death
animations”). The article also deals with subtleties like penetration test optimization and friction
handling.
The use of physically-based modeling to produce nice-looking animation has been considered
for some time and many of the existing techniques are fairly sophisticated. Different
approaches have been proposed in the literature [Baraff, Mirtich, Witkin, and others] and much
effort has been put into the construction of algorithms that are accurate and reliable. Actually,
precise simulation methods for physics and dynamics have been known for quite some time
from engineering. However, for games and interactive use, accuracy is really not the primary
concern (although it’s certainly nice to have) – rather, here the important goals are believability
(the programmer can cheat as much as he wants if the player still feels immersed) and speed
of execution (only a certain time per frame will be allocated to the physics engine). In the case
of physics simulation, the word believability also covers stability; a method is no good if objects
seem to drift through obstacles or vibrate when they should be lying still, or if cloth particles
tend to “blow up”.
The methods demonstrated in this paper were created in an attempt to reach these goals. The
algorithms were developed and implemented by the author for use in IO Interactive’s computer
game Hitman: Codename 47, and have all been integrated in IO’s in-house game engine
Glacier. The methods proved to be quite simple to implement (compared to other schemes at
least) and have high performance.
The algorithm is iterative such that, from a certain point, it can be stopped at any time. This
gives us a very useful time/accuracy trade-off: If a small source of inaccuracy is accepted, the
code can be allowed to run faster; this error margin can even be adjusted adaptively at run-
time. In some cases, the method is as much as an order of magnitude faster than other existing
methods. It also handles both collision and resting contact in the same framework and nicely
copes with stacked boxes and other situations that stress a physics engine.
In overview, the success of the method comes from the right combination of several techniques
that all benefit from each other:

• A so-called Verlet integration scheme
• Handling collisions and penetrations by projection
• A simple constraint solver using relaxation
• A nice square root approximation that gives a solid speed-up
• Modeling rigid bodies as particles with constraints
• An optimized collision engine with the ability to calculate penetration depths

Each of the above subjects will be explained shortly. In writing this document, the author has
tried to make it accessible to the widest possible audience without losing vital information
necessary for implementation. This means that technical mathematical explanations and notions
are kept to a minimum if not crucial to understanding the subject. The goal is demonstrating
the possibility of implementing quite advanced and stable physics simulations without dealing
with loads of mathematical intricacies.
In the following, bold typeface indicates vectors. Vector components are indexed by using
subscript, i.e., x=(x1, x2, x3).
Verlet integration
The heart of the simulation is a particle system. In a typical particle system, each particle has
two main variables: its position x and its velocity v. In the time-stepping loop, the new position
x’ and velocity v’ are then computed as

    x’ = x + v·∆t
    v’ = v + a·∆t

where ∆t is the time step, and a is the acceleration computed using Newton’s law f=ma (where
f is the accumulated force acting on the particle). This is simple Euler integration.

Here, however, we choose a velocity-less representation: instead of storing each particle’s
position and velocity, we store its current position x and its previous position x*. Keeping the
time step fixed, the update rule is then

    x’ = 2x − x* + a·∆t²
    x* = x
This is called Verlet integration (see [Verlet]) and is used intensely when simulating molecular
dynamics. It is quite stable since the velocity is implicitly given and consequently it is harder for
velocity and position to come out of sync. (As a side note, the well-known demo effect for
creating ripples in water uses a similar approach.) It works due to the fact that 2x-x*=x+(x-
x*) and x-x* is an approximation of the current velocity (actually, it’s the distance traveled last
time step). It is not always very accurate (energy might leave the system, i.e., dissipate) but
it’s fast and stable. By lowering the value 2 to something like 1.99 a small amount of drag can
also be introduced to the system.
At the end of each step, for each particle the current position x gets stored in the corresponding
variable x*. Note that when manipulating many particles, a useful optimization is possible by
simply swapping array pointers.
The resulting code would look something like this (the Vector3 class should contain the
appropriate member functions and overloaded operators for manipulation of vectors):
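    // A sketch of the particle system described above; Vector3 and
    // NUM_PARTICLES are assumed to be defined elsewhere.
    class ParticleSystem {
        Vector3 m_x[NUM_PARTICLES];    // Current positions
        Vector3 m_oldx[NUM_PARTICLES]; // Previous positions
        Vector3 m_a[NUM_PARTICLES];    // Force accumulators
        Vector3 m_vGravity;            // Gravity
        float   m_fTimeStep;
    public:
        void TimeStep();
    private:
        void Verlet();
        void SatisfyConstraints();
        void AccumulateForces();
        // (constructors, initialization etc. omitted)
    };

    // Verlet integration step
    void ParticleSystem::Verlet() {
        for(int i=0; i<NUM_PARTICLES; i++) {
            Vector3& x = m_x[i];
            Vector3 temp = x;
            Vector3& oldx = m_oldx[i];
            Vector3& a = m_a[i];
            x += x - oldx + a*m_fTimeStep*m_fTimeStep;
            oldx = temp;
        }
    }

    // Accumulate forces for each particle; here all particles
    // are influenced only by gravity
    void ParticleSystem::AccumulateForces() {
        for(int i=0; i<NUM_PARTICLES; i++) m_a[i] = m_vGravity;
    }

    void ParticleSystem::TimeStep() {
        AccumulateForces();
        Verlet();
        SatisfyConstraints();
    }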
The above code has been written for clarity, not speed. One optimization would be using arrays
of float instead of Vector3 for the state representation. This might also make it easier to
implement the system on a vector processor.
This probably doesn’t sound very groundbreaking yet. However, the advantages should become
clear soon when we begin to use constraints and switch to rigid bodies. It will then be
demonstrated how the above integration scheme leads to increased stability and a decreased
amount of computation when compared to other approaches.
Try setting a=(0,0,1), for example, and use the start condition x=(1,0,0), x*=(0,0,0), then do
a couple of iterations by hand and see what happens.
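(With ∆t = 1, the first step gives x’ = 2·(1,0,0) − (0,0,0) + (0,0,1) = (2,0,1); the next gives
2·(2,0,1) − (1,0,0) + (0,0,1) = (3,0,3); then (4,0,6). The x-coordinate advances by a constant
1 per step, reflecting the unit velocity encoded in the start condition, while the z-increments
grow as 1, 2, 3 under the constant acceleration.)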
Collision and contact handling by projection

So-called penalty-based schemes handle contact by inserting springs at the penetration points.
While this is very simple to implement, it has a number of serious drawbacks. For instance, it is
hard to choose suitable spring constants such that, on one hand, objects don’t penetrate too
much and, on the other hand, the resulting system doesn’t get unstable. In other schemes for
simulating physics, collisions are handled by rewinding time (by binary search for instance) to
the exact point of collision, handling the collision analytically from there and then restarting the
simulation – this is not very practical from a real-time point of view since the code could
potentially run very slowly when there are a lot of collisions.
Here, we use yet another strategy. Offending points are simply projected out of the obstacle.
By projection, loosely speaking, we mean moving the point as little as possible until it is free of
the obstacle. Normally, this means moving the point perpendicularly out towards the collision
surface.
Let’s examine an example. Assume that our world is the inside of the cube (0,0,0)-
(1000,1000,1000) and assume also that the particles’ restitution coefficient is zero (that is,
particles do not bounce off surfaces when colliding). To keep all positions inside the valid
interval, the corresponding projection code would be:
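    // Implements particles in a box (a sketch; SatisfyConstraints was
    // declared in the ParticleSystem class above)
    void ParticleSystem::SatisfyConstraints() {
        for(int i=0; i<NUM_PARTICLES; i++) {
            Vector3& x = m_x[i];
            x = vmin(vmax(x, Vector3(0,0,0)), Vector3(1000,1000,1000));
        }
    }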
(vmax operates on vectors taking the component-wise maximum whereas vmin takes the
component-wise minimum.) This keeps all particle positions inside the cube and handles both
collisions and resting contact. The beauty of the Verlet integration scheme is that the
corresponding changes in velocity will be handled automatically. In the following calls to
TimeStep(), the velocity is automatically regulated to contain no component in the normal
direction of the surface (corresponding to a restitution coefficient of zero). See Figure 1.
Try it out – there is no need to directly cancel the velocity in the normal direction. While the
above might seem somewhat trivial when looking at particles, the strength of the Verlet
integration scheme is now beginning to shine through and should really become apparent when
introducing constraints and coupled rigid bodies in a moment.
Solving several concurrent constraints by relaxation
A common model for cloth consists of a simple system of interconnected springs and particles.
However, it is not always trivial to solve the corresponding system of differential equations. It
suffers from some of the same problems as the penalty-based systems: Strong springs leads to
stiff systems of equations that lead to instability if only simple integration techniques are used,
or at least bad performance – which leads to pain. Conversely, weak springs lead to elastically
looking cloth.
However, an interesting thing happens if we let the stiffness of the springs go to infinity: The
system suddenly becomes solvable in a stable way with a very simple and fast approach. But
before we continue talking about cloth, let’s revisit the previous example. The cube considered
above can be thought of as a collection of unilateral (inequality) constraints (one for each side
of the cube) on the particle positions that should be satisfied at all times:
    0 ≤ xi ≤ 1000  for i = 1, 2, 3        (C1)
In the example, constraints were satisfied (that is, particles are kept inside the cube) by simply
modifying offending positions by projecting the particles onto the cube surface. To satisfy (C1),
we use the following pseudo-code
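    // Pseudo-code to satisfy (C1) for a particle position x
    for i = 1, 2, 3
        set xi = min{max{xi, 0}, 1000}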
One may think of this process as inserting infinitely stiff springs between the particle and the
penetration surface – springs that are exactly so strong and suitably damped that instantly they
will attain their rest length zero.
We now extend the experiment to model a stick of length 100. We do this by setting up two
individual particles (with positions x1 and x2) and then require them to be a distance of 100
apart. Expressed mathematically, we get the following bilateral (equality) constraint:
    |x2 − x1| = 100        (C2)
Although the particles might be correctly placed initially, after one integration step the
separation distance between them might have become invalid. In order to obtain the correct
distance once again, we move the particles by projecting them onto the set of solutions
described by (C2). This is done by pushing the particles directly away from each other or by
pulling them closer together (depending on whether the erroneous distance is too small or too
large). See Figure 2.
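    // Pseudo-code to satisfy (C2)
    delta = x2 - x1;
    deltalength = sqrt(delta*delta);
    diff = (deltalength - restlength)/deltalength;
    x1 += delta*0.5*diff;
    x2 -= delta*0.5*diff;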
Note that delta is a vector so delta*delta is actually a dot product. With restlength=100 the
above pseudo-code will push apart or pull together the particles such that they once more
attain the correct distance of 100 between them. Again we may think of the situation as if a
very stiff spring with rest length 100 has been inserted between the particles such that they are
instantly placed correctly.
Now assume that we still want the particles to satisfy the cube constraints. By satisfying the
stick constraint, however, we may have invalidated one or more of the cube constraints by
pushing a particle out of the cube. This situation can be remedied by immediately projecting the
offending particle position back onto the cube surface once more – but then we end up
invalidating the stick constraint again.
Really, what we should do is solve for all constraints at once, both (C1) and (C2). This would be
a matter of solving a system of equations. However, we choose to proceed indirectly by local
iteration. We simply repeat the two pieces of pseudo-code a number of times after each other
in the hope that the result is useful. This yields the following code:
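    // Implements simulation of a stick in a box (a sketch)
    void ParticleSystem::SatisfyConstraints() {
        for(int j=0; j<NUM_ITERATIONS; j++) {
            // First satisfy (C1): keep both particles inside the box
            for(int i=0; i<2; i++) {
                Vector3& x = m_x[i];
                x = vmin(vmax(x, Vector3(0,0,0)), Vector3(1000,1000,1000));
            }
            // Then satisfy (C2): restore the stick's rest length
            Vector3& x1 = m_x[0];
            Vector3& x2 = m_x[1];
            Vector3 delta = x2 - x1;
            float deltalength = sqrt(delta*delta);
            float diff = (deltalength - restlength)/deltalength;
            x1 += delta*0.5*diff;
            x2 -= delta*0.5*diff;
        }
    }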
(Initialization of the two particles has been omitted.) While this approach of pure repetition
might appear somewhat naïve, it turns out that it actually converges to the solution that we are
looking for! The method is called relaxation (or Jacobi or Gauss-Seidel iteration depending on
how you do it exactly, see [Press]). It works by consecutively satisfying various local
constraints and then repeating; if the conditions are right, this will converge to a global
configuration that satisfies all constraints at the same time. It is useful in many other situations
where several interdependent constraints have to be satisfied at the same time.
The number of necessary iterations varies depending on the physical system simulated and the
amount of motion. It can be made adaptive by measuring the change from last iteration. If we
stop the iterations early, the result might not end up being quite valid but because of the Verlet
scheme, in next frame it will probably be better, next frame even more so etc. This means that
stopping early will not ruin everything although the resulting animation might appear somewhat
sloppier.
Cloth Simulation
The fact that a stick constraint can be thought of as a really hard spring should make apparent
its usefulness for cloth simulation as sketched in the beginning of this section. Assume, for
example, that a hexagonal mesh of triangles describing the cloth has been constructed. For
each vertex a particle is initialized and for each edge a stick constraint between the two
corresponding particles is initialized (with the constraint’s “rest length” simply being the initial
distance between the two vertices).
The function HandleConstraints() then uses relaxation over all constraints. The relaxation
loop could be iterated several times. However, to obtain nicely looking animation, actually for
most pieces of cloth only one iteration is necessary! This means that the time usage in the cloth
simulation depends mostly on the N square root operations and the N divisions performed
(where N denotes the number of edges in the cloth mesh). As we shall see, a clever trick makes
it possible to reduce this to N divisions per frame update – this is really fast and one might
argue that it probably can’t get much faster.
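A sketch of such a relaxation loop, using a hypothetical Constraint structure holding the two
particle indices and the rest length of each stick:

    struct Constraint {
        int particleA, particleB;  // indices into the particle array
        float restlength;          // initial distance between the two vertices
    };

    // Assumes m_constraints holds one entry per edge in the cloth mesh
    void ParticleSystem::HandleConstraints() {
        for(int j=0; j<NUM_ITERATIONS; j++) {
            for(int i=0; i<NUM_CONSTRAINTS; i++) {
                Constraint& c = m_constraints[i];
                Vector3& x1 = m_x[c.particleA];
                Vector3& x2 = m_x[c.particleB];
                Vector3 delta = x2 - x1;
                float deltalength = sqrt(delta*delta);
                float diff = (deltalength - c.restlength)/deltalength;
                x1 += delta*0.5*diff;
                x2 -= delta*0.5*diff;
            }
        }
    }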
We now discuss how to get rid of the square root operation. If the constraints are all satisfied
(which they should be at least almost), we already know what the result of the square root
operation in a particular constraint expression ought to be, namely the rest length r of the
corresponding stick. We can use this fact to approximate the square root function.
Mathematically, what we do is approximate the square root function by its 1st order Taylor-
expansion at a neighborhood of the rest length r (this is equivalent to one Newton-Raphson
iteration with initial guess r). After some rewriting, we obtain the following pseudo-code:
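    // Pseudo-code for satisfying (C2) using a square root approximation
    delta = x2 - x1;
    delta *= restlength*restlength/(delta*delta + restlength*restlength) - 0.5;
    x1 += delta;
    x2 -= delta;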
Notice that if the distance is already correct (that is, if |delta|=restlength), then one gets
delta=(0,0,0) and no change is going to happen.
Per constraint we now use zero square roots, one division only, and the squared value
restlength*restlength can even be precalculated! The usage of time consuming operations is
now down to N divisions per frame (and the corresponding memory accesses) – it can’t be done
much faster than that and the result even looks quite nice. Actually, in Hitman, the overall
speed of the cloth simulation was limited mostly by how many triangles it was possible to push
through the rendering system.
The constraints are not guaranteed to be satisfied after one iteration only, but because of the
Verlet integration scheme, the system will quickly converge to the correct state over some
frames. In fact, using only one iteration and approximating the square root removes the
stiffness that appears otherwise when the sticks are perfectly stiff.
By placing support sticks between strategically chosen couples of vertices sharing a neighbor,
the cloth algorithm can be extended to simulate plants. Again, in Hitman only one pass through
the relaxation loop was enough (in fact, the low number gave the plants exactly the right
amount of bending behavior).
The code and the equations covered in this section assume that all particles have identical
mass. Of course, it is possible to model particles with different masses, the equations only get a
little more complex.
To satisfy (C2) while respecting particle masses, use the following code:
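    // Pseudo-code to satisfy (C2) while respecting particle masses
    delta = x2 - x1;
    deltalength = sqrt(delta*delta);
    diff = (deltalength - restlength)/(deltalength*(invmass1 + invmass2));
    x1 += invmass1*delta*diff;
    x2 -= invmass2*delta*diff;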
Here invmass1 and invmass2 are the numerical inverses of the two masses. If we want a
particle to be immovable, simply set invmass=0 for that particle (corresponding to an infinite
mass). Of course in the above case, the square root can also be approximated for a speed-up.
Rigid Bodies
The equations governing motion of rigid bodies were discovered long before the invention of
modern computers. To be able to say anything useful at that time, mathematicians needed the
ability to manipulate expressions symbolically. In the theory of rigid bodies, this lead to useful
notions and tools such as inertia tensors, angular momentum, torque, quaternions for
representing orientations etc. However, with the current ability to process huge amounts of
data numerically, it has become feasible and in some cases even advantageous to break down
calculations to simpler elements when running a simulation. In the case of 3D rigid bodies, this
could mean modeling a rigid body by four particles and six constraints (giving the correct
number of degrees of freedom, 4×3−6 = 6). This simplifies a lot of aspects and it's exactly what
we will do in the following.
Consider a tetrahedron and place a particle at each of the four vertices. In addition, for each of
the six edges on the tetrahedron create a distance constraint like the stick constraint discussed
in the previous section. This is actually enough to simulate a rigid body. The tetrahedron can be
let loose inside the cube world from earlier and the Verlet integrator will let it move correctly.
The function SatisfyConstraints() should take care of two things: 1) That particles are kept
inside the cube (like previously), and 2) That the six distance constraints are satisfied. Again,
this can be done using the relaxation approach; 3 or 4 iterations should be enough with optional
square root approximation.
Now clearly, in general rigid bodies do not behave like tetrahedrons collision-wise (although
they might do so kinetically). There is also another problem: Presently, collision detection
between the rigid body and the world exterior is on a vertex-only basis, that is, if a vertex is
found to be outside the world it is projected inside again. This works fine as long as the inside
of the world is convex. If the world were non-convex then the tetrahedron and the world
exterior could actually penetrate each other without any of the tetrahedron vertices being in an
illegal region (see Figure 3 where the triangle represents the 2D analogue of the tetrahedron).
This problem is handled in the following.
Figure 3: A tetrahedron
penetrating the world.
We’ll first consider a simpler version of the problem. Consider the stick example from earlier
and assume that the world exterior has a small bump on it. The stick can now penetrate the
world exterior without any of the two stick particles leaving the world (see Figure 4). We won’t
go into the intricacies of constructing a collision detection engine since this is a science in itself.
Instead we assume that there is a subsystem available which allows us to detect the collision.
Furthermore we assume that the subsystem can reveal to us the penetration depth and identify
the penetration points on each of the two colliding objects. (One definition of penetration points
and penetration depth goes like this: The penetration distance dp is the shortest distance that
would prevent the two objects from penetrating if one were to translate one of the objects by
the distance dp in a suitable direction. The penetration points are the points on each object that
just exactly touch the other object after the aforementioned translation has taken place.)
Take a look again at Figure 4. Here the stick has moved through the bump after the Verlet step.
The collision engine has identified the two points of penetration, p and q. In Figure 4a, p is
actually identical to the position of particle 1, i.e., p=x1. In Figure 4b, p lies between x1 and
x2 at a position ¼ of the stick length from x1. In both cases, the point p lies on the stick and
consequently it can be expressed as a linear combination of x1 and x2, p=c1 x1+c2 x2 such
that c1+c2=1. In the first case, c1=1 and c2=0, in the second case, c1=0.75 and c2=0.25.
These values tell us how much we should move the corresponding particles.
To fix the invalid configuration of the stick, it should be moved upwards somehow. Our goal is
to avoid penetration by moving p to the same position as q. We do this by adjusting the
positions of the two particles x1 and x2 in the direction of the vector between p and q, ∆=q-p.
In the first case, we simply project x1 out of the invalid region like earlier (in the direction of q)
and that’s it (x2 is not touched). In the second case, p is still nearest to x1 and one might
reason that consequently x1 should be moved more than x2. Actually, since p=0.75 x1 + 0.25
x2, we will choose to move x1 by an amount of 0.75 each time we move x2 by an amount of
0.25. In other words, the new particle positions x1’ and x2’ are given by the expressions:
    x1’ = x1 + 0.75·λ·∆
    x2’ = x2 + 0.25·λ·∆        (*)
where λ is some unknown value. The new position of p after moving both particles is p’=c1
x1’+ c2 x2’.
Recall that we want p’=q, i.e., we should choose λ exactly such that p’ ends up coinciding with
q. Since we move the particles only in the direction of ∆, also p moves in the direction of ∆ and
consequently the solution to the equation p’=q can be found by solving:

    p + (0.75² + 0.25²)·λ·∆ = p + ∆,  i.e.,  λ = 1/(0.75² + 0.25²) = 1.6        (**)
Plugging λ into (*) gives us the new positions of the particles for which p’ coincide with q.
Figure 5 shows the situation after moving the particles. We have no object penetration but now
the stick length constraint has been violated. To fix this, we do yet another iteration of the
relaxation loop (or several) and we’re finished.
The above strategy also works for the tetrahedron in a completely analogous fashion. First the
penetration points p and q are found (they may also be points interior to a triangle), and p is
expressed as a linear combination of the four particles p=c1 x1+c2 x2+c3 x3+c4 x4 such that
c1+c2+c3+c4=1 (this calls for solving a small system of linear equations). After finding ∆=q-p,
one computes the value:

    λ = 1/(c1² + c2² + c3² + c4²)

and the new positions are then given by:

    xi’ = xi + ci·λ·∆   for i = 1, 2, 3, 4
Here, we have collided a single rigid body with an immovable world. The above method
generalizes to handle collisions of several rigid bodies. The collisions are processed for one pair
of bodies at a time. Instead of moving only p, in this case both p and q are moved towards
each other.
Again, after adjusting the particle positions such that they satisfy the non-penetration
constraints, the six distance constraints that make up the rigid body should be taken care of
and so on. With this method, the tetrahedron can even be embedded inside another object that
can be used instead of the tetrahedron itself to handle collisions. In Figure 6, the tetrahedron is
embedded inside a cube.
First, the cube needs to be ‘fastened’ to the tetrahedron in some way. One approach would be
choosing the system mass midpoint 0.25*(x1+x2+x3+x4) as the cube’s position and then
derive an orientation matrix by examining the current positions of the particles. When a
collision/penetration is found, the collision point p (which in this case will be placed on the
cube) is then treated exactly as above and the positions of the particles are updated
accordingly. As an optimization, it is possible to precompute the values of c1-c4 for all vertices
of the cube. If the penetration point p is a vertex, the values for c1-c4 can be looked up and
used directly. Otherwise, p lies on the interior of a surface triangle or one of its edges and the
values of c1-c4 can then be interpolated from the precomputed values of the corresponding
triangle vertices.
Figure 6: Embedding the tetrahedron inside another object.
Usually, 3 to 4 relaxation iterations are enough. The bodies will not behave as if they were
completely rigid since the relaxation iterations are stopped prematurely. This is mostly a nice
feature, actually, as there is no such thing as perfectly rigid bodies – especially not human
bodies. It also makes the system more stable.
By rearranging the positions of the particles that make up the tetrahedron, the physical
properties can be changed accordingly (mathematically, the inertia tensor changes as the
positions and masses of the particles are changed).
Other arrangements of particles and constraints than a tetrahedron are possible such as placing
the particles in the pattern of a coordinate system basis, i.e. at (0,0,0), (1,0,0), (0,1,0),
(0,0,1). Let a, b, and c be the vectors from particle 1 to particles 2, 3, and 4, respectively.
Constrain the particles’ positions by requiring vectors a, b, and c to have length 1 and the
angle between each of the three pairs of vectors to be 90 degrees (the corresponding dot
products should be zero). (Notice, that this again gives four particles and six constraints.)
Articulated Bodies
It is possible to connect multiple rigid bodies by hinges, pin joints, and so on. Simply let two
rigid bodies share a particle, and they will be connected by a pin joint. Share two particles, and
they are connected by a hinge. See Figure 7.
It is also possible to connect two rigid bodies by a stick constraint or any other kind of
constraint – to do this, one simply adds the corresponding ‘fix-up’ code to the relaxation loop.
This approach makes it possible to construct a complete model of an articulated human body.
For additional realism, various angular constraints will have to be implemented as well. There
are different ways to accomplish this. A simple way is using stick constraints that are only
enforced if the distance between two particles falls below some threshold (mathematically, we
have a unilateral (inequality) distance constraint, |x2-x1|>100). As a direct result, the two
particles will never come too close to each other. See Figure 8.
Particles can also be restricted to move, for example, in certain planes only. Once again,
particles with positions not satisfying the above-mentioned constraints should be moved –
deciding exactly how is slightly more complicated than with the stick constraints.
Actually, in Hitman corpses aren’t composed of rigid bodies modeled by tetrahedrons. They are
simpler yet, as they consist of particles connected by stick constraints in effect forming stick
figures. See Figure 9. The position and orientation for each limb (a vector and a matrix) are
then derived for rendering purposes from the particle positions using various cross products and
vector normalizations (making certain that knees and elbows bend naturally).
Figure 9: The particle/stick
configuration used in Hitman to
represent human anatomy.
In other words, seen isolated each limb is not a rigid body with the usual 6 degrees of freedom.
This means that physically the rotation around the length axis of a limb is not simulated.
Instead, the skeletal animation system used to set up the polygonal mesh of the character is
forced to orientate the leg, for instance, such that the knee appears to bend naturally. Since
rotation of legs and arms around the length axis does not comprise the essential motion of a
falling human body, this works out okay and actually optimizes speed by a great deal.
Angular constraints are implemented to enforce limitations of the human anatomy. Simple self
collision is taken care of by strategically introducing inequality distance constraints as discussed
above, for example between the two knees – making sure that the legs never cross.
For collision with the environment, which consists of triangles, each stick is modeled as a
capped cylinder. Somewhere in the collision system, a subroutine handles collisions between
capped cylinders and triangles. When a collision is found, the penetration depth and points are
extracted, and the collision is then handled for the offending stick in question exactly as
described in the beginning of Section 5.
Naturally, a lot of additional tweaking was necessary to get the result just right.
Comments
This section contains various remarks that didn’t fit anywhere else.
Motion control
To influence the motion of a simulated object, one simply moves the particles correspondingly.
If a person is hit at the shoulder, move the shoulder particle backwards over a distance
proportional to the strength of the blow. The Verlet integrator will then automatically set the
shoulder in motion.
This also makes it easy for the simulation to ‘inherit’ velocities from an underlying traditional
animation system. Simply record the positions of the particles for two frames and then give
them to the Verlet integrator, which then automatically continues the motion. Bombs can be
implemented by pushing each particle in the system away from the explosion over a distance
inversely proportional to the square distance between the particle and the bomb center.
It is possible to constrain a specific limb, say the hand, to a fixed position in space. In this way,
one can implement inverse kinematics (IK): Inside the relaxation loop, keep setting the position
of a specific particle (or several particles) to the position(s) wanted. Giving the particle infinite
mass (invmass=0) helps making it immovable to the physics system. In Hitman, this strategy is
used when dragging corpses; the hand (or neck or foot) of the corpse is constrained to follow
the hand of the player.
Handling friction
Friction has not been taken care of yet. This means that unless we do something more,
particles will slide along the floor as if it were made of ice. According to the Coulomb friction
model, friction force depends on the size of the normal force between the objects in contact. To
implement this, we measure the penetration depth dp when a penetration has occurred (before
projecting the penetration point out of the obstacle). After projecting the particle onto the
surface, the tangential velocity vt is then reduced by an amount proportional to dp (the
proportion factor being the friction constant). This is done by appropriately modifying x*. See
Figure 10. Care should be taken that the tangential velocity does not reverse its direction –
in this case one should simply set it to zero, since this indicates that the penetration point
has ceased to move tangentially. Other and better friction models than this could and should be
implemented.
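A rough sketch of that procedure, in the pseudo-code style used above (kFriction, dp and the
surface normal n are assumed inputs, not names from the Hitman code; * on vectors is the dot
product):

    // Damp the tangential part of the implicit velocity (x - oldx) in
    // proportion to the penetration depth dp, after x has been projected
    // back onto the surface.
    Vector3 v  = x - oldx;             // implicit velocity
    Vector3 vn = n*(v*n);              // component along the surface normal n
    Vector3 vt = v - vn;               // tangential component
    float vtlength = sqrt(vt*vt);
    if (vtlength > 0) {
        float reduction = kFriction*dp;                  // proportional to penetration depth
        if (reduction > vtlength) reduction = vtlength;  // never reverse the direction
        vt -= vt*(reduction/vtlength);
    }
    oldx = x - (vn + vt);              // store the modified velocity implicitly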
Collision detection
One of the bottlenecks in physics simulation as presented here lies in the collision detection,
which is potentially performed several times inside the relaxation loop. It is possible, however,
to iterate a different number of times over the various constraints and still obtain good results.
In Hitman, the collision system works by culling all triangles inside the bounding box of the
object simulated (this is done using an octree approach). For each (static, background) triangle,
a structure for fast collision queries against capped cylinders is then constructed and cached.
This strategy gave quite a speed boost.
To prevent objects that are moving really fast from passing through other obstacles (because of
too large time steps), a simple test is performed. Imagine the line (or a capped cylinder of
proper radius) beginning at the position of the object’s midpoint last frame and ending at the
position of the object’s midpoint at the current frame. If this line hits anything, then the object
position is set to the point of collision. Though this can theoretically give problems, in practice it
works fine.
Another collision ‘cheat’ is used for dead bodies. If the unusual thing happens that a fast
moving limb ends up being placed with the ends of the capped cylinder on each side of a wall,
the cylinder is projected to the side of the wall where the cylinder is connected to the torso.
Miscellaneous
The number of relaxation iterations used in Hitman varies between 1 and 10 depending on the kind of
object simulated. Although this is not enough to accurately solve the global system of
constraints, it is sufficient to make motion seem natural. The nice thing about this scheme is
that inaccuracies do not accumulate or persist visually in the system causing object drift or the
like – in some sense the combination of projection and the Verlet scheme manages to distribute
complex calculations over several frames (other schemes have to use further stabilization
techniques, like Baumgarte stabilization). Fortunately, the inaccuracies are smallest or even
nonexistent when there is little motion and greatest when there is heavy motion – this is nice
since fast or complex motion somewhat masks small inaccuracies for the human eye.
A kind of soft bodies can also be implemented by using ‘soft’ constraints, i.e., constraints that
are allowed to have only a certain percentage of the deviation ‘repaired’ each frame (i.e., if the
rest length of a stick between two particles is 100 but the actual distance is 60, the relaxation
code could first set the distance to 80 instead of 100, next frame 90, 95, 97.5 etc.).
As mentioned, we have purposefully refrained from using heavy mathematical notation in order
to reach an audience with a broader background. This means that even though the methods
presented are firmly based mathematically, their origins may appear somewhat vague or even
magical.
For the mathematically inclined, however, what we are doing is actually a sort of time-stepping
approach to solving differential inclusions (a variant of differential equations) using a simple
sort of interior-point algorithm (see [Stewart] where a similar approach is discussed). When
trying to satisfy the constraints, we are actually projecting the system state onto the manifold
described by the constraints. This, in turn, is done by solving a system of linear equations. The
linear equations or code to solve the constraints can be obtained by deriving the Jacobian of the
constraint functions. In this article, relaxation has been discussed as an implicit way of solving
the system. Although we haven’t touched the subject here, it is sometimes useful to change the
relaxation coefficient or even to use over-relaxation (see [Press] for an explanation). Since
relaxation solvers sometimes converge slowly, one might also choose to explicitly construct the
equation system and use other methods to solve it (for example a sparse matrix conjugate
gradient descent solver with preconditioning using the results from the previous frame (thereby
utilizing coherence)).
Note that the Verlet integrator scheme exists in a number of variants, e.g., the Leapfrog
integrator and the velocity Verlet integrator. Accuracy might be improved by using these.
Singularities (divisions by zero usually brought about by coinciding particles) can be handled by
slightly dislocating particles at random.
As an optimization, bodies should time out when they have fallen to rest. To toy with the
animation system for dead characters in Hitman: Codename 47, open the Hitman.ini file and
add the two lines “enableconsole 1” and “consolecmd ip_debug 1” at the bottom. Pointing the
cursor at an enemy and pressing shift+F9 will cause a small bomb to explode in his vicinity
sending him flying. Press K to toggle free-cam mode (camera is controlled by cursor keys, shift,
and ctrl).
Note that since all operations basically take place on the particle level, the algorithms should be
very suitable for vector processing (Playstation 2 for example).
Conclusion
This paper has described how a physics system was implemented in Hitman. The underlying
philosophy of combining iterative methods with a stable integrator has proven to be successful
and useful for implementation in computer games. Most notably, the unified particle-based
framework, which handles both collisions and contact, and the ability to trade off speed vs.
accuracy without accumulating visually obvious errors are powerful features. Naturally, there
are still many specifics that can be improved upon. In particular, the tetrahedron model for rigid
bodies needs some work. This is in the works.
At IO Interactive, we have recently done some experiments with interactive water and gas
simulation using the full Navier-Stokes equations. We are currently looking into applying
techniques similar to the ones demonstrated in this paper in the hope of producing faster and
more stable water simulation.
Acknowledgements
The author wishes to thank Jeroen Wagenaar for fruitful discussions and the entire crew at IO
Interactive for cooperation and for producing such a great working environment.
References
[Baraff] Baraff, David, Dynamic Simulation of Non-Penetrating Rigid Bodies, Ph.D. thesis, Dept.
of Computer Science, Cornell University, 1992.
http://www.cs.cmu.edu/~baraff/papers/index.html
[Mirtich] Mirtich, Brian V., Impulse-based Dynamic Simulation of Rigid Body Systems, Ph.D.
thesis, University of California at Berkeley, 1996.
http://www.merl.com/people/mirtich/papers/thesis/thesis.html
[Press] Press, William H. et al, Numerical Recipes, Cambridge University Press, 1993.
http://www.nr.com/nronline_switcher.html
[Stewart] Stewart, D. E., and J. C. Trinkle, “An Implicit Time-Stepping Scheme for Rigid Body
Dynamics with Inelastic Collisions and Coulomb Friction”, International Journal of Numerical
Methods in Engineering, to appear.
http://www.cs.tamu.edu/faculty/trink/Papers/ijnmeStewTrink.ps
[Witkin] Witkin, Andrew and David Baraff, "Physically Based Modeling: Principles and Practice",
Siggraph ’97 course notes, 1997.
http://www.cs.cmu.edu/~baraff/sigcourse/index.html
________________________________________________________
By Brian Hawkins
Gamasutra
January 10, 2003
URL: http://www.gamasutra.com/features/20030110/hawkins_01.htm
Once upon a time, it was a death wish for a game to be based on a movie license. However,
things have changed considerably in recent years. There have been a number of well-done and
successful game titles based on movies, and on the flip side there have been several movies
released that had games as their origin. With the crossover between movies and games finally
starting to show some success, it is time to revisit how Hollywood can actually be helpful to the
game industry.
In the past century, motion pictures have developed a visual language that enhances the
storytelling experience. Equally important, audiences have grown accustomed to certain
conventions used to tell these visual stories. Unfortunately, very little of this knowledge has
been translated for use in interactive storytelling.
Last month, in Part One of this two-part series, we looked at how to describe a cinematic
camera shot in general terms so that it could be automatically converted to camera position and
orientation within the game. To conclude, this month’s article brings it all together by
presenting a system that can choose the best shots and connect them together. Once finished,
these concepts can be joined to form a complete basis for a cinematic experience that improves
the interactive storytelling of games by giving players access to the action within a game in
ways that make sense to them instinctively.
Film Crew
Major motion pictures are made by hundreds of different people all working together in a huge
team effort. To transfer the cinematic experience to the world of games, we can take certain
established, key roles from the film industry and translate them into entities in the computer.
Objects in object-oriented languages such as C++ can conveniently represent these entities. In
this article, we will look at the three primary roles and describe their responsibilities as objects.
From this, you can build architectures to coordinate real-time cinematic camera displays. Before
going into detail about each role, let’s take a brief look at each in turn.
The first job belongs to the director. In films, the director controls the scene and actors to
achieve the desired camera shots that will then be edited later. However, because our director
object will have little or no control over the game world, this responsibility shifts to determining
where good camera shots are available and how to take advantage of them.
Once these possibilities are collected, they are passed on to the editor who must decide which
shots to use. Unlike in motion pictures, however, the editor object must do this in real time as
each previous shot comes to an end. The editor is also responsible for choosing how to
transition between shots.
Finally, once the shot and transition have been decided upon, it becomes the cinematographer
object’s task to transform that information into actual camera position and movement within
the game world. With this basic idea of how all the roles and responsibilities fit together, we can
move on to a closer look at each individual role.
As mentioned previously, the director’s role in the game world is to collect information on
available shots and their suitability for inclusion in the final display. This is the one place where
human intervention is required, after which no more human input is necessary. It is currently
impossible to create a system sophisticated enough to determine the priority of events within
the game world from a creative standpoint.
Instead, programmers and scripters are given the ability to provide information about the
priority and layout of interesting events (hence the term used in this article, event-driven
cinematic camera) through a suggestShot method on the director object. This information will then be
used by the editor for a final decision on which shots to include. Following is a breakdown of the
information necessary to make these decisions.
The first and most important piece of information is the priority of the shot. The priority
represents how interesting a particular shot is compared to other shots available at the time.
Thus the value of priority is relative, which means there is no definitive meaning for any
particular number. You must therefore be careful to remain consistent within a single game in
order to give the priority levels meaning. For example, all other values being equal, a shot with
a priority of two is twice as interesting as a shot with a priority of one.
The second piece of information required is the timing of the shot. Timing is the most complex
part of the editing process, and the sooner an event can be predicted, the better choices the
editor can make. Timing breaks down into four values: start time, estimated length, decay rate,
and end condition. The start time is obviously the beginning of the event. The estimated length
is a best guess at how long the shot will last. The decay rate determines how quickly the
priority decays once the event begins. Finally, the end condition determines when the shot
should be terminated. Let’s look at decay rate and end conditions in more detail.
The decay rate is used to determine the actual priority at a given time t using the starting
priority p and a constant, k. The constant is provided as part of the decay rate information,
since it will differ from shot to shot. The other information for decay rate is the equation to use
for determining the actual priority. For maximum flexibility, this should be a function object that
takes t, p, k, and the start time, ts, and returns the priority for that time. Two useful functions
that should be predefined as function objects for this parameter are:
    pω(t) = p·(1 − k·(t − ts))
    pω(t) = p·(1 − k·(t − ts))³
These functions should suffice for most circumstances. Notice that the second equation cubes
the value rather than squaring it. This is important, because it ensures that the priority remains
negative after a certain amount of time has passed, whereas squaring would have caused the
result to always remain positive. Figure 1 shows the resulting graphs of these functions as a
visual aid for understanding how decay rate affects priority.
Figure 1. Decay rate graph, showing how decay rate
affects shot priority.
The end condition is best specified as a function object that returns one of three values. The
first value indicates the shot cannot be terminated yet, the second value indicates the shot can
be terminated if another shot is ready, and the third value indicates that the shot must be
terminated. The reason for the middle value is that it gives the editor more flexibility in
choosing shots by allowing a choice of new shots within a certain time, rather than
instantaneously when the shot is terminated.
Next comes the shot information. This is all the information needed by the cinematographer to
change the shot from a suggestion into a real in-game shot. This includes information such as
the primary actor and secondary actor, if any. In addition, the shot size, emphasis, angle, and
height may be necessary. Refer to last month’s article for more information on determining this
information as well as the following scene information.
The scene information consists of the actors within the given scene and the current line of
action for that scene. Unfortunately, scene information can change dynamically as actors move
around and the cinematographer changes the line of action. Because of this fact, it is best to
store the scene as a reference through the primary actor of the shot that is being suggested.
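A rough sketch of how these suggestions might be represented in C++; apart from suggestShot
itself, which the article names, every type and member below is an illustrative assumption:

    #include <functional>
    #include <vector>

    enum class EndCondition { MustContinue, CanTerminate, MustTerminate };

    struct ShotSuggestion {
        float priority;        // relative interest, p
        float startTime;       // ts
        float estimatedLength; // best guess at how long the shot will last
        float decayConstant;   // k, fed to the decay function
        // Decay function object: takes (t, p, k, ts), returns the current priority.
        std::function<float(float, float, float, float)> decay;
        // End-condition function object, polled while the shot runs.
        std::function<EndCondition(float)> endCondition;
        int primaryActor;      // the scene is referenced through the primary actor
        int secondaryActor;    // -1 if none; plus shot size, emphasis, angle, height...
    };

    class Director {
    public:
        // Programmers and scripters feed candidate shots in here.
        void suggestShot(const ShotSuggestion& shot) { shots_.push_back(shot); }
        // The editor pulls the list; entries whose decayed priority has fallen
        // below zero are removed to keep it up to date.
        const std::vector<ShotSuggestion>& availableShots() const { return shots_; }
    private:
        std::vector<ShotSuggestion> shots_;
    };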
The director’s other responsibilities are to provide the editor with a list of currently available
shots at any time and to ensure that this list is up-to-date. Keeping the list up-to-date primarily
involves removing shots that are no longer valid. A shot becomes invalid when the priority
modified by decay rate, as discussed previously, falls below zero. Once the editor chooses a
shot, it is also removed from the list of shots. This brings us to a discussion of how the editor
chooses a shot.
The editor is responsible for choosing the next shot that will be shown as well as any transitions
between shots. First, let’s look at the process of choosing the next shot. The majority of the
information needed is provided with the shot suggestions from the director, but there are
parameters that can be used to give each editor its own style. The two parameters involved in
shot decisions are the desired shot length, lshot, and the desired scene length, lscene. By setting
these for different editors, the shots chosen will vary for the same set of circumstances. For
example, one editor could prefer short scenes filled with one or two long shots by setting the
shot time and the scene time to be relatively close values. On the other hand, another editor
could prefer longer scenes filled with short shots. This provides a number of options when
choosing an editor for a particular situation.
The time for choosing the next shot is determined by a change in the return value of the end
condition for the current shot. Once the current shot indicates that it can be terminated, the
editor must obtain the list of currently available shots from the director. From this list, the
editor object then filters out any shots whose start time is too far in the future. If the end
condition requires immediate termination, this excludes every shot suggestion whose start
time is still in the future. Otherwise, all shots whose
start time is no more than lshot beyond the current time are considered.
To choose the shot from this list, we must sort them based on a value that represents the
quality of each shot suggestion and then take the shot with the highest value. Before we can
compute this value, we need to introduce a few other values that will be used in its calculation.
First, we consider the desired shot length versus the estimated shot length, lestimated:
Then we look to see if the actors have any relation to those in the last shot:
Next, we check to see if the new scene matches the old scene. For this the editor must also
keep track of the time spent in the current scene, tscene:
Finally, the priority is modified by the decay rate discussed earlier if the shot event has already
commenced:
Once we have all this information, we can compute the quality value of each shot on the list:
Notice that the values cactor and cscene allow us to maintain consistency for our shots. This is a
very important property of good film directing and editing and should not be overlooked in
interactive cinematography, even though it is more difficult to maintain.
You may also have noticed that when calculating pω(t), t can be before ts; thus it is possible
under some circumstances to choose a shot that has not started yet. In this case, we hold on to
the shot and wait for one of two events: either the shot start time occurs or the end condition
of the current shot forces us to terminate. Upon the occurrence of either event, we must once
again test to see which is the best shot, in case a better shot has come along or we are being
forced to move on before the shot we would like to display can start.
Now that an ordering exists that allows us to choose the next shot, the only remaining choice
necessary is the transition from the current shot to the new shot. If we are transitioning
between different scenes, the choice is easy: a cut or fade should be used. However, if the
transition is between two shots in the same scene, the logic becomes slightly more complex.
Within a scene it is important to maintain the line of action; in other words, to keep the camera
on one side of a plane defined for the scene so as not to confuse the viewer’s perception of
scene orientation.
Let’s consider the various permutations that can occur between shots and what type of
transition should be used. For now, we will break them into fading (think of cutting as an
instantaneous fade) and camera movement. We will go into more detail on moving the camera
later. First, if the actors remain the same between the shots, then we can preserve the line of
action and use a fade. Likewise, even if the actors change, a fade can still be used as long as
the previous camera position lies on the same side of the new line of action.
However, if the two lines of action differ significantly, then a camera move needs to be
performed. The camera move should allow the line of action to change without confusing the
viewer. To get a rough approximation of the distance the camera must travel, compare the
distances between the current and new camera positions and the current and new focal points.
Now compute how fast the camera must move to travel that distance in the time it would take
for the new shot to become uninteresting:
Here ∆c is the vector between camera positions, ∆f is the vector between focal points, and p(t)
is the priority decay formula for the shot; for the decay formulas given earlier, the priority
reaches zero at t = ts + 1/k.
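As a sketch, assuming the required speed is the larger of the two distances divided by the time remaining until the priority reaches zero:

#include <algorithm>

// Sketch: how fast the camera must travel before the new shot's
// priority decays to zero at t = ts + 1/k.
float required_camera_speed(float delta_c_len,  // length of delta-c
                            float delta_f_len,  // length of delta-f
                            float now, float ts, float k)
{
    float time_left = (ts + 1.0f / k) - now;
    return std::max(delta_c_len, delta_f_len) / time_left;
}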
If the camera move cannot be made at a reasonable speed, the planned move should be
abandoned. A cut is still acceptable if the actors from the last shot would not be visible in the
pending shot; otherwise, a new shot should be chosen, with preference for close-ups that
include only one actor, thus making the next transition easier. We can now move on to
realizing the shot and transition.
Last month, we covered the math necessary to turn a description of a shot into actual camera
position and orientation. This month, we will build on that and flesh out the role of the
cinematographer by covering the handling of transitions.
The simplest transition is the cut, where we only need to change the camera position and
orientation to a new position and orientation. Only slightly more complex is the fade, which
provides a two-dimensional visual effect between two camera views. When fading, it is
important to decide whether to fade between two static images or allow the world to continue
to simulate while the fade occurs. Allowing continued simulation implies rendering two scenes
per frame but eliminates the need for pauses in gameplay. If you are able to handle the extra
rendering, interesting fade patterns can be achieved by using complementary masks when
rendering each scene. Depending on the hardware available for rendering, you may only be
able to do black and white masks, or you could go all the way to alpha-value masks.
Figure 2. Shot transition criteria, where re is the radius of
acceptable error.
The other group of transitions involves moving the camera. The three transitions we will
consider are pan, zoom, and crane. The decision of which move to make depends on the
camera and focal positions for the two shots. Figure 2 shows the various situations that lead to
the choice of a particular transition. The pan is used if the camera is in approximately the same
location for both shots and only the focal point moves. Though this happens rarely in an
interactive environment, when it does happen the old camera position can be kept and only the
orientation needs to be animated to the new orientation. Similarly, the conditions for zooming
are fairly uncommon, as both the camera positions and focal points must lie close to the same
line, but when it does occur the camera field-of-view can be used to allow a much more
interesting transition than a simple camera move.
Finally, we come to the most complex transition, the crane. The best method for creating a
crane move is often by borrowing the services of the AI’s path-planning algorithm in order to
avoid moving the camera through objects. It is best if the path planning also handles
orientation, as this will lead to better results than interpolating between the focal points.
Unfortunately, getting crane shots to look their best is a complex process for which this is only
a starting point. If you do not have the time to invest in making them work, you may wish to
leave them out altogether.
You now have enough information to create your own basic cinematic system to include in your
game. There is plenty of room to go beyond this basic system. Research on some of these
areas has already been conducted in academic circles. For instance, events that involve
conversations between characters could be specified as a single suggestion rather than
manually suggesting each individual shot during the discourse. “The Virtual Cinematographer”
and “Real-time Cinematic Camera Control for Interactive Narratives” (see For More Information)
describe how director styles can be created to specify camera shots automatically for these
situations. This reduces human involvement, which is always important as it allows other
features to be added to the game.
Another important aspect of cinematography that is only now becoming possible with the power
of newer graphics hardware is depth-of-field. This is often used as a mechanism to draw
attention to various elements in a scene of a film. As rendering of depth-of-field becomes more
common, it will be important to develop controls for it that are based on the principles learned
from cinematography. It is even possible to extend the concept of depth-of-field in ways that
would be difficult in real-world filmmaking. “Semantic Depth of Field” in For More Information
talks about selective application of depth-of-field effects on important elements of an image.
As you can see, there is a wealth of information out there and plenty of room for
experimentation and new ideas. As games continue to grow in popularity, they must meet the
demands of the more general audience that is used to the conventions of films. There is much
to do in order to reach this goal and continue to expand the scope of game development.
Continued innovation and experimentation in this area will bring out greater variety of
expression on the part of game developers, and richer, more compelling game experiences for
players.
Amerson, Daniel, and Shaun Kime. “Real-time Cinematic Camera Control for Interactive
Narratives.” American Association for Artificial Intelligence, 2000. pp. 1–4.
Arijon, Daniel. Grammar of the Film Language. Los Angeles: Silman-James Press, 1976.
He, Li-wei, Michael F. Cohen, and David H. Salesin. “The Virtual Cinematographer: A Paradigm
for Automatic Real-Time Camera Control and Directing.” Proceedings of SIGGRAPH 1996. pp.
217–224.
Katz, Steven D. Film Directing Shot by Shot. Studio City, Calif.: Michael Wiese Productions,
1991.
Kosara, Robert, Silvia Miksch, and Helwig Hauser. “Semantic Depth of Field.” Proceedings of the
IEEE Symposium on Information Visualization 2001.
Lander, Jeff. “Lights… Camera… Let’s Have Some Action Already!” Graphic Content, Game
Developer vol. 7, no. 4 (April 2000): pp. 15–20.
By Brian Hawkins
Gamasutra
January 8, 2003
URL: http://www.gamasutra.com/features/20030108/hawkins_01.htm
In the beginning, Auguste and Louis Lumiere made their first short movies without changing the
position of the camera. It would take others, such as the magician Georges Melies, to stop and
move the camera to create more dynamic scenes than could be done with a stationary,
continuously running camera. Over the next century, cinematographers and editors learned
the best ways to film, cut, and transition between different shots to make the movie
experience larger than life.
When it comes to camera setup, the dynamic nature of games puts us back to the days of
Lumiere. With minimal or no control of the placement of the actors in a game, camera shots are
often set as stationary or dragged behind one of the actors. Here and there, we see games that
try innovative camera techniques — some that work and others that do not. One way to
improve the chances of success is to take the film industry’s century of experience and adapt it
to our industry.
Since the position and orientation of the actors is not known ahead of time, a method of
describing the shot without exact positioning information is necessary. In order to arrive at a
suitable set of parameters, let’s look at the basic rules and descriptions that a cinematographer
follows when setting up a camera shot. We are not searching for a complete description of how
the shot is set up, but just for enough information from which to establish a position and
orientation for the camera. For example, while we may eventually wish to implement different
camera filters for interesting effects, what we care about now is mainly the field-of-view for the
camera.
The most important rule of cinematography is called the 180-degree rule: The camera should
not cross the line of action. The line of action is an imaginary line that partitions a scene into
two distinct areas, usually going through the main actors or in the direction of movement. By
not crossing the line between shots, a scene’s screen direction and space is preserved. This idea
can be extended to account for the three-dimensional nature of certain games by using a plane
to partition space rather than just a line. This information is constant across all or several of the
shots in a scene.
Figure 1. Common triangular
camera arrangements: (a)
standard, (b) over-the-
shoulder, (c) point of view, and
(d) profile.
Once the line of action has been established, camera placement within the valid area follows
what is known as the triangle system. Figure 1 shows the four basic camera positions for two
actors: standard, over-the-shoulder, point of view, and profile. Each of these layouts has the
same camera at the top point of the triangle, which is used for the establishing (or master)
shot. The remaining cameras are meant to favor one actor over another. Figure 1 shows the
approximate locations of the cameras on a two-dimensional plane, leaving the need for another
parameter to describe the height or vertical angle of the camera.
Figure 2. Visible portion of human actor for
various shot sizes.
Two remaining parameters involve the framing of the actor or actors on the screen. The first of
these parameters is the shot size, which will determine the distance of the camera from the
actor. Figure 2 shows the visible portions for a human actor in the common shot sizes. In
addition to these shots, the long shot frames the actor with plenty of space between the edge of
the screen and the actor. The other parameter determines the emphasis given to one actor or
another, or the approximate screen space taken up by each actor. This information should be
sufficient for us to move on to laying out the actual data structure and working out the
equations for determining the final position and orientation. From here I’ll assume a basic
knowledge of vector math; otherwise, you may wish to refer to Eric Lengyel’s Mathematics for
3D Game Programming and Computer Graphics (Charles River Media, 2002) or other similar
graphics text.
The first data structure we need is the scene, which persists across several individual shots. In
a scene, we are primarily concerned with the actors and therefore need a list of the scene’s
actors. The other important part of a scene is the line of action, discussed earlier, which will be
stored as a matrix for reasons I will discuss shortly. This means a scene will store data similar
to:
class scene
{
// ...
list<actor> actors;
matrix line_of_action;
};
Let’s take a closer look at the actor data and see what we will need to know about each actor.
To simplify framing of the actor, we will consider only bounding spheres. However, we do need
to use two spheres to model a human actor properly. The first sphere encompasses the entire
actor, while the second surrounds only the head, or some other significant area for nonhuman
actors. Finally, the orientation of the actor is important and should be included in the final data
structure similar to this:
class actor
{
// ...
matrix orientation;
sphere body;
sphere head;
};
Now we turn to computing the line of action. As discussed earlier, we can consider this more a
plane of action and therefore conveniently store it as a matrix. Although it’s possible to store it
more succinctly, the full matrix provides convenient directional vectors and is a standard data
structure used in most games. The exact line of action depends on the number of actors in the
scene; here I’ll be considering either one or two actors. For one actor the line of action is the
vector in the direction the actor is facing, and for two actors the line of action goes from the
primary actor to the secondary actor. Use these as a right-vector for the matrix, and form the
rest of the matrix using the average up-vector of the actors as a reference:
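The construction is a standard orthonormalization. A sketch, assuming ordinary vector helpers (normalize and cross) and a matrix built from basis vectors, none of which are prescribed here:

// Sketch: build the line-of-action matrix from the line direction (used
// as the right-vector) and the actors' average up-vector as a reference.
matrix line_of_action_matrix(const vector3& line, const vector3& average_up)
{
    vector3 right = normalize(line);
    vector3 back  = normalize(cross(right, average_up)); // perpendicular to both
    vector3 up    = cross(back, right);                  // completes the basis
    return matrix(right, up, back);
}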
Emphasis
The first parameter we need to look at in the individual shot is the emphasis, because it will
affect several of our following calculations. However, before we handle the emphasis we must
choose what we want to emphasize. For our purposes, I’ll assume that you can focus on the
primary actor, secondary actor, or both. The focus is the center of the selected actor, which I
will describe how to calculate when I talk about size. There are two ways we could handle
focusing on both actors, the simpler of which is to use the midpoint between the centers. Later,
I will discuss a slightly more complex approach that is particularly useful for over-the-shoulder
camera shots.
The emphasis chosen specifies where on screen we wish to place the focus. Full-screen
emphasis would place the focus in the center of the screen, two-thirds emphasis would place it
at one-third of the way across the screen, and half emphasis would place it at one-quarter of
the way across the screen. With that in mind, we need to convert the emphasis specification to
values we can use in our calculations.
Table 1. Emphasis values for each emphasis type.

Emphasis     Emphasis Angle    Actor Separation Angle    Length
Full         0                 0                         1
Two-Thirds   1/6 FOV           1/2 FOV                   sec(1/6 FOV)
One-Half     1/4 FOV           1/2 FOV                   sec(1/4 FOV)
The three important values are the angle formed between the ray to the emphasis and the
center ray, the angle between the emphasis of the main actor and the emphasis of the second
actor (if any), and the distance from the camera to the intersection of the emphasis ray with
the view frustum plane one unit from the camera. Table 1 shows the three main emphasis
types and their respective values. With these, we are ready to move on to some actual
calculations, starting with the camera distance from the focus based on desired size.
Table 2. Interpolation values for common shot sizes.

Shot Size            Head/Body Interpolation (Khb)    Screen Interpolation (Ks)
Extreme Close-up     0                                3/2
Close-up             0                                1
Close Shot           1/4                              1
Medium Close Shot    1/3                              1
Medium Shot          1/2                              1
Medium Full Shot     3/4                              1
Full Shot            1                                1
Long Shot            1                                1/2
The first step to computing the distance from the focus is to determine the part of the actor to
show. At the same time, we can compute the exact location of the focus. This location is the
same as the center of the actor mentioned earlier. To do all this, we bring the head and body
spheres into play by interpolating between them based on the desired shot size. Table 2 shows
the interpolation percentages for several common shot sizes. Take these and plug them into the
following equations:
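A sketch of that interpolation, assuming Khb blends from the head sphere (Khb = 0, the close-ups) toward the body sphere (Khb = 1, the full shot):

// Sketch: interpolate between the head and body spheres to get the
// focus location and the radius to frame.
void compute_focus(const sphere& head, const sphere& body, float k_hb,
                   vector3& center, float& radius)
{
    center = head.center + k_hb * (body.center - head.center);
    radius = head.radius + k_hb * (body.radius - head.radius);
}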
This gives the desired center, or focus, and radius values. Table 2 also gives the percentage of
the screen we wish the actor to occupy, which equates to the ratio of the radius to half the
screen height. Now we can form two similar right triangles, one using the radius as the far side
and one using half the height of the frustum view plane one unit from the camera. Respectively,
the adjacent sides are the distance we are seeking and the length value from Table 1. Since the
triangles are similar, we know the following ratios are equal:
Here, we can obtain h by taking the tangent of half the vertical field of view. Solving for the
distance we get:
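A sketch of the solved form, assuming the ratio groups as radius/distance = (Ks · h)/length, with length taken from Table 1 and Ks from Table 2:

#include <cmath>

// Sketch: solve the similar-triangle ratio for the camera distance.
float camera_distance(float radius, float length, float k_s, float vfov)
{
    float h = std::tan(0.5f * vfov);  // half-height of the view plane one unit out
    return (radius * length) / (k_s * h);
}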
Angle
We start with the line-of-action orientation matrix, MCO, as our initial camera matrix before the
angle changes are applied. Next, we need to determine the initial offset for the camera based
on the distance we just computed:
With this as a starting place, we can break the remainder of the angle changes down into three
separate rotations:
Now let’s go over how to compute each of these angles. For the first pass, we start with a
simple version of α used when we only care about ensuring that one of the actors is onscreen.
The angle is chosen based on the shot type we want from those shown in Figure 1, although
more shot types could be added. Table 3 shows the angles to use for each shot type.
The second angle, β, is the desired vertical angle from which to view the scene and should be
between –90 and 90 degrees. That leaves us with only θ to determine, which is based on the
emphasis angle, θe. The only modification is that we use –θe if we are looking at the primary
actor and θe if we are looking at the secondary actor.
Figure 3. Camera layout for a shot
framing both actors.
Finally, add voffset to the focus location to get the final camera position. The orientation of the
camera is MCO. That completes the basic shot. Let’s look now at an alternative for shooting two
actors that is more complex but produces a better shot.
Sometimes you want both actors in the shot, a case the preceding method does not frame
well. An improved method for framing in this case need only modify how α is calculated when
working out the camera angles. Figure 3 is the desired result, and if we apply the law of sines to this,
we get:
Knowing that the angles in a triangle total 180 degrees, we get the following equation for α:
Handling Obstructions
The main problem that arises once a camera shot has been calculated is an obstructed view of
the actor or actors in the scene. There are a number of ways to handle this, of which I’ll
examine two of the most useful.
The first method is advantageous if the scene involves moving elements or a moving camera
and you wish to ensure the actors are visible throughout. Start by rendering the actors first,
storing the area of the screen and distance from the camera in which each actor is rendered.
Now render the rest of the scene, skipping any objects that overlap one of the actors and lie
closer to the camera than that actor. Then go back, sort these skipped objects from back to
front, and render them as translucent. This technique can be fairly expensive, especially for
longer shots and complex scenes.
On the Move
Movement is a very important part of games, and we would be lax if we did not consider it.
First, we must consider how to handle moving actors. When we are focusing on only one actor,
a simple solution is to follow that actor. If we cut to a different camera angle, we must obey the
line of action at that time. However, it’s not a problem if the camera rotates with the actor as
long as the line of action is updated to reflect the new heading.
What happens when we are looking at two actors? There are several options, depending on the
circumstances. If the actors are both known to be traveling in the same direction, the camera
and the line of action can simply follow them as they did with only one actor. Another
possibility, which could also be used with a single actor, is to let them travel out of frame before
establishing a new camera shot and a new scene with a different line of action. This method
allows the actors to move around in the shot as long as they do not stray too far. Cutting to a
new scene is the only option if the actors travel in opposite directions, and the new scene will
likely not have both actors in it.
One final form of movement that should be mentioned is camera movement. Up until now, we
have treated shot changes as cuts, moving instantly from one camera position to the other. In
some cases, a better or more interesting transition can be achieved by moving the camera
smoothly from one position to the other. You could also use this to establish a new line of action
within the same scene, but this should be used rarely.
What’s Next?
I have covered the basics of setting up a shot using common cinematography techniques, but
there is much more that can be learned from the film industry. For example, I handled the most
common cases of one or two important actors in a scene. There are occasions when three or
more actors are important to a scene. See the For More Information section for additional
resources on cinematography.
Another aspect that I have not discussed is the camera filter, which can add certain effects and
moods to the scene. I dealt with one camera change, the field of view, because it was
essential to determining the camera position. However, it is also possible to perform color
filtering, depth of field,
other more complex effects to add certain qualities to a shot. You should experiment with these
effects to see if you can add that extra touch to your game.
A more complex issue that is rarely addressed in games currently is cinematic lighting. This is
an extremely important part of filmmaking and can lend a professional touch to any scene.
While lighting principles are well known in the film industry, in the game industry they are
normally only applied to movies outside of the gameplay. Similar principles as those in this
article could be used to help automate some of the lighting decisions for games, adding extra
realism and production quality to a game.
However, the largest area I haven’t touched on yet is deciding what shot to use for a given
situation. With the addition of hints from programmers and level designers, it’s possible to
automate the process of shot selection. This involves deciding among available shots, picking
transitions, and changing scenes when appropriate. Next month I’ll take a closer look at this
important part of our work on automating interactive cameras.
Copyright © 2003 CMP Media Inc. All rights reserved.
URL: http://www.gamasutra.com/features/20021212/neller_01.htm
As computer gaming reaches ever-greater heights in realism, we can expect the complexity of
simulated dynamics to reach further as well. To populate such gaming environments with
agents that behave intelligently, there must be some means of reasoning about the
consequences of agent actions. Such ability to seek out the ramifications of various possible
action sequences, commonly called “lookahead”, is found in programs that play chess, but there
are special challenges that face game programmers who wish to apply AI search techniques to
complex continuous dynamical systems. In particular, the game programmer must “discretize”
the problem, that is, approximate the continuous problem as a discrete problem suitable for an
AI search algorithm.
As a concrete example, consider the problem of navigating a simulated submarine through a set
of static obstacles. This continuous problem has infinite possible states (e.g. submarine position
and velocity) and infinite possible trajectories. The standard approach to discretize the problem
is to define a graph of “waypoints” between which the submarine can easily travel. A simple
waypoint graph can be searched, but this approach is not without significant disadvantages.
First, the dynamics of such approximate navigation are not realistic. It’s still common to see
massive vehicles in computer games turn about instantly and maintain constant velocity at all
times. When considering acceleration in agent behavior, there’s a quick realization that the
notion of a “waypoint” becomes far more complex. For example, a vehicle with realistic physical
limitations cannot ignore momentum and turn a tight corner at any velocity. A generalized
waypoint for such a system would contain not only a position vector, but a velocity vector as
well, doubling the dimensions of the waypoint. If waypoint density is held constant, memory
requirements grow exponentially with the waypoint dimensions.
The second disadvantage is that relevant state can incorporate many factors beyond waypoints
in a dynamic environment. If the programmer wishes the submarine to pilot around moving
obstacles, state dimensionality is further increased along with an exponential increase of the
memory requirements for our state-based discretization.
Even so, action-based discretization is not trivial. In our submarine example, an action-based
approach might sample control parameters that affect positional and angular velocity. The
choice of the sample is not obvious, yet it is crucial to the effectiveness of search. Additionally, the
programmer needs to choose good timing of control actions. If time intervals between actions
are too short/long, search is too shallow/deep in time and behavior is thus
shortsighted/inadequately responsive.
Action-Based Discretization
Artificial Intelligence search algorithms search discrete systems, yet we live and reason in a
continuous world. Continuous systems must first be discretized, i.e. approximated as discrete
systems, to apply such algorithms. There are two common ways that continuous search
problems are discretized: state-based discretization and action-based discretization. State-
based discretization becomes infeasible when the state space is highly dimensional. Action-
based discretization becomes infeasible when there are too many degrees of freedom.
Interestingly, biological high-degree-of-freedom systems are often governed by a much smaller
collection of motion primitives [Mataric, 2000]. We focus here on action-based discretization.
Action-based discretization consists of two parts: (1) action parameter discretization and (2)
action timing discretization, i.e. how and when to act. See Figure 1. The most popular form of
discretization is uniform discretization. It is common to sample possible actions and action
timings at fixed intervals.
Iterative-refinement algorithms use a simple means of dynamically adjusting the time interval
between search states. We will present the results of an empirical study of the performance of
different search algorithms as one varies the initial time interval between search states. We
formalize our generalization of search, describe the algorithms compared, present our chosen
class of test problems, and present the experimental results.
We will assume that the action parameter discretization, i.e. which action parameters are
sampled, is already given. From the perspective of the search algorithm, the action
discretization is static (cannot be varied by the algorithm). However, action timing discretization
is dynamic (can be varied by the algorithm). For this reason, we will call such searches "SADAT
searches" as they have Static Action and Dynamic Action Timing discretization.
SADAT searches are different from classical AI searches in only one respect. An action (i.e.
operator) additionally takes a time delay parameter indicating how much time will pass before
the next action is taken. For dynamical systems where timing is relevant, this is an important
generalization.
In fact, one could view this as having only three parts if you define the state space in terms of
the second and third items above for all possible time durations.
In classical search, a goal path can be specified as a sequence of actions that evolve the initial
state to a goal state. Now that the timing of actions is a choice, a goal path can be specified as
a sequence of action-duration pairs that evolve the initial state to a goal state. The cost of a
path is the sum of all transition costs. Given this generalization, the state space is generally
infinite, and the optimal path can generally only be approximated through a sampling of
possible paths through the state space.
Algorithms
The following algorithms are all written in Richard Korf’s style of pseudo-code. In an object-
oriented implementation, one would naturally have node objects. All of these algorithms simply
exit when a goal node is found. Since at any time, the call stack contains all relevant path
information for the current node, one could easily modify the algorithms to put node and action
information onto a stack while exiting, allowing easy retrieval of the solution path.
Iterative-Deepening A*
All of these searches have exponential time complexity (O(b^d), where b is the branching factor
and d is the depth of the search tree). Depth-first search has linear (O(d)) memory complexity,
but does not necessarily find optimal (or even good) solutions. Breadth-first search finds
minimum-depth solutions, but
does so with exponential memory cost. A* search (best-first search with an admissible
heuristic) uses a heuristic function to direct search and reduce the cost of search. However,
memory complexity is still exponential.
IDA* [Korf, 1985] provides a means of having linear memory complexity and optimality, at the
cost of node re-expansion. IDA* performs depth-first searches to successively greater f-value
bounds until a solution is found. The pseudo-code for IDA* is as follows:
IDASTAR (node : N)
  B := f(N)
  WHILE (TRUE)
    B := IDASTARB(N, B)
If the recursive search is unsuccessful, it returns the lowest f-value encountered beyond the
current bound. The outermost loop then revises the bound to this value and searches to this
greater bound. The amount of node re-expansion is problematic if there are many distinct f-
values. Such is true of most real-valued search problems. This problem is addressed by the
epsilon-admissible variant described below.
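The bounded search IDASTARB is the usual cost-bounded depth-first search; a sketch in the same style, consistent with the description above:

IDASTARB (node : N, bound : B)
  IF f(N) > B RETURN f(N)
  IF N is a goal, EXIT algorithm
  min := infinity
  FOR each child Ni of N
    min := MIN(min, IDASTARB(Ni, B))
  RETURN min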
∈-Admissible IDA*
∈-Admissible iterative-deepening A* search, here called ∈-IDA*, is a version of IDA* where the
f-cost limit is increased "by a fixed amount ∈ on each iteration, so that the total number of
iterations is proportional to 1/∈. This can reduce the search cost, at the expense of returning
solutions that can be worse than optimal by at most ∈" [Russell & Norvig, 1995].
eIDASTAR (node : N)
  B := f(N)
  WHILE (TRUE)
    B := MAX(IDASTARB(N, B), B + ∈)
To make this point concrete, suppose the current iteration of ∈-IDA* has an f-cost limit of 1.0
and returns no solution and a new f-cost limit of 2.0. The new f-cost limit is the minimum
heuristic f-value of all nodes beyond the current search contour. Let us further assume that ∈ is
0.1. Then increasing the f-cost limit by this fixed ∈ will result in the useless search of the same
contour for 9 more iterations before the new node(s) beyond the contour are searched. In our
implementation above, the f-cost limit would instead increase directly to 2.0.
Recursive best-first search (RBFS) [Korf, 1993] is a significant improvement over IDA*. RBFS
also expands nodes in best-first order and has linear memory complexity. It also expands fewer
nodes than IDA* for nondecreasing cost functions. This is accomplished by some extra
bookkeeping concerning node re-expansion. Korf’s RBFS algorithm is as follows:
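A sketch of the algorithm, following the version published in [Korf, 1993]:

RBFS (node : N, value : F(N), bound : B)
  IF f(N) > B, RETURN f(N)
  IF N is a goal, EXIT algorithm
  IF N has no children, RETURN infinity
  FOR each child Ni of N
    IF f(N) < F(N) THEN F[i] := MAX(F(N), f(Ni))
    ELSE F[i] := f(Ni)
  sort Ni and F[i] in increasing order of F[i]
  IF only one child, F[2] := infinity
  WHILE (F[1] <= B AND F[1] < infinity)
    F[1] := RBFS(N1, F[1], MIN(B, F[2]))
    insert N1 and F[1] in sorted order
  RETURN F[1]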
RBFS suffers from the same problem as IDA* when there are many distinct f-values. This
problem is addressed by the new epsilon-admissible variant of RBFS described below.
∈-Admissible RBFS
∈-Admissible recursive best-first search [Neller, 2000], here called ∈-RBFS, is a new ∈-
admissible variant of recursive best-first search [Korf, 1993]. As with our implementation of ∈-
IDA*, local search bounds increase by at least ∈ but possibly more as necessary to avoid
redundant search.
The difference between RBFS and ∈-RBFS is in the computation of the bound for the recursive
call. In RBFS, this is computed as MIN(B, F[2]) whereas in ∈-RBFS, this is computed as MIN
(B, MAX(F[2], F[1]+∈)). F[1] and F[2] are the lowest and second-lowest stored costs of the
children, respectively. Thus, the bound of the recursive call will not exceed that of its parent,
and will be the greater of the stored value of the lowest-cost sibling F[2] and its own stored
value F[1] plus ∈.
The algorithm’s initial call parameters are the root node r, f(r), and ∞. Actually, both RBFS and
∈-RBFS can be given a finite bound b if one wishes to restrict search for solutions with a cost of
no greater than b, and uses an admissible heuristic function. If no solution is found, the
algorithm will return the f-value of the minimum open search node beyond the search contour
of b.
In the context of SADAT search problems, both ∈-IDA* and ∈-RBFS assume a fixed time
interval ∆t between a node and its child. The following iterative-refinement algorithms do not.
Figure 2: Iterative-deepening and iterative-refinement depth-first search.
Iterative-Refinement
Iterative-deepening addresses our lack of knowledge concerning the proper depth of search.
Similarly, iterative-refinement addresses our lack of knowledge concerning the proper time
discretization of search. Iterative-deepening performs successive searches that grow
exponentially in time complexity. The complexity of previous unsuccessful iterations is generally
dominated by that of the final successful iteration. The same is true for iterative-refinement.
However, the concept of iterative-refinement is not limited to the use of depth-first search. In
general, for each iteration of an iterative-refinement search, a level of (perhaps adaptive) time-
discretization granularity is chosen for search and an upper bound on solution cost is given. If
the iteration finds a solution within this cost bound, the algorithm terminates with success.
Otherwise, a finer level of time-discretization granularity is chosen, and search is repeated.
Search is successively refined with respect to time granularity until a solution is found.
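Schematically, with SEARCH standing for any bounded search such as ∈-RBFS or DFS-NOUB, and using, for example, the harmonic refinement sequence ∆t, ∆t/2, ∆t/3, … (a sketch):

IR-SEARCH (node : N, bound : B, initial delay : dt)
  FOR i := 1 TO REFINEMENT-LIMIT
    IF SEARCH(N, B, dt / i) finds a solution within B, EXIT algorithm
  RETURN failure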
Iterative-Refinement ∈-RBFS
Iterative-Refinement ∈-RBFS does not search to a fixed time-horizon. Rather, each iteration
searches within a search contour bounded by B. Successive iterations search to the same
bound, but with finer temporal detail.
Iterative-Refinement DFS
Our depth-first search implementation DFS-NOUB uses a node ordering (NO) heuristic and has
a path cost upper bound (UB). The node-ordering heuristic is as usual: Nodes are expanded in
increasing order of f-value. Nodes are not expanded that exceed a given cost upper bound.
Assuming admissibility of the heuristic function h, no solutions within the cost upper bound will
be pruned from search.
Since SADAT search algorithms will generally only be able to approximate optimal solutions, it
is helpful to test them on problems with known optimal solutions. Richard Korf proposed the
problem of navigation between two points on the surface of a sphere as a simple benchmark
with a known optimal solution. Our version of the problem is given here.
The shortest path between two points on a sphere is along the great-circle path. Consider the
circle formed by the intersection of a sphere and a plane through two points on the surface of
the sphere and the center of the sphere. The great-circle path between the two points is the
shorter part of this circle between the two points. The great-circle distance is the length of this
path.
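Since positions are unit vectors, the great-circle distance is simply the angle between them. A small C++ sketch:

#include <cmath>

// Great-circle distance between two points on the unit sphere, given as
// unit-length 3-vectors: the arc length equals the angle between them.
float great_circle_distance(const float a[3], const float b[3])
{
    float d = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    if (d > 1.0f) d = 1.0f;    // guard against rounding outside [-1, 1]
    if (d < -1.0f) d = -1.0f;
    return std::acos(d);
}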
Our state space is the set of all positions and headings on the surface of a unit sphere along
with all nonnegative time durations for travel. Essentially, we encode path cost (i.e. time) in the
state in order to define the goal states. The initial state is arbitrarily chosen to have position
(1,0,0) and velocity (0,1,0) in spherical coordinates, with no time elapsed initially.
The action ai, 0 ≤ i < 8, takes a state and time duration, and returns a new state and the same
time duration (i.e. cost = time). The new state is the result of changing the heading i*π/4
radians and traveling with unit velocity at that heading on the surface of the unit sphere. If the
position reaches a goal state, the system stops evolving (and incurring cost).
The set of goal states includes all states that are both (1) within ∈d great-circle distance from a
given position pg, and (2) within ∈t time units of the optimal duration to reach such positions.
Put differently, the first requirement defines the size and location of the destination, and the
second requirement defines how directly the destination must be reached. Position pg is chosen
at random from all possible positions on the unit sphere with all positions being equally
probable.
If d is the great-circle distance between (1,0,0) and pg, then the optimal time to reach a goal
position at unit velocity is d - ∈d. Then the solution cost upper bound is d - ∈d + ∈t.
Experimental Results
In these experiments, we vary only the initial time delay ∆t between search states and observe
the performance of the algorithms we have described. For ∈-IDA* and ∈-RBFS, the initial ∆t is
the only ∆t for search. The iterative-refinement algorithms search using the harmonic
refinement sequence ∆t, ∆t/2, ∆t/3, …, and are limited to 1000 refinement iterations.
Experimental results for success rates of search are summarized in Figure 3. Each point
represents 500 trials over a fixed, random set of sphere navigation problems with ∈d = .0001
and ∈t computed as 10 percent of the optimal time. Thus, the target size for each problem is the
same, but the varying requirement for solution quality means that different delays will be
appropriate for different search problems. Search was terminated after 10 seconds, so the
success rate is the fraction of time a solution was found within this allotted time.
In this empirical study, means and 90 percent confidence intervals for the means were
computed with 10000 bootstrap resamples.
Let us first compare the performance of iterative-refinement (IR) ∈-RBFS and ∈-RBFS. To the
left of the graph, where the initial ∆t0 is small, there is no difference between the two
algorithms. This region of the graph indicates conditions under which a solution is found within
10 seconds on the first iteration or not at all. There is no iterative-refinement in this region; the
time complexity of the first iteration leaves no time for another.
At about ∆t0 = .1, we observe that IR ∈-RBFS begins to have a significantly greater success
rate than ∈-RBFS. At this point, the time complexity of search allows for multiple iterations, and
thus we begin to see the benefits of iterative-refinement.
Continuing to the right with greater initial ∆t0, IR ∈-RBFS peaks at a 100 percent success rate.
At this point, the distribution of ∆t's over different iterations allows IR ∈-RBFS to reliably find a
solution within the time constraints. We can see the distribution of ∆t's that most likely yield
solutions from the behavior of ∈-RBFS.
Where the success rate of IR ∈-RBFS begins to fall, the distribution of the first 1000 ∆t's begins to
fall outside of the region where solutions can be found. With our refinement limit of 1000, the
last iteration uses a minimal ∆t = ∆t0/1000. The highest ∆t0 trials fail not because time runs
out. Rather, the iteration limit is reached. However, even with a greater refinement limit, we
would eventually reach a ∆t0 where the iterative search cost incurred on the way to the good ∆t
range would exceed 10 seconds.
Comparing IR ∈-RBFS with IR DFS, we first note that there is little difference between the two
for large ∆t0. For 3.16<∆t0<100, the two algorithms are almost always able to perform
complete searches of the same search contours through all iterations up to the first iteration
with a solution path. The largest statistical difference occurs at ∆t0 = 316 where IR DFS's
success rate is 4.4 percent higher. We note that our implementation of IR DFS has a faster
node-expansion rate, and that ∈-RBFS's ∈-admissibility necessitates significant node re-
expansion. For these ∆t0's, the use of IR DFS trades off ∈-optimality for speed and a slightly
higher success rate.
For low-to-mid-range ∆t0 values, however, we begin to see the efficiency of ∈-RBFS over DFS
with node ordering as the first iteration with a solution path presents a more computationally
costly search. Since the target destination is so small, the route that actually leads through the
target destination is not necessarily the most direct route. Without a perfect heuristic where
complex search is necessary, ∈-RBFS shows its strength relative to DFS. Rarely will problems
be so unconstrained and offer such an easy heuristic as this benchmark problem, so IR ∈-RBFS
will generally be better suited for all but the simplest search problems.
Comparing IR ∈-RBFS with ∈-IDA*, we note that ∈-IDA* performs relatively poorly over all ∆t0.
What is particularly interesting is the performance of ∈-IDA* over the range where IR ∈-RBFS
behaves as ∈-RBFS, i.e. where no iterative-refinement takes place. Here we have empirical
confirmation of the significant efficiency of ∈-RBFS over ∈-IDA*.
In summary, iterative-refinement algorithms are statistically the same as or superior to the other
searches over the range of ∆t0 values tested. IR ∈-RBFS offers the greatest average success
rate across all ∆t0. With respect to ∈-RBFS, IR ∈-RBFS offers significantly better performance
for ∆t0 spanning more than four orders of magnitude. These findings are in agreement with
previous empirical studies concerning a submarine detection avoidance problem [Neller, 2000].
This is significant for search problems where reasonable values for ∆t are unknown. This is also
significant for search problems where reasonable values for ∆t are known and one wishes to
find a solution more quickly and reliably. This performance comes at a reasonable price for
many applications. Lack of knowledge of a good time discretization is compensated for by
knowledge of a suitable solution cost upper bound.
Having looked at some methods for performing dynamic action timing discretization, we will
now focus on dynamic action parameter discretization. Now we will assume that the action
timing discretization, i.e. when actions are taken, is already given. From the perspective of the
search algorithm, the action timing discretization is static (cannot be varied by the algorithm).
However, the action parameter discretization is dynamic (can be varied by the algorithm). For
this reason, we will call such searches "DASAT searches" as they have Dynamic Action and
Static Action Timing discretization.
DASAT searches are different from classical AI searches in only one respect: there are infinite
ranges of action parameters. In the context of navigation, the choice of a heading change can
come from an infinite continuum of angle choices from 0 to 2π. For dynamical systems where the choice of action
parameters is relevant, this is an important generalization.
A DASAT search problem is made up of four parts:
Given this generalization, the state space is generally infinite, and the optimal path can
generally only be approximated through a sampling of possible paths through the state space.
When both action parameter and action timing discretizations are dynamic, we call such
searches “DADAT” searches. The following experiments were performed with DADAT searches
using a form of iterative-refinement depth-first search and three different forms of action
parameter discretization.
The Submarine Channel Problem is not unlike a Sega videogame of the 1980s called Frogger. A
submarine seeks a path through a channel such that it avoids being detected by a number of
patrolling ships. We have chosen this problem because, like the n²-1 sliding tile puzzles, it can
serve as a benchmark easily scalable to greater difficulty.
In the Submarine Channel Problem, the submarine starts at position (x, y) = (0, 0) with
eastward heading and at full stop. To the east along an east-west channel of width w (centered
along y=0) are n ships patrolling across the width of the channel. This is pictured in Figure 4.
Each ship j has an inner detection radius ri,j and an outer detection radius ro,j. Within a
proximity of ri,j, ship j will detect the submarine and the submarine will be penalized with a
detection penalty. Within a proximity of ro,j and beyond ri,j, the submarine incurs a proximity
penalty scaling linearly from 0 at the outer radius to the full detection penalty at the inner
radius. Beyond the outer radius, there is no penalty. If the submarine collides with the sides of
the channel, there is a collision penalty. In the case of collision or detection, the submarine is
halted and allowed no further legal moves. The first ship patrols at an x-offset xOffset1 = ro,1.
Each ship k thereafter has xOffsetk = xOffsetk-1 + 3ri,k-1 + ri,k. Ship k has a patrolling route
defined by cycling linearly between the following points: (xOffsetk, w/2-ri,k), (xOffsetk+2ri,k,
w/2-ri,k ), (xOffsetk +2ri,k, -w/2+ri,k), and (xOffsetk, -w/2 + ri,k). Each ship begins at a given
percentage along this cycle. For n ships, the goal states are all states within the channel with x
> xOffsetn + 2ri,n + ro,n, i.e. all channel points to the right of the rightmost outer detection
radius.
The submarine can travel in eight headings (multiples of π/4 radians), and three speeds: full
speed, half speed, and full stop. Together these define 17 distinct actions the submarine can
take at any point at which it has incurred neither collision nor full detection penalty. (Since we
assume discrete, instantaneous changes to headings and speeds, all full stop actions are
effectively equivalent.) Each ship travels at a single predefined speed.
Generalizing the submarine channel problem for DADAT search, we allow any heading and any
speed up to the maximum. Thus an action, i.e. changing heading and speed, can be thought of
as picking a point in a circular region with the radius being the maximum speed. The center
point is a full stop, and any other point indicates a heading and speed (in polar coordinates).
Faced with this freedom of choice in our search algorithms, we present three ways of
performing dynamic action parameter discretization. First, we can randomly choose parameters
with independent uniform distributions over headings and speeds. Second, we can take a fixed
uniform discretization as described above and rotate it by a random angle. Third, we can seek
to generate a discretization with action parameters as “far” from each other as possible. We call
this last technique “dispersed discretization”.
The basic idea of “dispersed” discretization is to take a number of randomly sampled points
from the action region and simulate them as if they were point charges mutually repelling each
other with force proportional to the inverse square of their distance. The point dispersion
algorithm pseudo-code is as follows:
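A sketch in the same style, reconstructed from this description (the clamp keeps points inside the circular action region; the exact update structure is an assumption):

DISPERSE (points : P[1..n], repulsion : r)
  FOR iteration := 1 TO 20
    FOR each point Pi
      force[i] := sum over all j ≠ i of r * (Pi - Pj) / |Pi - Pj|^3
    FOR each point Pi
      Pi := Pi + force[i]
      IF |Pi| > region radius, scale Pi back onto the edge
    r := r * 0.93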
We used a repulsion factor of 0.008 and a repulsion factor decay of 0.93 for 20 iterations.
These values were chosen empirically based on a small number of trials with the submarine
action region. In future work, we would desire these dispersion parameters to be rapidly self-
adapting to the size of the region and the number of sampled points.
Experimental Results
For these experiments, we have chosen w=1 length unit. The outer radius of each ship is 0.2w.
The inner radius of each ship is 0.1w. The maximum velocity of the submarine is w/(1 time
unit). All ship velocities are also w/(1 time unit). Ships are started at random percentages
through their patrol cycles. The detection and collision penalties are set at 10000. In each
experimental trial we generated a random 10-ship submarine channel problem. A successful
trial found a solution within 10 seconds of search. For each initial time delay ∆t we ran 100
trials.
Figure 5 summarizes experimental results comparing the performance of random, uniform, and
dispersed discretization techniques used with a form of iterative-refinement depth-first search.
Note that the dispersed discretization rate of success exceeds that of the other discretization
techniques.
Looking over a number of dispersed discretizations, one quickly notices that more points are
repelled to the edge than in the uniform discretization. Although not a probable configuration,
any number of points placed at even intervals around the edge would be in equilibrium. With
repulsion parameters given above, it was typical to see 12 or more points along the edge of the
circle with five or fewer points dispersed internally. Empirically, extreme parameters
represented by the edge of the circular action region are more likely to appear in optimal
solutions. We hypothesize that having extra edge action choices aids in finding better
approximations to optimal solutions.
Furthermore, in this problem domain, searches of faster submarine trajectories (i.e. with
discretizations having more maximal velocities) will have lesser search depths to solutions if
such speedy solution trajectories exist. Since search depth affects search time complexity
exponentially, we likely benefit from a discretization with more maximal velocity values.
One key lesson in this and other experiments of [Neller, 2000] is that behaviors of greatest
interest often occur at extreme parameter values. Another key lesson is that an automated
discretization technique outperformed one hand-chosen by researchers (uniform). Not only can
such discretization techniques reduce the discretization burden of the programmer; they may
also yield superior discretizations.
Conclusions
Artificial Intelligence search algorithms search discrete systems. To apply such algorithms to
continuous systems, such systems must first be discretized, i.e. approximated as discrete
systems. Action-based discretization requires that both action parameters and action timing be
discretized.
The empirical study concerning sphere navigation provided insight into the importance of
searching with dynamic time discretization. Iterative-refinement algorithms are given an initial
time delay ∆t0 between search states and a solution cost upper bound. Such algorithms
iteratively search to this bound with successively smaller ∆t until a solution is found.
Iterative-refinement algorithms are important for search problems where reasonable values for
∆t are unknown or known and one wishes to find a solution more quickly and reliably. The key
tradeoff is that of knowledge. Lack of knowledge of a good time discretization is compensated
for by knowledge of a suitable solution cost upper bound. If one knows a suitable solution cost
upper bound for a problem where continuous time is relevant, an iterative-refinement algorithm
such as IR ∈-RBFS is recommended.
Bibliography
Neller, T. W. 2000. Simulation-Based Search for Hybrid System Control and Analysis, Ph.D.
Dissertation, Stanford University, Palo Alto, CA, USA. Also available as Stanford Knowledge
Systems Laboratory technical report KSL-00-15 at http://www.ksl.stanford.edu/.
Russell, S., and Norvig, P. 1995. Artificial Intelligence: A Modern Approach. Upper Saddle River,
NJ, USA: Prentice Hall.
By Eric Lengyel
Gamasutra
October 11, 2002
URL: http://www.gamasutra.com/features/20021011/lengyel_01.htm
The idea of using the stencil buffer to generate shadows has been around for over a decade, but
only recently has 3D graphics hardware advanced to the point where using the stencil algorithm
on a large scale has become practical. Not long ago, there existed some unsolved problems
pertaining to stencil shadows that prevented the algorithm from working correctly under various
conditions. Advances have now been made, however, so that stencil shadows can be robustly
implemented to handle arbitrarily positioned point lights and infinite directional lights having
any desired spatial relationship with the camera. This article presents the intricacies of the
entire stencil shadow algorithm and covers every mathematical detail of its efficient
implementation.
Algorithm Overview
The basic concept of the stencil shadow algorithm is to use the stencil buffer as a masking
mechanism to prevent pixels in shadow from being drawn during the rendering pass for a
particular light source. This is accomplished by rendering an invisible shadow volume for each
shadow-casting object in a scene using stencil operations that leave nonzero values in the
stencil buffer wherever light is blocked. Once the stencil buffer has been filled with the
appropriate mask, a lighting pass only illuminates pixels where the value in the stencil buffer is
zero.
As shown in Figure 1, an object’s shadow volume encloses the region of space for which light is
blocked by the object. This volume is constructed by finding the edges in the object’s triangle
mesh representing the boundary between lit triangles and unlit triangles and extruding those
edges away from the light source. Such a collection of edges is called the object’s silhouette
with respect to the light source. The shadow volume is rendered into the stencil buffer using
operations that modify the stencil value at each pixel depending on whether the depth test
passes or fails. Of course, this requires that the depth buffer has already been initialized to the
correct values by a previous rendering pass. Thus, the scene is first rendered using a shader
that applies surface attributes that do not depend on any light source, such as ambient
illumination, emission, and environment mapping.
Figure 1. An object’s shadow volume encloses the region of
space for which light is blocked by the object.
The original stencil algorithm renders the shadow volume in two stages. In the first stage, the
front faces of the shadow volume (with respect to the camera) are rendered using a stencil
operation that increments the value in the stencil buffer whenever the depth test passes. In the
second stage, the back faces of the shadow volume are rendered using a stencil operation that
decrements the value in the stencil buffer whenever the depth test passes. As illustrated in
Figure 2, this technique leaves nonzero values in the stencil buffer wherever the shadow
volume intersects any surface in the scene, including the surface of the object casting the
shadow.
Figure 2. Numbers at the ends of rays emanating
from the camera position C represent the values left
in the stencil buffer for a variety of cases. The stencil
value is incremented when front faces of the shadow
volume pass the depth test, and the stencil value is
decremented when back faces of the shadow volume
pass the depth test. The stencil value does not
change when the depth test fails.
There are two major problems with the method just described. The first is that no matter what
finite distance we extrude an object’s silhouette away from a light source, it is still possible that
it is not far enough to cast a shadow on every object in the scene that should intersect the
shadow volume. The example shown in Figure 3 demonstrates how this problem arises when a
light source is very close to a shadow-casting object. Fortunately, this problem can be elegantly
solved by using a special projection matrix and extruding shadow volumes all the way to
infinity.
Figure 3. No matter what finite distance an object’s silhouette is extruded away from a light
source, moving the light close enough to the object can result in a shadow volume that cannot
reach other objects in the scene.
The second problem shows up when the camera lies inside the shadow volume or the shadow
volume is clipped by the near plane. Either of these occurrences can leave incorrect values in
the stencil buffer causing the wrong surfaces to be illuminated. The solution to this problem is
to add caps to the shadow volume geometry, making it a closed surface, and using different
stencil operations. The two caps added to the shadow volume are derived from the object’s
triangle mesh as follows. A front cap is constructed using the unmodified vertices of triangles
facing toward the light source. A back cap is constructed by projecting the vertices of triangles
facing away from the light source to infinity. For the resulting closed shadow volume, we render
back faces (with respect to the camera) using a stencil operation that increments the stencil
value whenever the depth test fails, and we render front faces using a stencil operation that
decrements the stencil value whenever the depth test fails. As shown in Figure 4, this technique
leaves nonzero values in the stencil buffer for any surface intersecting the shadow volume for
arbitrary camera positions. Rendering shadow volumes in this manner is more expensive than
using the original technique, but we can determine when it’s safe to use the less-costly depth-
pass method without having to worry about capping our shadow volumes.
Figure 4. When a capped shadow volume is rendered using the depth-fail stencil operations,
nonzero values are left in the stencil buffer for any surface intersecting the shadow volume,
for arbitrary camera positions.
The details of everything just described are discussed throughout the remainder of this article.
In summary, the rendering algorithm for a single frame runs through the following steps.
A Clear the frame buffer and perform an ambient rendering pass. Render the visible
scene using any surface shading attribute that does not depend on any particular
light source.
B Choose a light source and determine what objects may cast shadows into the
visible region of the world. If this is not the first light to be rendered, clear the
stencil buffer.
C For each object, calculate the silhouette representing the boundary between
triangles facing toward the light source and triangles facing away from the light
source. Construct a shadow volume by extruding the silhouette away from the light
source.
D Render the shadow volume using specific stencil operations that leave nonzero
values in the stencil buffer where surfaces are in shadow.
E Perform a lighting pass using the stencil test to mask areas that are not
illuminated by the light source.
F Repeat steps B through E for every light source that may illuminate the visible
region of the world.
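In outline form, a frame might be structured as follows. This is a hypothetical skeleton, not
the article's code; every type and function is a placeholder for engine-specific work, declared
here only so the sketch is complete.

// Per-frame skeleton of steps A through F above (hypothetical names).
struct Scene; struct Camera; struct Light;

void ClearFrameBuffer();
void RenderAmbientPass(const Scene&, const Camera&);                     // step A
int  GatherRelevantLights(const Scene&, const Camera&, const Light**);   // step B
void ClearStencilBuffer();
void RenderShadowVolumes(const Scene&, const Light&, const Camera&);     // steps C, D
void RenderLightingPass(const Scene&, const Light&, const Camera&);      // step E

void RenderFrame(const Scene& scene, const Camera& camera)
{
    ClearFrameBuffer();
    RenderAmbientPass(scene, camera);

    const Light *lights[64];
    int count = GatherRelevantLights(scene, camera, lights);

    for (int i = 0; i < count; i++)         // step F: repeat B-E for each light
    {
        if (i != 0) ClearStencilBuffer();   // step B: no clear needed for the first light

        RenderShadowVolumes(scene, *lights[i], camera);   // silhouettes and volumes
        RenderLightingPass(scene, *lights[i], camera);    // stencil-masked lighting
    }
}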
For a scene illuminated by n lights, this algorithm requires at least n+1 rendering passes. More
than n+1 passes may be necessary if surface shading calculations for a single light source
cannot be accomplished in a single pass. To efficiently render a large scene containing many
lights, one must be careful during each pass to render only objects that could potentially be
illuminated by a particular light source. An additional optimization using the scissor rectangle
can also save a significant amount of rasterization work -- this optimization is discussed in the
last section of this article.
To ensure that shadow volumes surround every last bit of space for which light is blocked by an
object, we must extrude the object’s silhouette to infinity. Using a standard perspective
projection matrix would cause such a shadow volume to be clipped by the far plane. To avoid
this unwanted effect, we can actually place the far plane at an infinite distance from the
camera.
Recall that the projection matrix transforms points from eye space to clip space. In OpenGL,
eye space is the coordinate system in which the camera lies at the origin, the x-axis points to
the right, the y-axis points upward, and the camera points down the negative z-axis. In clip
space, a 4D homogeneous point <x,y,z,w> is inside the view frustum if -w<x<w, -w<y<w, and -
w<z<w. Once primitives have been clipped, a vertex is transformed into a 3D point in
normalized device coordinates by performing a perspective divide by its w-coordinate. This
results in a point whose x, y, and z coordinates all lie in the range [-1,1]. In the final
transformation before rasterization, these coordinates are remapped to the dimensions of the
viewport and the physical range of the depth buffer.
The standard OpenGL perspective projection matrix has the form

P = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}    (1)
where n is the distance to the near plane, f is the distance to the far plane, and l, r, b, and t
represent the left, right, bottom, and top edges of the rectangle carved out of the near plane by
the view frustum. By evaluating the limit as f tends to infinity, we obtain the matrix
P_\infty = \lim_{f \to \infty} P = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -1 & -2n \\ 0 & 0 & -1 & 0 \end{pmatrix}    (2)
Assuming w>0 (it is normally the case that w=1), the resulting z-coordinate of Vclip is always
less than the resulting w-coordinate of Vclip, ensuring that projected points are never clipped
by the far plane. A point at infinity is represented by a 4D homogeneous vector having a
w-coordinate of zero in eye space. For such a point, (Vclip)z = (Vclip)w, and the perspective
divide produces a 3D point in normalized device coordinates having the maximal z-value of one.
In practice, the limitations of hardware precision can produce points having a normalized z-
coordinate slightly greater than one. This causes severe problems when the z-coordinate is
converted to an integer value to be used in the depth buffer because the stencil operations that
depend on the depth test to render shadow volumes may no longer function correctly. To
circumvent this undesirable effect, we can map the z-coordinate of a point at infinity to a value
slightly less than one in normalized device coordinates. The z-coordinate of a 3D point D in
normalized device coordinates is mapped from a value Dz in the range [-1,1] to a value D'z in
the range [-1,1-ε], where ε is a small positive constant, using the relation
D'_z = \frac{2-\varepsilon}{2}\,(D_z + 1) - 1    (4)
We need to find a way to modify the z-coordinate of Vclip in order to perform this mapping as
points are transformed from eye space into clip space. We can rewrite Equation (4) as an
adjustment to (Vclip)z by replacing Dz with (Vclip)z / (Vclip)w and D'z with (V'clip)z / (Vclip)w as
follows.
\frac{(V'_{clip})_z}{(V_{clip})_w} = \frac{2-\varepsilon}{2}\left(\frac{(V_{clip})_z}{(V_{clip})_w} + 1\right) - 1    (5)

(V'_{clip})_z = \frac{2-\varepsilon}{2}\,\bigl[(V_{clip})_z + (V_{clip})_w\bigr] - (V_{clip})_w    (6)
We can incorporate this mapping into the projection matrix P∞ given by Equation (2) as follows
to arrive at the slightly tweaked matrix P'∞ that we actually use to render a scene.
P'_\infty = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & \varepsilon-1 & (\varepsilon-2)n \\ 0 & 0 & -1 & 0 \end{pmatrix}    (8)
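As a concrete illustration, a small helper along these lines (a hypothetical function, not from
the article) fills a column-major array with P'∞ for use with glLoadMatrixf; passing epsilon = 0
yields the plain P∞ of Equation (2).

/* Fill a column-major 4x4 matrix with the tweaked infinite projection
   matrix of Equation (8), suitable for glLoadMatrixf. */
void BuildInfiniteProjectionMatrix(float m[16], float n, float l, float r,
                                   float b, float t, float epsilon)
{
    m[0] = 2.0f * n / (r - l);  m[4] = 0.0f;                m[8]  = (r + l) / (r - l);  m[12] = 0.0f;
    m[1] = 0.0f;                m[5] = 2.0f * n / (t - b);  m[9]  = (t + b) / (t - b);  m[13] = 0.0f;
    m[2] = 0.0f;                m[6] = 0.0f;                m[10] = epsilon - 1.0f;     m[14] = (epsilon - 2.0f) * n;
    m[3] = 0.0f;                m[7] = 0.0f;                m[11] = -1.0f;              m[15] = 0.0f;
}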
If the graphics hardware supports depth clamping, then use of the matrix P'∞ given by Equation
(8) is not necessary. The GL_NV_depth_clamp extension to OpenGL allows a renderer to force
depth values in normalized device coordinates to saturate to the range [-1,1], thus curing the
precision problem at the infinite far plane. When depth clamping is enabled using the function
call

glEnable(GL_DEPTH_CLAMP_NV);

the matrix P∞ given by Equation (2) can safely be used in place of P'∞.
The question of depth buffer precision arises when using an infinite projection matrix. It is true
that placing the far plane at infinity reduces the number of discrete depth values that can occur
within any finite interval along the z-axis, but in most situations this effect is small. Consider
the function that uses the matrix P given in Equation (1) to map an eye-space point
V=<Vx,Vy,Vz,1> to its corresponding depth in normalized device coordinates:
d(V) = \frac{(V_{clip})_z}{(V_{clip})_w} = \frac{f+n}{f-n} + \frac{2fn}{(f-n)\,V_z}    (9)
We obtain a different function d∞(V) by using the matrix P∞ given by Equation (2) to map an
eye-space point V to its normalized depth:

d_\infty(V) = 1 + \frac{2n}{V_z}    (10)
Given two eye-space points V1 and V2, we can compare the differences in depth values
produced by the functions d and d∞ as follows.

d(V_1) - d(V_2) = \frac{f}{f-n}\,\bigl[d_\infty(V_1) - d_\infty(V_2)\bigr]    (11)
This demonstrates that the standard projection matrix P maps the points V1 and V2 to a range
that is a factor f/(f-n) larger than the range to which the points are mapped by the infinite
projection matrix P∞, equating to slightly greater precision. For practical values of f and n,
where f is much larger than one and n is much smaller than one, f/(f-n) is close to unity, so
the loss of precision is not a significant disadvantage.
Silhouette Determination
The stencil shadow algorithm requires that the models in our world be closed triangle meshes.
In mathematical terms, the surface of any object that casts a shadow must be a two-
dimensional closed manifold. What this boils down to is that every edge in a mesh must be
shared by exactly two triangles, disallowing any holes that would let us see the interior of the
mesh.
The indexes i1, i2, and i3 are ordered such that the positions of the vertices Vi1, Vi2, and Vi3
to which they refer are wound counterclockwise about the triangle’s normal vector. Suppose that
two triangles share an edge whose endpoints are the vertices Va and Vb as shown in Figure 5.
The consistent winding rule enforces the property that for one of the triangles, the index
referring to Va precedes the index referring to Vb, and that for the other triangle, the index
referring to Vb precedes the index referring to Va.
As demonstrated in Listing 1, the edges of a triangle mesh can be identified by making a single
pass through the triangle list. For any triangle having vertex indexes i1, i2, and i3, we create an
edge record for every instance in which i1 < i2, i2 < i3, or i3 < i1, and store the index of the current
triangle in the edge record. This procedure creates exactly one edge for every pair of triangles
that share two vertices Va and Vb, duplicating any edges that are shared by multiple pairs of
triangles.
Once we have identified all the edges, we make a second pass through the triangle list to find
the second triangle that shares each edge. This is done by locating each triangle for which
i1 > i2, i2 > i3, or i3 > i1 and matching it to an edge having the same vertex indexes that has
not yet been supplied with a second triangle index.
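A minimal sketch of this two-pass procedure follows; the type and function names are
illustrative, and the article's actual Listing 1 may differ in detail. The caller is assumed
to allocate room for 3 * triangleCount / 2 edge records, the count for a closed mesh.

struct Edge
{
    unsigned short vertexIndex[2];    // the edge's two endpoints
    unsigned short triangleIndex[2];  // the two triangles sharing the edge
};

// Build the edge list for a closed triangle mesh. 'triangles' holds the
// three vertex indexes of each triangle. Returns the number of edges found.
long BuildEdges(long triangleCount, const unsigned short (*triangles)[3],
                Edge *edgeArray)
{
    long edgeCount = 0;

    // First pass: create an edge record for each instance of i1 < i2.
    for (long a = 0; a < triangleCount; a++)
    {
        for (long b = 0; b < 3; b++)
        {
            unsigned short i1 = triangles[a][b];
            unsigned short i2 = triangles[a][(b + 1) % 3];
            if (i1 < i2)
            {
                Edge *edge = &edgeArray[edgeCount++];
                edge->vertexIndex[0] = i1;
                edge->vertexIndex[1] = i2;
                edge->triangleIndex[0] = (unsigned short) a;
                edge->triangleIndex[1] = (unsigned short) a;   // marks "not yet matched"
            }
        }
    }

    // Second pass: match each instance of i1 > i2 to an unmatched edge record.
    for (long a = 0; a < triangleCount; a++)
    {
        for (long b = 0; b < 3; b++)
        {
            unsigned short i1 = triangles[a][b];
            unsigned short i2 = triangles[a][(b + 1) % 3];
            if (i1 > i2)
            {
                for (long c = 0; c < edgeCount; c++)   // linear search for clarity
                {
                    Edge *edge = &edgeArray[c];
                    if ((edge->vertexIndex[0] == i2) && (edge->vertexIndex[1] == i1)
                        && (edge->triangleIndex[0] == edge->triangleIndex[1]))
                    {
                        edge->triangleIndex[1] = (unsigned short) a;
                        break;
                    }
                }
            }
        }
    }

    return (edgeCount);
}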
Armed with the edge list for a triangle mesh, we determine the silhouette by first calculating
the dot product between the light position and the plane of each triangle. For a triangle whose
vertex indexes are i1, i2, and i3, the (unnormalized) outward-pointing normal direction N is
given by
N = (V_{i_2} - V_{i_1}) \times (V_{i_3} - V_{i_1})    (12)
since the vertices are assumed to be wound counterclockwise. The 4D plane vector K
corresponding to the triangle is then given by
K = \langle N_x,\; N_y,\; N_z,\; -N \cdot V_{i_1} \rangle    (13)
Let L represent the 4D homogeneous position of the light source. For point light sources,
Lw ≠ 0, and for infinite directional light sources, Lw = 0. A triangle faces the light source if K · L > 0;
otherwise, the triangle faces away from the light source. The silhouette is equal to the set of
edges shared by one triangle facing the light and one triangle facing away from the light.
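Continuing the sketch above, the facing test and silhouette extraction translate directly into
code. Here 'planes' is assumed to hold the 4D plane K of Equation (13) for every triangle;
names remain illustrative.

// Collect the edges whose two triangles face opposite ways with respect
// to the homogeneous light position L.
long FindSilhouette(long edgeCount, const Edge *edgeArray,
                    const float (*planes)[4], const float L[4],
                    const Edge **silhouette)
{
    long silhouetteCount = 0;

    for (long a = 0; a < edgeCount; a++)
    {
        const Edge *edge = &edgeArray[a];
        const float *K1 = planes[edge->triangleIndex[0]];
        const float *K2 = planes[edge->triangleIndex[1]];

        float d1 = K1[0] * L[0] + K1[1] * L[1] + K1[2] * L[2] + K1[3] * L[3];
        float d2 = K2[0] * L[0] + K2[1] * L[1] + K2[2] * L[2] + K2[3] * L[3];

        // One triangle lit, the other unlit: the edge lies on the silhouette.
        if ((d1 > 0.0f) != (d2 > 0.0f)) silhouette[silhouetteCount++] = edge;
    }

    return (silhouetteCount);
}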
Once the set of an object’s silhouette edges has been determined with respect to a light source,
we must extrude each edge away from the light’s position to form the object’s shadow volume.
In this section, we present methods that perform the extrusion by making use of widely
available vertex programming hardware exposed by the GL_NV_vertex_program and
GL_EXT_vertex_shader extensions to OpenGL.
For a point light source, the extrusion of the silhouette edges consists of a set of quads, each of
which has the two unmodified vertices belonging to an edge and two additional vertices
corresponding to the extrusion of the same edge to infinity. For an infinite directional light
source, all points project to the same point at infinity, so the extrusion of the silhouette edges
can be represented by a set of triangles that all share a common vertex. We distinguish
between points that should be treated normally and those that should be extruded to infinity by
using 4D homogeneous coordinates. A w-coordinate of one is assigned to the unmodified
vertices and a w-coordinate of zero is assigned to the extruded vertices. The extrusion methods
that we present utilize the information stored in the w-coordinate to perform the appropriate
vertex modifications.
Before we examine the extrusion methods, we must prepare the appropriate quad list or
triangle list (depending on whether we are using a point light or infinite directional light). We
need to make sure that the vertices of each extrusion primitive are wound so that the face’s
normal direction points out of the shadow volume. Suppose that a silhouette edge E has
endpoints A and B. The edge-finding code presented in Listing 1 associates the triangle for
which the vertices A and B occur in counterclockwise order as the first triangle sharing the edge
E. Thus, if the first triangle faces toward the light source, then we want the vertices A and B to
occur in the opposite order for the extruded primitive so that its vertices are wound
counterclockwise. If the first triangle faces away from the light source, then we use the vertices
A and B in the same order for the extruded primitive. Table 1 lists the vertices of the extrusion
of the edge E for point light sources and infinite directional light sources for the cases that the
first triangle associated with the edge E faces toward or away from the light source.
Table 1. Given a silhouette edge E having endpoints A and B, this table lists the object-space
vertices of the extruded shadow volume face corresponding to E. The first triangle associated
with the edge E is the triangle for which the vertices A and B occur in counterclockwise order.
Using the GL_NV_vertex_program extension, we can employ a couple of simple vertex programs
to perform edge extrusion and transformation to clip space. In each program, we assume that
the product of the projection matrix and model-view matrix has been tracked into constant
registers c[0]–c[3] and that the object-space light position has been stored in constant register
c[4]. Vertex programs are enabled and these constants are loaded using the following function
calls, where lx, ly, lz, and lw represent the light position.
glEnable(GL_VERTEX_PROGRAM_NV);
glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0,
GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV);
glProgramParameter4fNV(GL_VERTEX_PROGRAM_NV, 4, lx, ly, lz, lw);
For a point light source residing at the point L in object space, a vertex V from Table 1 is
unmodified if its w-coordinate is one and is extruded if its w-coordinate is zero by using the
formula
V' = V_w L + \langle V_x - L_x,\; V_y - L_y,\; V_z - L_z,\; 0 \rangle    (14)
The following vertex program applies this formula and then transforms the resulting vertex
position into clip space.
!!VP1.0
ADD R0.xyz, v[OPOS], -c[4];
MAD R0, v[OPOS].w, c[4], R0;
DP4 o[HPOS].x, c[0], R0;
DP4 o[HPOS].y, c[1], R0;
DP4 o[HPOS].z, c[2], R0;
DP4 o[HPOS].w, c[3], R0;
END
In the case that shadow volume caps must be rendered (see the next section), a vertex
program nearly identical to the one above should be used to transform vertices belonging to
triangles that face away from the light source. Such vertices can be treated as if their w-
coordinates are zero, so the MAD instruction has no effect and can be removed when projecting
a back cap.
For an infinite light source residing at the point L (having w-coordinate zero) in object space, a
vertex V is unmodified or extruded by using the formula
V' = V_w (V + L) - L    (15)
The following vertex program applies this formula and then transforms the resulting vertex
position V' into clip space.
!!VP1.0
ADD R0, v[OPOS], c[4];
MAD R0, v[OPOS].w, R0, -c[4];
DP4 o[HPOS].x, c[0], R0;
DP4 o[HPOS].y, c[1], R0;
DP4 o[HPOS].z, c[2], R0;
DP4 o[HPOS].w, c[3], R0;
END
The formulas given by Equations (14) and (15) can also be implemented using the
GL_EXT_vertex_shader extension. Within our vertex shaders, we need to track the model-view-
projection matrix and the current vertex position, and we need to define an invariant
corresponding to the object-space light position. Vertex shaders are enabled and these values
are initialized using the following code, where lpos points to the first component of the light
position.
glEnable(GL_VERTEX_SHADER_EXT);
GLuint mvp_matrix = glBindParameterEXT(GL_MVP_MATRIX_EXT);
GLuint vertex_pos = glBindParameterEXT(GL_CURRENT_VERTEX_EXT);
GLuint light_pos = glGenSymbolsEXT(GL_VECTOR_EXT,
GL_INVARIANT_EXT, GL_FULL_RANGE_EXT, 1);
glSetInvariantEXT(light_pos, GL_FLOAT, &lpos);
For a point light source, Equation (14) can be implemented with vertex shader code along the
following lines, which also performs the transformation into clip space. We define a few
temporary variables to hold intermediate results (the temporary names here are illustrative,
and the exact listing may differ).

glBeginVertexShaderEXT();

    /* Allocate local temporaries. */
    GLuint dir = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);
    GLuint vw  = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);
    GLuint pos = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);

    /* dir = V - L, then force dir.w to zero so extruded points lie at infinity. */
    glShaderOp2EXT(GL_OP_SUB_EXT, dir, vertex_pos, light_pos);
    glSwizzleEXT(dir, dir, GL_X_EXT, GL_Y_EXT, GL_Z_EXT, GL_ZERO_EXT);

    /* vw = V.wwww */
    glSwizzleEXT(vw, vertex_pos, GL_W_EXT, GL_W_EXT, GL_W_EXT, GL_W_EXT);

    /* pos = Vw * L + dir, which is Equation (14). */
    glShaderOp3EXT(GL_OP_MADD_EXT, pos, vw, light_pos, dir);

    /* Transform the result into clip space. */
    glShaderOp2EXT(GL_OP_MULTIPLY_MATRIX_EXT, GL_OUTPUT_VERTEX_EXT, mvp_matrix, pos);

glEndVertexShaderEXT();
For an infinite light source, we can replace the operations performed in the point light case
with code along the same lines to implement Equation (15) (again, the temporary names are
illustrative).
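glBeginVertexShaderEXT();

    /* Allocate local temporaries. */
    GLuint sum = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);
    GLuint vw  = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);
    GLuint neg = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);
    GLuint pos = glGenSymbolsEXT(GL_VECTOR_EXT, GL_LOCAL_EXT, GL_FULL_RANGE_EXT, 1);

    /* sum = V + L and neg = -L. */
    glShaderOp2EXT(GL_OP_ADD_EXT, sum, vertex_pos, light_pos);
    glShaderOp1EXT(GL_OP_NEGATE_EXT, neg, light_pos);

    /* vw = V.wwww */
    glSwizzleEXT(vw, vertex_pos, GL_W_EXT, GL_W_EXT, GL_W_EXT, GL_W_EXT);

    /* pos = Vw * (V + L) - L, which is Equation (15). */
    glShaderOp3EXT(GL_OP_MADD_EXT, pos, vw, sum, neg);

    /* Transform the result into clip space. */
    glShaderOp2EXT(GL_OP_MULTIPLY_MATRIX_EXT, GL_OUTPUT_VERTEX_EXT, mvp_matrix, pos);

glEndVertexShaderEXT();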
As mentioned earlier, a completely closed shadow volume having a front cap and a back cap
must be rendered whenever the camera lies inside the shadow volume or the faces of the
silhouette extrusion could potentially be clipped by the near plane. We wish to render this more
expensive shadow volume as infrequently as possible, so a test for determining when it is not
necessary would be useful.
The near rectangle is the rectangle carved out of the near plane by the four side planes of the
view frustum. As shown in Figure 6, we can devise a test to determine whether the shadow
volume might be clipped by the near plane by constructing the set of planes that connect the
boundary of the near rectangle to the light source. We call the volume of space bounded by
these planes and by the near plane itself the near-clip volume. Only a point inside the near-clip
volume can have an extrusion away from the light source that intersects the near rectangle.
Thus, if an object is known to lie completely outside the near-clip volume, then we do not have
to render a capped shadow volume.
Figure 6. The near-clip volume is bounded by the planes connecting the near rectangle to the
light position L. If an object lies completely outside the near-clip volume, then its shadow
volume cannot intersect the near rectangle, so it is safe to render it without caps.
When constructing the near-clip volume, we consider three cases: 1) the light source lies in
front of the near plane, 2) the light source lies behind the near plane, and 3) the light source is
very close to lying in the near plane. Let W be the transformation matrix that maps eye space
to world space, and suppose that our light source lies at the 4D homogeneous point L in world
space. We consider a point light source (for which Lw = 1) to be lying in the near plane if its
distance to the near plane is at most some small positive value δ. For an infinite directional
light source (for which Lw = 0), we consider the distance to the near plane to be the length of
the projection of the light’s normalized direction vector ⟨Lx, Ly, Lz⟩ onto the near plane’s
normal direction. In either case, we can obtain a signed distance d from the light source to
the near plane by calculating
d = L \cdot \bigl[(W^{-1})^{\mathrm{T}}\,\langle 0, 0, -1, -n \rangle\bigr]    (17)
If d>δ, then the light source lies in front of the near plane; if d<-δ, then the light source lies
behind the near plane; otherwise, the light source lies in the near plane.
In the case that the light source lies in the near plane, the near-clip volume is defined by the
planes K0=<0,0,-1,-n> and K1=<0,0,1,n>. These two planes are coincident, but have opposite
normal directions. This encloses a degenerate near-clip volume, so testing whether an object is
outside the volume amounts to determining whether the object intersects the near plane.
If the light source does not lie in the near plane, we need to calculate the vertices of the near
rectangle. In eye space, the points R0, R1, R2, and R3 at the four corners of the near rectangle
are given by
R_0 = \langle \tfrac{n}{e},\; \tfrac{an}{e},\; -n \rangle \qquad R_1 = \langle -\tfrac{n}{e},\; \tfrac{an}{e},\; -n \rangle \qquad R_2 = \langle -\tfrac{n}{e},\; -\tfrac{an}{e},\; -n \rangle \qquad R_3 = \langle \tfrac{n}{e},\; -\tfrac{an}{e},\; -n \rangle    (18)
where n is the distance from the camera to the near plane, a is the aspect ratio of the viewport,
equal to its height divided by its width, and e is the camera’s focal length, related to the
horizontal field-of-view angle α by the equation e=1/tan (α/2). These four points are ordered
counterclockwise from the camera’s perspective. For a light source lying in front of the near
plane, the world-space normal directions Ni, where 0 ≤ i ≤ 3, are given by the cross products
N_i = \bigl(R'_{(i+1) \bmod 4} - R'_i\bigr) \times \bigl(\langle L_x, L_y, L_z \rangle - L_w R'_i\bigr)    (19)
where each R'i is the world-space vertex of the near rectangle given by R'i=WRi. For a light
source lying behind the near plane, the normal directions are simply the negation of those given
by Equation (19). The corresponding world-space planes Ki bounding the near-clip volume are
given by
K_i = \frac{1}{\lVert N_i \rVert}\,\bigl\langle (N_i)_x,\; (N_i)_y,\; (N_i)_z,\; -N_i \cdot R'_i \bigr\rangle    (20)
We close the near-clip volume by adding a fifth plane that is coincident with the near plane and
has a normal pointing toward the light source. For a light source lying in front on the near
plane, the fifth plane K4 is given by
K_4 = (W^{-1})^{\mathrm{T}}\,\langle 0, 0, -1, -n \rangle    (21)
and for a light source lying behind the near plane, the fifth plane is given by the negation of this
vector. (Remember that if W is orthogonal, then (W^{-1})^T = W.)
We determine whether a shadow-casting object lies completely outside the near-clip volume by
testing the object’s bounding volume against each of the planes Ki. If the bounding volume lies
completely on the negative side of any one plane, then the object’s shadow volume cannot
intersect the near rectangle. In the case that an object is bounded by a sphere having center C
and radius r, we do not need to render a capped shadow volume if Ki · C < -r for any i.
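As a concrete illustration (a hypothetical helper, not the article's code), the sphere test
reduces to a loop over the near-clip volume planes, which are assumed to be normalized as in
Equation (20):

// Return true if the bounding sphere lies completely on the negative side
// of any near-clip volume plane, in which case an uncapped (depth-pass)
// shadow volume is sufficient.
bool DepthPassIsSafe(long planeCount, const float (*planes)[4],
                     const float center[3], float radius)
{
    for (long i = 0; i < planeCount; i++)
    {
        const float *K = planes[i];
        float d = K[0] * center[0] + K[1] * center[1] + K[2] * center[2] + K[3];
        if (d < -radius) return (true);   // sphere entirely outside this plane
    }

    return (false);   // may intersect the near rectangle; render caps
}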
Figure 7 demonstrates that for point light sources, bounding volumes lying behind the light
source from the camera’s perspective may often be mistaken for those belonging to objects
that might cast shadows through the near rectangle. This happens when the bounding volume
lies outside the near-clip volume, but does not fall completely on the negative side of any one
plane. We can improve this situation substantially by adding an extra plane to the near-clip
volume for point lights. As shown in Figure 7, the extra plane contains the light position L and
has a normal direction that points toward the center of the near rectangle. The normal direction
N5 is given by

N_5 = W\,\langle 0, 0, -n, 1 \rangle - \langle L_x, L_y, L_z \rangle    (22)

and the corresponding plane K5, which contains the light position, by

K_5 = \frac{1}{\lVert N_5 \rVert}\,\bigl\langle (N_5)_x,\; (N_5)_y,\; (N_5)_z,\; -N_5 \cdot \langle L_x, L_y, L_z \rangle \bigr\rangle    (23)
The plane K5 is added to the near-clip volume boundary for point light sources regardless of
whether the light position is in front of, behind, or in the near plane.
See “For Further Information” at the end of this article for methods that can be used to
determine whether other types of bounding volumes, such as ellipsoids, cylinders, and boxes,
intersect the near-clip volume.
Now that we can determine an object’s silhouette with respect to a light source, construct a
shadow volume by extruding the silhouette edges away from the light source, and decide
whether front and back caps are necessary, we are finally ready to render the shadow volume
into the stencil buffer. We assume that the frame buffer has already been cleared and that an
ambient rendering pass has been performed to initialize the depth buffer. This section
concentrates on the operations necessary to illuminate the scene using a single light source,
and these operations should be repeated for all light sources that can affect the visible region of
the world being rendered.
First, we must clear the stencil buffer, configure the stencil test so that it always passes, and
configure the depth test so that it passes only when fragment depth values are less than those
already in the depth buffer. This is done using the following function calls.
glClear(GL_STENCIL_BUFFER_BIT);
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 0, ~0);
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
We are only going to be drawing into the stencil buffer, so we need to disable writes to the color
buffer and depth buffer as follows.
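glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_FALSE);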
Shadow volume faces are rendered using different stencil operations depending on whether
they face toward or away from the camera, so we need to enable face culling with the following
function call.
glEnable(GL_CULL_FACE);
For a shadow volume that does not require capping because it cannot possibly intersect the
near rectangle, we modify the values in the stencil buffer when the depth test passes. The
stencil value is incremented for fragments belonging to front-facing polygons and is
decremented for fragments belonging to back-facing polygons. These operations are performed
by the following function calls, where the function DrawShadowVolume() renders all of the
polygons belonging to the shadow volume.
glCullFace(GL_BACK);
glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);
DrawShadowVolume();
glCullFace(GL_FRONT);
glStencilOp(GL_KEEP, GL_KEEP, GL_DECR);
DrawShadowVolume();
If a shadow volume does require capping, then we modify the values in the stencil buffer when
the depth test fails. The stencil value is incremented for fragments belonging to back-facing
polygons and is decremented for fragments belonging to front-facing polygons (the opposite of
the depth-pass operations). These operations are accomplished using the following function
calls. In this case, the DrawShadowVolume() function renders the polygons belonging to the
shadow volume’s caps as well as its extruded silhouette edges.
glCullFace(GL_FRONT);
glStencilOp(GL_KEEP, GL_INCR, GL_KEEP);
DrawShadowVolume();
glCullFace(GL_BACK);
glStencilOp(GL_KEEP, GL_DECR, GL_KEEP);
DrawShadowVolume();
Once shadow volumes have been rendered for all objects that could potentially cast shadows
into the visible region of the world, we perform a lighting pass that illuminates surfaces
wherever the stencil value remains zero. We re-enable writes to the color buffer, change the
depth test to pass only when fragment depth values are equal to those in the depth buffer, and
configure the stencil test to pass only when the value in the stencil buffer is zero using the
following function calls.
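glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthFunc(GL_EQUAL);
glStencilFunc(GL_EQUAL, 0, ~0);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);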
Since the lighting pass adds to the ambient illumination already present in the color buffer, we
need to configure the blending equation as follows.
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);
We also need to make the function call glCullFace(GL_BACK) just in case a depth-pass shadow
volume was most recently rendered, leaving the culling state set to GL_FRONT. After the lighting
pass has been rendered, we clean up by resetting a few rendering states back to those needed
by the ambient pass for the next frame using the following function calls.
glDepthMask(GL_TRUE);
glDepthFunc(GL_LEQUAL);
glStencilFunc(GL_ALWAYS, 0, ~0);
Because we needed to perform different stencil operations for front-facing polygons and back-
facing polygons in our shadow volumes, we had to render the shadow volumes twice. Of course,
the graphics hardware culled each polygon on either the first pass or the second, but the
vertices still had to be processed two times. The GL_EXT_stencil_two_side extension to
OpenGL provides a way to avoid this suboptimal situation by allowing separate stencil state for
front faces and back faces to be specified simultaneously. When using this extension, we render
both front faces and back faces of the shadow volume at the same time, so face culling should
be disabled. We therefore prepare to render shadow volumes by making the following function
calls.
glEnable(GL_STENCIL_TWO_SIDE_EXT);
glDisable(GL_CULL_FACE);
An uncapped shadow volume can then be rendered with depth-pass stencil operations using a
single call to DrawShadowVolume(), as follows.

glActiveStencilFaceEXT(GL_FRONT);
glStencilOp(GL_KEEP, GL_KEEP, GL_INCR_WRAP_EXT);
glActiveStencilFaceEXT(GL_BACK);
glStencilOp(GL_KEEP, GL_KEEP, GL_DECR_WRAP_EXT);
DrawShadowVolume();
A capped shadow volume is rendered using the depth-fail stencil operations shown in the code
below.
glActiveStencilFaceEXT(GL_FRONT);
glStencilOp(GL_KEEP, GL_DECR_WRAP_EXT, GL_KEEP);
glActiveStencilFaceEXT(GL_BACK);
glStencilOp(GL_KEEP, GL_INCR_WRAP_EXT, GL_KEEP);
DrawShadowVolume();
Note the use of the GL_INCR_WRAP_EXT and GL_DECR_WRAP_EXT stencil operations. These are
provided by the GL_EXT_stencil_wrap extension to OpenGL and allow stencil values to wrap
when they exceed the minimum and maximum stencil values instead of being clamped. These
operations are necessary because we do not know in what order the polygons belonging to the
shadow volume will be rendered and we must account for the possibility that the stencil value
for a particular pixel could be decremented before it is incremented.
Scissor Optimization
When using an attenuated light source, it is usually convenient to define a range r beyond
which the light source does not contribute any illumination to the world. Although this is not a
physically correct model, using an attenuation function that vanishes at a distance r from the
light’s position allows us to quickly cull any light source whose sphere of illumination does not
intersect the view frustum. When a light source’s sphere of illumination is visible, the area
within the viewport that could possibly be affected by the light source may not be the entire
viewport. By projecting the sphere of illumination to the image plane and using the scissor
rectangle to limit our drawing to the projected area of influence, we can avoid a significant
amount of superfluous rendering of both shadow volumes and illuminated surfaces.
Suppose that we have a point light source whose center lies at the point L in eye space and
whose range is r, as shown in Figure 8. We wish to find four planes, two parallel to the x-axis
and two parallel to the y-axis, that pass through the camera position (the origin in eye space)
and are also tangent to the light source’s bounding sphere. Once these planes have been
determined, we can locate their intersections with the image plane to find the rectangular
boundary of the projection of the light source’s bounding sphere.
Each such tangent plane passes through the origin and is parallel to the y-axis, so it can be
written as a unit-length 4D vector N = ⟨Nx, 0, Nz, 0⟩ satisfying

N \cdot L = r    (24)

N_x^2 + N_z^2 = 1    (25)
By expanding the dot product and rearranging slightly, we can rewrite Equation (24) as
N_z L_z = r - N_x L_x    (26)
Squaring both sides of Equation (26) and making the substitution Nz^2 = 1 - Nx^2, we have

(r - N_x L_x)^2 = (1 - N_x^2)\,L_z^2    (27)

which can be rewritten as the quadratic equation

(L_x^2 + L_z^2)\,N_x^2 - 2 r L_x N_x + r^2 - L_z^2 = 0    (28)

whose discriminant is

D = 4 L_z^2\,(L_x^2 + L_z^2 - r^2)    (29)
D < 0 precisely when Lx^2 + Lz^2 < r^2 (i.e., when the origin falls within the projection of the
sphere onto the x-z plane). When this happens, we know the light source’s bounding sphere fills
the entire viewport, and we do not continue with the scissor optimization.
If D>0, then we can solve equation (28) using the quadratic formula to obtain
N_x = \frac{r L_x \pm \sqrt{L_z^2\,(L_x^2 + L_z^2 - r^2)}}{L_x^2 + L_z^2}    (30)
This gives us two values for Nx. The corresponding values for Nz are calculated by making a
small adjustment to Equation (26):
N_z = \frac{r - N_x L_x}{L_z}    (31)
We only want to consider planes whose point of tangency with the light source’s bounding
sphere lies in front of the camera. As illustrated in Figure 8, the point of tangency P lies in the
plane <Nx,0,Nz,0> at a distance r from the point L, giving us the following two equations.
N_x P_x + N_z P_z = 0    (32)

(P - L) \cdot (P - L) = r^2    (33)

Expanding Equation (33) in components gives

(P_x - L_x)^2 + (P_y - L_y)^2 + (P_z - L_z)^2 = r^2    (34)

P_x^2 - 2 P_x L_x + L_x^2 + P_z^2 - 2 P_z L_z + L_z^2 + P_y^2 - 2 P_y L_y + L_y^2 = r^2    (35)
For the point P to lie in front of the camera, we must require that Pz<0. Since the tangent plane
is parallel to the y-axis, the values of Py and Ly are equal and the quantity Ly^2 cancels in
Equation (35). By solving Equation (32) for Px, we can make the substitution
P_x = -\frac{N_z}{N_x}\,P_z    (36)
in Equation (35) to arrive at the following equation written completely in terms of the unknown
Pz.
\frac{P_z^2}{N_x^2} + 2 P_z\,\frac{N_z L_x - N_x L_z}{N_x} + L_x^2 + L_z^2 - r^2 = 0    (37)

Because the plane is tangent to the sphere, this quadratic has a single (double) root:

P_z = N_x\,(N_x L_z - N_z L_x)    (38)
For any tangent plane <Nx,0,Nz,0> calculated using Equations (30) and (31), we calculate the
corresponding value of Pz using Equation (38). If Pz < 0, then we have found a plane that may
allow us to shrink the scissor rectangle. We now need to determine where the tangent plane
intersects the image plane.
As shown in Figure 8, the image plane is perpendicular to the z-axis and lies at a distance e
from the camera. On the image plane, the area of the viewport corresponds to x-coordinates in
the range [-1,1] and y-coordinates in the range [-a,a], where a is the aspect ratio given by the
height of the viewport divided by its width. Any point Q lying in the image plane has
coordinates <x,y,-e>. A point Q lying in the plane tangent to the light source’s bounding sphere
satisfies N · Q = 0, so we can solve for x:
x = \frac{e N_z}{N_x}    (39)
This x-coordinate can be mapped to the viewport coordinate x' using the formula
x' = l + \frac{w}{2}\,(x + 1)    (40)
where l is the left edge of the viewport and w is the viewport’s width, both in pixels.
Given a value x' calculated using Equation (40), we need to determine whether it represents a
left-side boundary or a right-side boundary. This can be accomplished by plugging the value Pz
given by Equation (38) into Equation (36) to obtain Px. If Px < Lx, then x' represents a left-side
boundary because the point of tangency falls to the left of the light source. If Px > Lx, then x'
represents a right-side boundary. Since the value x' may lie outside the viewport (if x ∉ [-1,1]),
we calculate the left and right edges of the scissor rectangle as follows.
scissor.left = max(x'_left, l)        scissor.right = min(x'_right, l + w)    (41)
The two tangent planes parallel to the x-axis are found in an almost identical manner. Each of
these planes is represented by a 4D vector ⟨0, Ny, Nz, 0⟩, whose nonzero components are given
by the following formulas.
N_y = \frac{r L_y \pm \sqrt{L_z^2\,(L_y^2 + L_z^2 - r^2)}}{L_y^2 + L_z^2}    (42)

N_z = \frac{r - N_y L_y}{L_z}    (43)
If Pz<0, then the y-coordinate where each plane intersects the image plane is given by
y = \frac{e N_z}{a N_y}    (44)
where the viewport’s aspect ratio a has been added to the denominator. Finally, the viewport
coordinate y' is calculated using the formula
y' = b + \frac{h}{2}\,(y + 1)    (45)
where b is the bottom edge of the viewport and h is the viewport’s height, both in pixels. The
y-coordinate of the point of tangency follows from the analogs of Equations (36) and (38):

P_y = N_z\,(N_z L_y - N_y L_z)    (46)
If Py<Ly, then y' represents a bottom-side boundary. If Py>Ly, then y' represents a top-side
boundary. As with the left and right sides, the values of y' should be clamped to the viewport’s
range as follows.
scissor.bottom = max(y'_bottom, b)        scissor.top = min(y'_top, b + h)    (47)
Using the values given by Equations (41) and (47), the OpenGL scissor rectangle is enabled and
set to the appropriate values using the following function calls.
glEnable(GL_SCISSOR_TEST);
glScissor(scissor.left, scissor.bottom,
scissor.right - scissor.left,
scissor.top - scissor.bottom);
The scissor rectangle affects the clear operation as well, so once rendering has been completed,
one should either disable the scissor test or set the scissor rectangle back to the entire viewport
rectangle by making the call glScissor(l, b, w, h).
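Putting the left/right half of the derivation together, a compact sketch might look like the
following. This is illustrative code under the assumptions above, not the article's
implementation; the bottom/top edges follow the same pattern using Equations (42) through (47)
and the aspect ratio a.

#include <math.h>

struct ScissorRect { int left, right, bottom, top; };

// Compute the left/right scissor edges for a light at eye-space position
// (lx, ly, lz) with range 'radius', following Equations (30), (31), (36),
// (38), (39), (40), and (41). 'e' is the focal length; 'l' and 'w' are the
// viewport's left edge and width in pixels.
void CalculateScissorLeftRight(float lx, float lz, float radius,
                               float e, int l, int w, ScissorRect *rect)
{
    rect->left = l;                 // Equation (41): start with the full viewport
    rect->right = l + w;

    float d = lx * lx + lz * lz - radius * radius;
    if (d <= 0.0f) return;          // origin inside projected sphere: full viewport

    float denom = lx * lx + lz * lz;
    float root = fabsf(lz) * sqrtf(d);

    for (int i = 0; i < 2; i++)     // the two solutions of Equation (30)
    {
        float nx = (radius * lx + ((i == 0) ? root : -root)) / denom;
        if ((nx == 0.0f) || (lz == 0.0f)) continue;

        float nz = (radius - nx * lx) / lz;           // Equation (31)
        float pz = nx * (nx * lz - nz * lx);          // Equation (38)
        if (pz >= 0.0f) continue;   // point of tangency behind the camera

        float x = e * nz / nx;                        // Equation (39)
        int xp = l + (int) (0.5f * (float) w * (x + 1.0f));   // Equation (40)

        float px = -nz / nx * pz;                     // Equation (36)
        if (px < lx) { if (xp > rect->left) rect->left = xp; }   // left boundary
        else if (xp < rect->right) rect->right = xp;             // right boundary
    }
}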
Conclusion
The techniques described in this article can be used to efficiently render the shadow volumes
needed to display a fully shadowed scene in real-time using stencil operations. Future graphics
hardware will undoubtedly incorporate greater shadow volume functionality that will relieve the
CPU from some of the work that it currently has to do, but the ultimate determination of speed
will be innovative methods for minimizing the number of shadow volumes that must be
rendered in the first place. Achieving high frame rates for complex scenes having multiple light
sources is now the goal of larger-scale optimizations, and this is currently a hot area of 3D
graphics research.
The following is the original paper discussing shadow volume capping and depth-fail stencil
operations:
Everitt, Cass and Kilgard, Mark J., “Practical and Robust Stenciled Shadow Volumes
for Hardware-Accelerated Rendering”, NVIDIA Corporation, 2002.
http://developer.nvidia.com/view.asp?IO=robust_shadow_volumes
Mathematical derivations of different bounding volume tests that can be used to determine
whether an object’s shadow volume might intersect the near rectangle can be found in the
Visibility Determination chapter of the following book.
Lengyel, Eric, Mathematics for 3D Game Programming and Computer Graphics, Charles River
Media, 2002.
Information about the OpenGL extensions used in this article can be found at the OpenGL
Extension Registry website:
http://oss.sgi.com/projects/ogl-sample/registry/
By Rick Lambright
Gamasutra
September 16, 2002
URL: http://www.gamasutra.com/resource_guide/20020916/lambright_01.htm
As game developers, we are continuously challenged to create richer and richer game worlds.
Whether we are developing a 16-player multiplayer game, or a 10,000-player persistent world,
making richer game worlds efficiently means we must be increasingly intelligent about how we
distribute the ever-changing state of our game objects. This problem is further complicated by
the diversity of the network connection characteristics of each player. In this article, I’ll describe
a technique for managing the distribution of object state using an encapsulation mechanism
called an object view. Object views provide a means for managing the distribution of object
state on a per-object basis that is flexible and transparent to the game object. In order to
describe what they are and how they are used, we’ll also peer into the workings of a distributed
object system designed for multiplayer games.
As with many other areas of computing, some of the most significant problems inherent in
distributing simulations have to do with resource management. In the case of networking, our
primary concerns are with the limitations of the game clients and especially the nasty problem
of controlling bandwidth utilization.
For most subscription-based massively multiplayer (MMP) games, bandwidth limitations are not
based upon physical limits; rather, they are based upon bandwidth costs. This means that
proper bandwidth management translates into real dollars in a very big and measurable way.
Other techniques, such as those for masking lag and smoothing movement, are also essential
for creating great multiplayer games. But for these to be effective, the underlying
implementation must account for the bandwidth limitation, whether it is constrained artificially
or by the physical medium itself. After all, the bits have to actually arrive at their destination
before they can do any good. Proper bandwidth management isn’t just a networking problem,
it’s a whole-game problem.
But what does all this accountability have to do with object views? Before we get into the nuts
and bolts of object views, let’s talk a little about why we need them.
At Monolith, we have been using object views as a fundamental construct in the development of
our distributed-object system. A distributed-object system is a game system that manages the
housekeeping chores related to the distribution of object state. It is the principal user of the
relevant-set creation mechanisms, which in our implementation are provided by the world
representation (see Figure 1). Relevant sets are collections of objects whose state changes
need to be distributed immediately (if not sooner) if we are to ensure that a remote client’s
view of the simulation matches the actual state of the simulation. The topic of relevant-set
generation is so large that it warrants its own separate discussion, so I won’t be delving into it
very much here.
Direct data management vs. RPC. Distributed-object system implementations for both games
and distributed simulations typically manage distribution of object state data rather than simply
providing a general-purpose remote procedure call (RPC)-based mechanism. Why? The answer is
rooted not only in our fundamental need to make the best possible use of the available
bandwidth, but also, as we will see later, in our need to design a system that makes it as
simple as possible for us to specify exactly how we want the component parts of our game
objects to be distributed.
As simple as the system seems conceptually, the devil really is in the details. Even if we are
strictly using visibility-based relevance determination, the full relevant set for a given client at
any instant can be enormous. As an example, consider what happens when you direct your
player-character to stroll up to the top of a nearby hill. As you crest the hill, the number of
bandwidth remains somewhat constant over time, so the distribution of objects in the relevant
set must be managed carefully, using prioritization techniques that allow the most important
state to be sent immediately and the less important state to be transmitted as soon as possible
thereafter.
Figure 2 shows the basic component parts of a simple game object that you might find in a
generic multiplayer game. The object consists of three major groups of component items:
- Visual and display-related items. These are component items related to the visual state of
the game object, including movement and position information. They very much need to be
distributed. For player-character objects, this includes values that may only be displayed on a
HUD (heads-up display) of the player controlling that character.
- Game logic and AI-related items. These are component items related to the game state of the
object. In a purely server-based simulation, these items would seldom (if ever) be distributed
to clients, but could be distributed to a trusted entity, such as another server.
- Housekeeping items. These are component items, such as reference counts and pointers to
internal structures. They are not distributed.
As our player-character roves around within the simulation, it will encounter new game objects,
spend a little time hanging around near them, leave the area, and very likely re-encounter many
of the same game objects sometime later on. Since we only want to be sent updates for the
items that have changed since the last time we encountered the object, something will have to
remember the state that the object was in the last time we encountered it. To complicate
matters, one client may have very different distribution requirements from another client for
the same object. This is where object views come in.
Object Views
An object view is an instance of a custom class that knows how to access one or more
components of a game object and track any changes to those components. Every object view is
attached to a game object, and every object view also has a remote counterpart that is
attached to a game object with a similar set of components. As changes occur to the states of
the tracked components, the object view is responsible for communicating those changes to its
remote counterpart. The counterpart is then responsible for applying those changes to the
game object to which it is attached.
The distributed-object system itself is designed to interact with object views, not game objects.
How the object view interacts with each game object is strictly a contract between the object
view and the game object. The distributed-object system only distributes object views. To
access the game-object components (given a reference to a game object) efficiently at run time,
each object view instance is created with full knowledge of which components of the game
object it needs to track and how to access them. Hence, implicit in the nature of the object view
is the notion of a binding to the game-object components that the object view will track.
The abstraction from the game object that the object view provides to the distributed object
system is one of its most significant benefits. An object view and its counterpart can each be
bound to a different type of object and still communicate with each other for managing state
distribution. This eliminates the requirement to use identical objects on both the client and the
server. For us, this was an important design consideration, since our clientside objects differ
significantly from their server side counterparts.
Object view operations. Figure 3 shows how object views interact with game objects and the
distributed-object system at a high level. Note that there is a one-to-many relationship of object
views to game objects on the server, and a one-to-one relationship on the client. In client/server
architectures, servers maintain connections to many clients, but the client typically has only one
connection to a server. The object view functions as a local proxy that remembers the state of
each game object’s distributed components from the last time it was distributed to a particular
client. Since state distribution will only occur when game objects are relevant to a client, the
state of each object view is potentially unique.
When an object enters the relevant set for a client, the distributed-object system first locates
the client-specific object views for that game object, creating a new one if one does not already
exist. Newly created object views on the server represent objects that will need to be created
and fully initialized on the client before they can be rendered.
Either way, the process of determining exactly what state updates are needed and how the
determination is made is strictly a contract between the object view and the game object. In
order to ensure that the object view is granted the flexibility it needs, the distributed-object
system requires every object view to provide two basic operations: packto and unpackfrom.
The packto operation is called when the object view needs to be provided an opportunity to
distribute its state. The object view determines whether or not any state updates are required,
and is then responsible for marshaling those updates directly into the transmission buffer,
packing them as tightly as possible in the process. Only the sending object view and its
receiving counterpart on the other end of the connection can be trusted to understand the
format of this data. The object view’s unpackfrom operation is called when state updates are
received. This is typically a simple process of analyzing the received data and applying the
updates to the appropriate components of the target game object. This also turns out to be a
great time for an object view to provide event notifications to the game object — or to
anywhere else in the game — whenever one or more specific components are updated.
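As a rough sketch of the contract just described (the article gives no code, so all names here
are illustrative), an object view might expose the two operations like this:

// A minimal object-view interface. PackTo marshals changed component state
// into a transmission buffer; UnpackFrom applies received updates to the
// attached game object.
class GameObject;

class ObjectView
{
public:
    explicit ObjectView(GameObject *object) : m_object(object) {}
    virtual ~ObjectView() {}

    // Returns the number of bytes written; writes nothing if no tracked
    // component has changed since the last call.
    virtual int PackTo(unsigned char *buffer, int capacity) = 0;

    // Applies a state update produced by the remote counterpart's PackTo.
    virtual void UnpackFrom(const unsigned char *buffer, int length) = 0;

protected:
    GameObject *m_object;   // the game object this view is bound to
};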
A third basic operation that each object view should provide is solid diagnostics. Object view
operations are deliberately mysterious to the rest of the system components, and only the
object views themselves may understand the format of the data they utilize to communicate
state updates. Because of this, marshaling errors will have downstream effects that can be
difficult to debug without good diagnostics.
Tracking state changes. When it comes time to distribute the state of the game object, each
object view will need to determine whether the components it is tracking have changed since
the last time the packto operation was called. This requires the object view to remember
something about the previous state of those components. There are a variety of techniques that
the object view can utilize to track state changes; invasive techniques require special support
from the game objects, whereas game objects operate obliviously to noninvasive techniques.
The determination of which tracked components have changed state will normally take place
during the packto operation, and while the game object remains relevant for a client, the packto
operation for its views will be called frequently. For this reason, the packto operation must be
very efficient.
The most straightforward technique is for the object view to maintain its own copy of the game
object components that it is tracking. If sufficient memory is available and the tracked items
can be compared very efficiently, this noninvasive mechanism is hard to beat. Since the exact
previous state of each variable is always available, the object view can be certain that it is only
distributing state that differs on the target.
Adding a change counter to the game object is an invasive technique we have found particularly
useful. We use this for complex objects that are tested frequently but whose state changes
relatively infrequently. Each object view also has a change counter, and each time the state is
distributed the view’s counter is set to the current value of the game object’s counter. By
comparing the two counters, a very fast check can be made to see if any new changes have
occurred. This technique could be used as an optimization for any object that is tracking more
than a few items, but it does require that each game object be modified to ensure that its
change counter is updated every time any of the tracked components are updated.
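A minimal sketch of the change-counter check might look like this; all names are hypothetical,
and the game object is assumed to bump its counter from every setter that modifies a tracked
component.

// Hypothetical sketch of the change-counter technique.
class GameObject
{
public:
    unsigned long GetChangeCount() const { return m_changeCount; }
    void MarkChanged() { m_changeCount++; }   // called by component setters

private:
    unsigned long m_changeCount = 0;
};

class CounterTrackingView
{
public:
    // True if the object has changed since the last distribution.
    bool NeedsUpdate(const GameObject& object) const
    {
        return object.GetChangeCount() != m_lastDistributedCount;
    }

    // Called after a successful pack to remember what was sent.
    void MarkDistributed(const GameObject& object)
    {
        m_lastDistributedCount = object.GetChangeCount();
    }

private:
    unsigned long m_lastDistributedCount = 0;
};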
Another invasive technique that we have seen utilized involves maintaining a bit set of change flags.
This technique requires that the game object be designed to manage a bit set that is stored
with the game object itself. Each bit in the set corresponds to a distributed component part. The
object view keeps its own copy of the bit set and checks to see if its own copy matches that of
the game object during the packto operation, in order to determine which component parts
have changed.
Unfortunately, this technique suffers from three drawbacks. First, you must ensure that the
corresponding bit is set every time a distributed component variable is updated. Second, if a
component switches back and forth between a small set of states, then there is a significant
chance that a value marked as changed would be sent to the target object even though it
actually switched back to being in the same state as the target. This process wastes bandwidth.
The third drawback is the most serious. The “changed” component bits on each object need to
be cleared as soon as possible for optimal distribution, but they can only be safely cleared when
state has been distributed to all clients.
Because of that fact, this technique is really only practical for small-scale simulations where all
clients need to be kept continuously up to date with the current state of all game objects.
This includes all object types and primitive types. Once this is done, complex objects can be
managed by creating hierarchical object views that mirror the component hierarchy of the
object.
Lifespan of an object view. Over the course of time, a player will potentially encounter tens
of thousands of objects in a large simulation. A server would need to maintain all the object
views permanently for every game object if it wanted to avoid the expense of recreating them.
This is memory-intensive not only for servers, but potentially for clients as well.
Fortunately, this problem can be handled fairly effectively using an active cache of object views.
Old object views are then automatically purged from the cache over time if the game objects
they track are not reencountered for extended periods.
At the instant an object view needs to be created, a perfect opportunity exists to make some
intelligent decisions. By checking the connection characteristics of the client, the distributed
object system can select an object view that is tailored for supporting specific clients. This also
means that clients with unique communications requirements could conceivably coexist in the
same game environment, sharing the game objects with clients that have completely different
communications requirements. This could, for example, allow a client on a handheld device to
share the game world with clients connected via a PC or game console.
Object views are also natural places to handle prediction. In addition, managing movement
prediction in the object view makes it possible to utilize different predictive contracts for clients
with differing connection characteristics. For example, you could utilize a prediction technique
for a client on a modem connection that was completely different from one with a broadband
connection simply by selecting the appropriate type of object view when one needs to be
created.
Name that tuning. Previously I mentioned that one of the reasons we want our distributed-object
system to manage distribution of our state data is our need to design a system that lets us
easily specify how we want our game objects to be distributed. Applying distribution attributes
to the data is necessary if we are to help tune how object state is distributed at run time.
Tuning is a critical responsibility that is shared between the relevant-set mechanism and the
distributed-object system.
To try to maintain a steady flow of traffic through the network, a measured allotment of
bandwidth is calculated for each cycle. If a cycle exceeds its allotment, that affects the
bandwidth allotment for the next cycle. When allotments are exceeded, the relevant-set
mechanism must trim the set of objects to those that it determines are the most urgent to
distribute. If the relevant-set mechanism undercompensates (that is, provides an excess of
objects to distribute) for the available bandwidth on that cycle, the tuning support mechanisms
of the distributed-object system and object views come into play. This also holds true when
bandwidth is being underutilized. In this way, the two systems work together continuously to
make optimal use of bandwidth.
The ability to tune how an object’s state is distributed at run time is very important. By
providing some specific information about how we want each game object to be distributed, we
should be able to tune the system for optimal distribution. Here are some useful attributes that
an object view can use for tuning how individual components, or groups of components, are
distributed:
- Priority. A distribution priority can be set for each component item to designate which items
are more important to distribute. When an object view is faced with needing to reduce the
amount of bandwidth being consumed, it can select from the highest-priority items. As long as
the object remains in the relevant set, lower-priority items will eventually be distributed during
later cycles.
- Reliability. The ability to specify whether a component item’s state should be distributed
reliably (guaranteed) or unreliably (not guaranteed) is a significant tuning option. When eligible
for distribution, unreliable items will only need to be sent once. Delivery of the state update is
never confirmed, so item state will not be resent in case of a delivery failure. This attribute can
have a great impact on overall bandwidth utilization in times of significant packet loss, but it
must be used carefully. It is typically used for items that change very frequently and when a
missed update has minimal impact. An object view could also choose to set the reliability
attribute conditionally at run time.
- Group. Some component items will need to be distributed as a unit with others. The group
attribute specifies that on a given cycle, unless all the member items of the group can be
distributed, none should be distributed.
- Direction. For object views that support bidirectional state updates, the direction attribute
can ensure that an object view only works in one direction. For example, the object view for a
player object might need to be bidirectional when it connects to the client represented by that
object, but unidirectional when distributing state belonging to a “foreign” player representing a
different client. This can also be a security consideration on a server, preventing hacked clients
from using bidirectional object views illegitimately.
- Initialization only. Components such as object IDs that will not change during the lifetime of
the object can be tagged with the initialization-only attribute. After the initial distribution,
these items will not need to be tracked by the object view, resulting in greater processing
efficiency.
You should also provide a declarative means of assigning attributes to the distributed
components of the game object.
Ideally, this is part of the definition of the game object itself. A custom scripting language
capable of defining game objects can build distribution-attribute assignments directly into the
language itself. UnrealScript, for example, provides a replication statement, where
delivery-related attributes can be specified for individual items of the class. In our own
implementation, these attributes are assigned when the game object is defined using an
internal compilation tool that generates source code for both the game object and its object
views.
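As an illustration, a declarative assignment could look something like the sketch below;
everything in it (DistAttr, ComponentDef, the player table) is hypothetical rather than
Monolith's or UnrealScript's actual syntax:

#include <cstdint>

// Per-component distribution attributes, mirroring the list above.
enum DistAttr : std::uint32_t
{
    DIST_RELIABLE  = 1u << 0,   // delivery is confirmed (and retried)
    DIST_INIT_ONLY = 1u << 1,   // sent once, then no longer tracked
    DIST_BIDIR     = 1u << 2,   // object view accepts inbound updates
};

struct ComponentDef
{
    const char*   name;
    int           priority;     // higher values are distributed first
    int           group;        // nonzero: group members go out together
    std::uint32_t attrs;
};

// A player object's components might be declared like this:
static const ComponentDef kPlayerComponents[] =
{
    { "objectId", 0, 0, DIST_RELIABLE | DIST_INIT_ONLY },
    { "position", 9, 1, 0 },    // unreliable, grouped with velocity
    { "velocity", 9, 1, 0 },
    { "health",   5, 0, DIST_RELIABLE },
};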
Where Dragons Dwell

Until a reliable transport protocol with predictable delivery is available over the Internet,
simulations with time-critical delivery requirements will continue to use unreliable protocols,
such as User Datagram Protocol (UDP).
Many complications can arise when object state is distributed using unreliable communications.
Ideally, we want to use our limited bandwidth for transmitting only the most recent state of our
game objects. Retransmission, due to packet loss, of old packets containing old state is a very
poor way to solve the problem. Here too, object views have proved to be a very useful tool. In
addition to their component-tracking responsibilities, they can also keep track of the success —
or failure — of the delivery of state information to their remote counterparts. How do they do
this? I’ll leave the answer as an exercise for a rainy day.
In this article, I've discussed how object views can be utilized as part of a distributed-object
system to help encapsulate management of the distribution of object state. We also looked at
how they can be used in the implementation of such a system. At Monolith, we have found
object views to be a very valuable tool in building our own distributed-object system. They
have provided us with an extraordinary amount of flexibility, allowing us to create simple yet
elegant solutions to a variety of the problems we needed to solve.
By Matt McLaurin
Gamasutra
August 16, 2002
URL: http://www.gamasutra.com/features/20020816/maclaurin_01
Licensing rendering engines is now a well-established practice, with great potential cost and
time savings over the development of a single game. As game developers reach for new forms
of gameplay and a better process for implementing established genres, the wisdom of licensing
physics engines is becoming inescapable. Commercial engines such as Havok and Mathengine's
Karma (at press time, Criterion Software, makers of the Renderware line of development tools,
were in negotiations to acquire Mathengine) have become mature platforms that can save
months in development and test. Their robust implementations can provide critical stability
from day one, and their advanced features can offer time advantages when developers are
exploring new types of gameplay.
This sophistication does come with a cost. Physics engines do more than just knock over boxes,
and the interface between your game and a physics engine must be fairly complex in order to
harness advanced functionality. Whether you have already licensed an engine and want to
maximize your investment or you're just budgeting your next title, gaining a better
understanding of the integration process will save a lot of trial and error, and hopefully let you
focus on better physics functionality while spending less time watching your avatar sink through
the sidewalk.
The bare minimum we expect from a physics engine is fairly obvious: we want to detect when
two objects are interacting and we want that interaction to be resolved in a physically realistic
way - simple, right? As you progress deeper into integration, however, you'll find physics affects
your user interface, logic mechanisms, AI routines, player control, and possibly even your
rendering pipeline (Figure 1).
Here at Cyan Worlds, we're more than a year into our use of a commercial physics engine,
having integrated it with our own proprietary game engine. I'm going to share with you some of
the nuts and bolts of our integration process. In the first part of this article, I'll talk about the
fundamentals: data export, time management, spatial queries, and application of forces. Then,
with an eye toward character-centric game implementations, I'll visit the twin demons of
keyframed motion and player control. In these areas, challenges arise because both of them
require that you bend the laws of physics somewhat, and that means you must draw some clear
distinctions between what is physics and what is programming for effect.
There are three categories of geometry supported by physics engines. The simplest are
primitives, represented by formulae such as sphere, plane, cylinder, cube, and capsule.
Somewhat more expensive is convex polygonal geometry. Convexity simplifies detection and
response greatly, leading to improved performance and better stability. Convex shapes are
useful for objects where you need a tighter fit than a primitive can provide but don't
need concavity. Finally, there is polygonal geometry of arbitrary complexity, also known
as polygon soups. Soups are fairly critical for level geometry such as caves and canyons but are
notoriously difficult to implement robustly and must be handled with care to avoid slowdowns.
Since these geometric types have different run-time performance costs, you'll want to make
sure that your tools allow artists to choose the cheapest type of physical representation for their
artwork. In some cases your engine can automatically build a minimally sized primitive (an
implicit proxy) at the artist's request; in other cases the artists must hand-build substitute
geometry (an explicit proxy). You'll need to provide a way to link the proxy to the visible
geometry it represents, so that changes in the physical state of an object will be visible to the
user.
Transforms
In most cases, a proxy and its visible representation will have the same transform; you want all
movement generated from physics to be mirrored exactly in the rendered view. To relieve
artists from having to align the transforms manually - and keep error out of your process - you
may find it worthwhile to move the vertices from the proxy into the coordinate space of the
visible geometry (Figure 2a).
However, if the proxy geometry will be used by several different visible geometries, you may
wish to keep the vertices in their original coordinate system and simply swap in the visible
geometry's transform (Figure 2b). This method will let you use physical instances, wherein the
same physical body appears in several different places in the scene. This latter approach, while
enabling efficiency via instancing, can be less intuitive to work with because the final position of
the physical geometry depends on the transforms of objects it's used for and not the position in
which it was actually modeled.
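As a rough sketch of the first strategy, baking the proxy's vertices into the visible geometry's
space might look like this; Vec3 and Matrix are toy stand-ins for whatever math library is in
use, and proxyToVisible is assumed to be precomputed from the two objects' transforms:

#include <vector>

struct Vec3 { float x, y, z; };

// Toy row-major transform, just enough to move points between spaces.
struct Matrix
{
    float m[3][4];
    Vec3 TransformPoint( const Vec3& v ) const
    {
        return { m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3],
                 m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3],
                 m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3] };
    }
};

// Re-express each proxy vertex in the visible geometry's coordinate space
// so the proxy and the visible mesh can share a single transform.
void BakeProxyVertices( std::vector<Vec3>& proxyVerts,
                        const Matrix& proxyToVisible )
{
    for( Vec3& v : proxyVerts )
        v = proxyToVisible.TransformPoint( v );
}

The second strategy is then just a matter of creating one physics body per visible instance and
assigning it that instance's transform, leaving the shared proxy vertices untouched.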
Time Management
Dealing with time cleanly is an extremely important thing to get right early on in integrating a
physics engine. There are three key aspects of time relevant to simulation management: game
time, frame time, and simulation time.
Game time is a real-time clock working in seconds. While you might be able to fudge your way
from a frame-based clock to a pseudo-real-time clock, working with seconds from the start will
give you a strong common language for communicating with the physics subsystems. The more
detailed your interactions between game logic, animation, and physics, the more important
temporal consistency becomes - a difference of a few hundredths of a second can mean the
difference between robust quality and flaky physics. There will be situations where you want,
for example, to query your animation system at a higher resolution than your frame rate. I'll
talk about this kind of situation later in the "Integrating Keyframed Motion" section.
Frame time is the moment captured in the rendered frame. Picture it as a strobe light going off
at 30 frames per second. While you only get an actual image at the frame time, lots is
happening between the images.
Simulation time is the current time in your physics engine. Each frame, you'll step simulation
time until it reaches the current target frame time (Figure 3). Choosing when in your loop to
advance simulation can greatly affect rendering parallelism.
Rendering frame rates can vary; if your physics step size varies, however, you'll see different
physical results - objects may miss collisions at some rates and not at others. It's also often
necessary to increment, or step, the simulation at a higher rate than your display; physics will
manage fast-moving objects and complex interactions more accurately with small step sizes.
Tuning your physics resolution is straightforward. At physics update time, simply divide your
elapsed time by your target physics step size and step the physics engine that many times.
Be careful, though: if your frame rate drops, this approach will take more physics steps so that
each step interval stays the same size, which will in turn increase your per-frame CPU load. In
situations of severe lag, this can steal time from your render cycle, lowering your frame rate,
which then causes even more physics steps, ad infinitum.
In such scenarios, you need a way to drop your physics-processing load until your pipeline can
recover. If you're close to your target frame rate, you may be able to get away with taking
larger substeps, effectively decreasing your physics resolution and accepting a reduction in
realism. If the shortfall is huge, you can skip updating the simulation altogether - simply freeze
all objects, bring the simulation time up to the current frame time, and then unfreeze the
objects. This process will prevent the degeneracies associated with low physics resolution, but
you'll have to make sure that systems that interact with physics - such as animation - are
similarly suspended for this time segment.
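A minimal sketch of this stepping policy, assuming a 120Hz physics resolution and a
hypothetical PhysicsWorld with a Step method:

struct PhysicsWorld { void Step( double dt ); };

void AdvanceSimulation( PhysicsWorld& world, double& simTime, double frameTime )
{
    const double kStepSize = 1.0 / 120.0;   // fixed physics step (seconds)
    const int    kMaxSteps = 8;             // load-shedding cap

    int steps = 0;
    while( simTime + kStepSize <= frameTime && steps < kMaxSteps )
    {
        world.Step( kStepSize );            // same step size at any frame rate
        simTime += kStepSize;
        ++steps;
    }

    // Severe lag: rather than spiraling into ever more substeps, freeze
    // objects (not shown) and jump simulation time up to the frame time.
    if( steps == kMaxSteps && simTime < frameTime )
        simTime = frameTime;
}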
If you're receiving events from the physics engine, the difference in clock resolution between
graphics and physics has another implication: for each rendering frame, you'll get several
copies, for example, of the same contact event. Since it's unlikely that recipients of these
messages - such as scripting logic - are working at physics resolution, you'll need to filter out
these redundant messages.
Applying Forces
There are three ways to give an object motion in a physics world: you can apply a force to the
object, you can apply an impulse, and you can set its velocity directly. Each has different trade-
offs.
To be effective, a force has to be applied over a specific amount of time. In many sims,
applying a force means "apply this force over the next simulation step." This is usually not what
you want, as applying a force for 1/60th of a second won't push an object very far unless it's a
huge force. What you do want is a way to say, as simply as possible, "apply this amount of force
for this amount of time." There are three ways to do this.
The first approach is to continually reapply the force each substep until you've reached your
target time. For each force you wish to apply, keep track of how long it needs to be applied,
and apply it one substep at a time. The problem with this approach is its complexity; you need
to keep track of each force that you're applying, how long it's been applied for, and how much
longer it's going to be applied. There's also the minor problem that you must apply forces over
an integer number of substeps, which limits how finely you can tune your use of forces.
The second approach is to use impulses. An impulse is a force premultiplied by a time, and it
takes effect instantaneously. If you want to apply a force of 10 newtons continuously
over 1/10th of a second, a 1 newton-second impulse will do the trick. The limitation of using
impulses is that the force is not in fact applied over the entire time; all the energy is delivered
instantly, and your object reaches its target velocity instantaneously rather than being gradually
accelerated. For quick forces, such as a jump or a bullet, the simplicity of impulses makes them
preferable to actual forces. If you want to lift something slowly, though, forces are the way to
go.
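A small sketch contrasting the first two options; Body, Vec3, and the Apply calls are
placeholders rather than any specific engine's API:

#include <vector>

struct Vec3
{
    float x, y, z;
    Vec3 operator*( float s ) const { return { x*s, y*s, z*s }; }
};

struct Body
{
    void ApplyForce( const Vec3& f );     // acts over one substep only
    void ApplyImpulse( const Vec3& j );   // instantaneous momentum change
};

// Option 1: reapply the force each substep until its time is used up.
struct TimedForce { Vec3 force; double remaining; };

void ApplyTimedForces( std::vector<TimedForce>& forces, Body& body, double substep )
{
    for( TimedForce& tf : forces )
        if( tf.remaining > 0.0 )
        {
            body.ApplyForce( tf.force );
            tf.remaining -= substep;
        }
}

// Option 2: collapse "force F for duration t" into a single impulse J = F*t.
void ApplyAsImpulse( Body& body, const Vec3& force, double duration )
{
    body.ApplyImpulse( force * float( duration ) );
}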
The third approach -- velocities -- is both limiting and particularly useful for situations where
you need very tight control. We'll discuss it in detail later in the "Player Control Strategies"
section.
Spatial Queries
Physics engines by their nature incorporate high-performance spatial data structures. These are
handy for a lot of query types:
- Trigger volumes (switch to camera B when the user enters this region).
- Line-of-sight (can I see the power tower from here?).
- Ray casts for AI environment probing (can Watson see me?).
- Proximity queries for AI (start talking when the player is within five feet).
- Evaluating theoretical object placement (can this door close without crushing anything?).
- Ray casts for picking (let the user click on the lever).
- Volume queries for motion planning (can I walk all the way to the hatch?).
Spatial queries can affect many types of game logic. A good query interface will save you time
every day; it's an area of integration that will reward careful planning. While it can be very
game specific, there are a few design parameters for your query interface that apply to almost
all games:
Cascading. One query can significantly narrow the field for multiple, more complex queries: a
20-foot sphere around your avatar can gather all potentially interesting objects for subsequent
query by line-of-sight.
Triggers. Some queries are set up once and report only when their state changes. For
example, a region might notify you when the player enters, rather than you having to ask all
regions each frame. This will typically be delivered as an event from the collision system.
Explicit queries. Some queries are only relevant at a particular moment and must be resolved
instantaneously, for example, "Is that door in my way?"
Query partitioning. Some questions are only asked about specific types of objects; a camera
region may only ever care if an avatar enters it, not a creature or rolling boulder. If your
physics engine has an "early out" callback, you can use such application-specific type
information to partition the query space, eliminating expensive detailed testing for pairs of
objects you know will never interact.
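A sketch of such a partition filter; the exact callback shape varies by engine, so these names
are illustrative:

// Application-specific type tags attached to each physics body.
enum ObjectKind { KIND_AVATAR, KIND_CREATURE, KIND_PROP };

struct TaggedBody { ObjectKind kind; };

// Early-out callback: return false to skip detailed collision testing for
// this pair. A camera region only ever cares about avatars, so any pair
// not involving one is rejected before narrow-phase work happens.
bool CameraRegionFilter( const TaggedBody& a, const TaggedBody& b )
{
    return a.kind == KIND_AVATAR || b.kind == KIND_AVATAR;
}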
If you're not using physics for a racing game or flight simulation, you're probably looking for
interesting gameplay - big complicated machines, moving platforms, and the like. It's likely that
many of these will be lovingly hand-animated by your talented artists. Unfortunately, hand
animation is not obligated to obey the laws of physics. How do we integrate keyframed motion
into a physically based simulation?
The approach I'll discuss here is particular to the Havok API; it happens to be what we're using,
and a proper discussion of these details requires a bit of specificity. It should be illuminating
regardless of your choice in API, however, as it demonstrates how time, movement, and frame
rate can all affect your simulation.
There are two primary issues involved with "physicalizing" keyframed animation:
1. Translate motion from the hierarchical scene graph into the flat physics world.
2. Give the physics engine enough information about the moving object to allow it to
interact realistically with other, non-keyframed objects.

We've adopted a few simplifying assumptions for keyframed motion, which greatly simplify
implementation while still capturing the essential functionality.
First, we consider keyframed motion to be nonnegotiable. A keyframed sliding wall can push a
character, but a character cannot push a keyframed wall.
Our second assumption is that we do not ask the physics engine to resolve interaction between
two keyframed systems. Because these systems are hand-animated and initiated by script,
avoiding interdependencies is the level author's domain.
When considering the integration of physics and keyframed animation, we first need to gather
the local-to-world transforms of all the keyframed objects, as we'll need them to feed positions
and velocities into the simulation. Because physics has no sense of hierarchy, you'll need all
your kinetic information in world space. One way to do this is to cache matrices as you traverse
your scene graph in preparation for rendering. This process gives you the matrix that you need
to match the flat transform structure of physics. Because of the no-negotiating rule for
keyframed objects, you can go ahead and submit the keyframed objects to your rendering
pipeline as you traverse, as physics will not change those transforms. This helps parallelism,
since all static and keyframed geometry can be transmitted to the graphics card before physics
even starts.
Keyframed objects participate only partially in the simulation; they are not moved by gravity,
and other objects hitting them do not impart forces. They are moved only by keyframe data.
For this reason, it is necessary to "freeze" the keyframed objects during the simulation phase in
which such forces are calculated and applied.
Keyframed objects are further marked at setup time as zero-order-integration objects. This
advises physics that these objects are explicitly positioned and instructs the engine to call back
during each integration substep. In this callback, you are responsible for updating the position,
orientation, linear velocity, and angular velocity for the keyframed object. This information is
critical for determining what happens when, say, your avatar is standing on top of that
keyframed elevator. Since the physics engine has no knowledge of the forces at work, it's
relying on you to help it fake the results.
To illustrate the importance of getting the velocity right, think about the difference between
standing on an elevator that's moving down and one that's moving up. In the down case, a
collision between you and the elevator should be resolved by you moving down. In the up case,
the exact opposite is desired. The only difference here is velocity, and an incorrect result will
embed your player up to the knees in the elevator floor - undesirable by most standards.
The process of calculating velocities is a simple matter of interpolating position and orientation
from the animated transforms that you stashed away a few paragraphs back. As an alternate,
higher-quality-but-higher-cost approach, you can ask your animation system at each physics
substep to interpolate a fresh position for you. This extra bit of work can be expensive, because
you have to reinterpolate the motion channel not only for the object in question but also for any
parent transforms.
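Put together, the per-substep callback might look roughly like this; the types and setters are
placeholders rather than Havok's actual API, and AngularVelocityBetween is a hypothetical
helper that extracts an angular velocity from two orientations:

struct Vec3
{
    float x, y, z;
    Vec3 operator-( const Vec3& o ) const { return { x-o.x, y-o.y, z-o.z }; }
    Vec3 operator/( float s ) const { return { x/s, y/s, z/s }; }
};

struct Quat { float x, y, z, w; };

struct KeyframedBody
{
    void SetPosition( const Vec3& p );
    void SetRotation( const Quat& q );
    void SetLinearVelocity( const Vec3& v );
    void SetAngularVelocity( const Vec3& w );
};

// Hypothetical helper: angular velocity carrying prev to next over dt.
Vec3 AngularVelocityBetween( const Quat& prev, const Quat& next, float dt );

// Called by the engine each integration substep for a keyframed body:
// position and orientation come straight from animation; velocities are
// derived by finite differences so collisions resolve in the right direction.
void KeyframedSubstep( KeyframedBody& body,
                       const Vec3& prevPos, const Quat& prevRot,
                       const Vec3& nextPos, const Quat& nextRot, float dt )
{
    body.SetPosition( nextPos );
    body.SetRotation( nextRot );
    body.SetLinearVelocity( ( nextPos - prevPos ) / dt );
    body.SetAngularVelocity( AngularVelocityBetween( prevRot, nextRot, dt ) );
}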
What this gains for you is a greater degree of frame rate independence for keyframed physical
objects. To illustrate the problem of frame rate dependence, take a look at Figure 4.
Figure 4 shows an elevator reaching the bottom of its descent and moving back up. At frames 1
and 2, it's in the same position but moving in two different directions. If you're sampling
position only at frame boundaries, you'll conclude that the elevator is stationary. If you add a
sample in the middle, you'll have a more accurate simulation, at a cost of reaccumulating all
transform dependencies. This is a fairly dramatic case; in many other cases, you'll see the
object calculate different velocities at different frame rates. How much this matters to your
players depends in large degree on your game's animation speed, object velocities, and
tolerance for error in motion. In a surprising number of cases, this winds up not mattering, but
it's an accuracy trade-off of which you should be well aware.
The approach I just outlined is not the only way to handle keyframed motion. The Karma
engine provides a different facility in which the keyframe data is used as a constraint to the
object's position but does not control it directly. The end result is that the object is attached to
the animation in a springy fashion; if there are a lot of people in your keyframed elevator, it will
lag behind, springing ahead again as folks jump off. You can adjust the strength of the spring
and the speed with which it acts. This is a neat gameplay effect and can be excellent for the
right application.
Player control of the avatar is, for many games, where you're going to spend the most time
fine-tuning your physics integration. Every design trade-off you've made regarding physics
resolution, applying forces, keyframe data, and the like will all come together to affect how your
character navigates and how realistic it feels. The avatar is so central to the player's
perceptions that any glitch becomes extremely visible. I'm going to talk about the strategy
we're using for our application, a multiplayer, networked, third-person exploration game with a
mix of indoor and outdoor environments and an emphasis on photorealism. Naturally, your
approach will vary depending on the design of your game, but you'll probably recognize issues
that apply to your own situation.
A key decision for player control is the shape of the proxy you'll use to do collision for your
character. A popular choice is a simple capsule (Figure 5). This shape has several advantages:
It's smooth on the bottom, so it can glide over uneven terrain; it's radially symmetric from
above, so your avatar can turn in place without being pushed away from the wall; and it has no
sharp corners, which can get caught on narrow doorways. A subtler advantage is that since it
presents no sharp corners to the ground, it won't jump or stick as it hits polygon joins in an
otherwise flat terrain.
Notice that the character's arm sticks out through the capsule. He's illustrating a point, which is
that this capsule is used only for his gross movement in the environment, and it does not
handle detail interactions between, say, his hand and a lever. We use a completely different
mechanism for such detail interactions; the problems of detail interaction are beyond the scope
of this article, but suffice it to say that they're different enough to justify separate mechanisms
from those used for movement. As for the realism of the simplistic shape, it's instructive to note
that a large percentage of a human's motor control goes into maintaining the illusion that we're
not a bundle of flailing limbs all moving in different directions. A real human body does an
extremely good job of moving our head along on a smooth path. As a result, a simplified
physical body can actually lead to more realistic results than a multi-limbed physics body.
That's how we're shaped, but how do we move? What translates button presses into forward
motion? There are three fundamental approaches. First you can set the position and orientation
of your character directly. Second you can set the velocity (linear and angular) of your
character. And finally, you can apply forces to propel your character.
Setting position is attractive because it's so simple: You're standing here and you want to move
forward, so just add a vector. This approach falls apart pretty quickly, unfortunately, and it is
the least friendly to using physics in a general fashion.
Assume we start each frame in a physically valid position. Our player tells us to move forward,
so we construct a vector representing typical forward motion, orient it to our player's forward
vector, and add it to our position. Easy enough so far, and if all games were played on an
infinite flat plane, this would work great. But what happens when the position we want to
occupy overlaps with a wall, or even with a slight rise in the ground?
Big deal, you say, we have a fancy physics package. We'll just ask it to validate the position
before we finalize it. So what do you do when the position is not valid? You'll have to calculate
the point of impact, figure out where your character is deflected, and so on. This situation only
gets worse when you consider that there are other moving objects in the environment. The
problem is that by setting position directly, you've shut your physics engine out of the loop and
you now have to write more code to take its place. How do we get physics to do this work for
us?
Forces are a natural way to move a physics body around. On the good side, you'll find that a lot
of unplanned situations tend to work when you use forces: If your character hits some boxes,
he'll knock them over. If he's hit by a rolling boulder, the force imparted by the boulder will
combine with his walking force to move him in a new direction. He'll interact realistically with
slopes and walls. In general, it's a major improvement.
On the other hand, using forces to move the player somewhat decreases your level of control
over exactly how the player moves. Subtler issues such as friction come into play, and it
becomes hard simply to say, "Walk to this spot." Forces tend to highlight the fact that we're
using a simplistic capsule shape for the player and not a 400-bone musculoskeletal simulation.
While a golf ball might fly 100 yards if you whack it with a paddle, a human won't, and the
reasons why are complex to emulate.
Positioning the player by setting velocity is a reasonably happy medium between the total
physics-unfriendliness of setting position and the loose control provided by forces. Rather than
saying what position you want to be in each frame, calculate how fast you need to be moving to
reach your target position and set the velocity on your physics body accordingly.
This has many of the same benefits as forces. If your character hits a wall, he'll either stop or
slide along it. If he steps off a cliff, he'll start to fall, and if he hits a slope he'll climb up it. Little
rises and falls in the ground will be automatically incorporated into your character's movement,
and you still have pretty tight frame-to-frame control of your character's movement; he won't
go flying off down a hill if you're setting his speed each frame, and you won't get an
unfortunate confluence of external influences causing him to fly through the air.
One drawback to this approach is that your motion is still based on movement on a flat plane,
so you're going to see some unrealistic movement when, for example, the ground drops away
rapidly. If you're just applying that forward-walk vector, downward gravitational force will be
applied every frame, but it will be blown away by your preordained velocity. As a result, the
character will fall at a slow, constant rate and won't accelerate toward the ground as he should;
he'll only get one frame's worth of acceleration each time before starting over at zero.
There are two solutions to this problem. The first is to leave vertical velocity alone when you're
walking, and the second is to stop walking when you're in the air. In actuality, both are
necessary; you don't want a single-frame departure from the ground (common when hitting a
bump) to interrupt your forward progress, so your walk behavior should continue for a short
time after leaving the ground. Since this grace period can cause a few frames of floating when
stepping off a cliff, leaving vertical velocity untouched is what trims off the extra frames of
floating when cresting a peak. A rule of thumb is that each navigational state should have a
sense of what
kind of velocity it can set: a walk can't set vertical velocity, but a jump can.
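A small sketch of that rule of thumb in code, with placeholder types:

struct Vec3 { float x, y, z; };

struct Body
{
    Vec3 GetPosition() const;
    Vec3 GetLinearVelocity() const;
    void SetLinearVelocity( const Vec3& v );
};

// Walking sets only horizontal velocity; the vertical component is left to
// gravity so falling accelerates naturally.
void WalkToward( Body& body, const Vec3& target, float dt )
{
    Vec3 pos = body.GetPosition();
    Vec3 v   = body.GetLinearVelocity();

    v.x = ( target.x - pos.x ) / dt;   // velocity needed to arrive this frame
    v.z = ( target.z - pos.z ) / dt;
    // v.y untouched: a walk can't set vertical velocity, but a jump can.

    body.SetLinearVelocity( v );
}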
Another drawback to the velocity-based approach is that it does not automatically integrate
external forces. If your avatar is walking forward and suddenly slammed by a 10-ton rolling
boulder moving left, he won't budge unless you take extra measures to notice that the velocity
you sent down last frame has been modified somewhat. Resolving this correctly is somewhat
beyond our scope here, but it involves keeping track of the intended velocity and combining it
intelligently with the actual velocity, rather than just setting it.
We've just touched on a few of the issues regarding player control in a physical environment.
While they can be extremely challenging, solving these problems creatively will open up a lot of
new possibilities.
Focus on Creativity
Now that we've been freed of the burden of writing yet another BSP-versus-bouncing spheres
physics engine, we find that integrating a full-featured commercial engine can be just as much
work. The critical difference between the two approaches is huge, though: a robust
implementation of fully generalized physics is capable of forms of gameplay we haven't even
dreamed of yet.
I think that physics engines are going to do for gameplay what rendering engines have done for
visuals: provide a rich base of stable features, freeing implementers to focus on creative new
functionality rather than being chained to an endless wheel of reinvention. We've already seen
our play-testers using the laws of physics to invent new gameplay for which we hadn't even
planned. Managed carefully, this combination of planning and discovery holds great promise for
the future of games and gameplay.
Havok
http://www.havok.com/
Mathengine's Karma
http://www.mathengine.com/
URL: http://www.gamasutra.com/features/20020801/beaudoin_01.htm
Like many other techniques, a quality gain can be achieved by computing the power function per
pixel
rather than per vertex. This gain is very noticeable when using large exponents since the
function varies a lot and sampling it at each vertex is bound to miss visually important details
(see Figure 1).
Therefore, we are particularly interested in finding a way to compute such a function on the
pixel shader. Like any pixel shader trick, it is important to minimize the number of textures and
blending stages since these are very limited resources. This text presents a simple shader trick
that performs a good per pixel approximation of a non-integer power function. The technique
works for input values between 0 and 1 and supports large exponents. The presented shader
does not require any texture look-up and is scalable, making it possible to spend more
instructions in order to decrease the error or to reach greater exponents.
We first consider and analyze two typical techniques used to compute a power function on the
pixel shader. We then expose some mathematical background used throughout the text. Finally,
we show how the algorithm can be used to perform smooth conditional functions and complex
bump-mapped Phong shading. The actual implementation of the approximation as a pixel
shader program is discussed in detail.
Figure 1. Gouraud shading (left) and Phong shading
(right)
Traditional Techniques
When confronted with the problem of computing a power function on the pixel shader, two
simple techniques come to mind. First, it seems possible to proceed through a 1D texture look-
up, and second, applying successive multiplications looks promising.
Texture Look-Up
Linearly interpolated textures can be thought of as piecewise linear functions. In particular, 1D
textures with a linear filter are really a function taking a value between 0 and 1 and mapping it
onto another value in the same range. This looks promising for our problem since an input
between 0 and 1 raised to any power greater than 0 yields a result between 0 and 1.
Listing 1 shows a piece of code that builds a 16-bit monochrome 1D texture of resolution Res
to compute x^n:
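The listing itself is not reproduced here; as a stand-in, the fill loop could look like the
following, where BuildPowerTexture is a hypothetical helper and the Direct3D texture-creation
boilerplate is omitted:

#include <cmath>
#include <cstdint>
#include <vector>

// Texel i of a Res-texel 1D texture receives (i/(Res-1))^n, quantized to
// 16 bits, so a linearly filtered look-up approximates x^n piecewise.
std::vector<std::uint16_t> BuildPowerTexture( int Res, double n )
{
    std::vector<std::uint16_t> texels( Res );
    for( int i = 0; i < Res; ++i )
    {
        double x = double( i ) / double( Res - 1 );   // input in [0,1]
        texels[i] = std::uint16_t( std::pow( x, n ) * 65535.0 + 0.5 );
    }
    return texels;
}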
Once this texture has been constructed, a simple texture look-up pixel shader can be used to
perform the computation, provided that the value to raise to power n is placed in an
interpolated texture coordinate. The pixel shader in Listing 2 shows how to apply the power
function to the result of a dot product, as is often required. Note that this code only works
for pixel shader versions 1.2 and 1.3. The code for version 1.4 is presented in Listing 3.
ps.1.2
dp3 r0, r0, r1 ; Performs dot product between this 3D vector and r0
phase
The texture look-up technique has a number of drawbacks:
- It uses up one texture stage, which may make it unfit for algorithms requiring many
textures.
- Changing the value of the exponent n requires regenerating the texture, unless a 2D
texture is used, in which case a limited number of predefined exponents can be used.
- For pixel shader versions less than 1.4, a 1D texture look-up cannot be applied to
intermediate computation results unless multiple passes are used.
This last limitation is often a major drawback since, in usual cases, the power function must be
preceded by a vector renormalization (such as done with a cube map) and a dot product. With
large exponents, the vector renormalization is especially important. This is due to the fact that
the maximum value of a dot product is the product of the length of the two vectors. If one of
these is not normalized, the dot product can never reach 1. When raised to a large exponent, a
value smaller than 1 will rapidly move toward 0. This translates to visual details being washed
out. Figure 2 shows a vector interpolated with and without normalization, followed by a dot
product, and then raised to a high power. It is obvious that the detail (for example a specular
spot) is washed out in the second version.
Figure 2. Result of not normalizing vectors before applying a power
function.
Successive Multiplications
Since raising a value to an integer power simply requires multiplying a value with itself a
number of times, it seems possible to approximate a non-integer power function through
successive multiplication steps.
For example, the pixel shader in Listing 4 shows how to raise t0 to power 16. Analyzing this
scheme indicates that log2 n multiplications are required to raise a variable to the power n,
when n is a power of 2:
ps.1.0
tex t0
Listing 5 shows a pixel shader that raises t0 to power 31. Analyzing this shows that, in general,
for n in the range [2^a, 2^(a+1)), the algorithm can require 2a multiplications and a temporary
variables.
ps.1.1
tex t0
The successive-multiplication technique also has drawbacks:
- It only supports discrete changes in the exponent, making it impossible to change the
value of n in a continuous fashion.
- It requires a lot of instructions for a large exponent.
- It requires a lot of instructions and temporary variables for non-power-of-2 exponents.
These last two problems often limit the usefulness of successive multiplications, since practical
exponents have a tendency to be large and are usually not powers of 2.
Mathematical Details
If we take x^n in the range from 0 to 1 then, for a large enough n, the function is very close to 0
on most of its domain and abruptly goes to 1 when x approaches 1. This is shown in Figure 3
for increasing values of n. This particularity will be the basic concept used for developing our
approximation.
What is interesting to note is that the result x^n is greater than 1/256 only for values of x
greater than 256^(-1/n). For example, if we take n = 16, the function will be greater than 1/256
only if x is over 0.707; with n = 64 this value becomes 0.917. Since 1/256 is the smallest value
displayable on 8-bits-per-channel hardware, approximating x^n with 0 will yield no perceptual
error for values of x between 0 and 256^(-1/n).
Now, if we look at x^n for input values greater than 256^(-1/n), we can see that it looks pretty
much like a scaled and offset power function of a lower degree. Figure 4 shows the function
x^16 being approximated by x^4, scaled and offset horizontally to reach 0 when x = 256^(-1/16).
Therefore, using the null function from 0 to 256^(-1/n) and correctly scaling a lower-degree
power function for the rest of the range seems to constitute an adequate approximation. Say
our approximating function uses an exponent m; a scaled and offset version of the function can
be written as (Ax + B)^m, where A is the scale and B is the offset. Now, naturally, we want this
approximating function to be equal to 1 when x is 1. Also, we want it to reach 0 when
x = 256^(-1/n). We can therefore solve for A and B. We find that A = 1 / (1 - 256^(-1/n)) and
B = 1 - A. The approximating function can be written as:

(Ax + B)^m when x > 256^(-1/n), and 0 otherwise

By noting that x < 256^(-1/n) if and only if Ax + B < 0, the above function can be rewritten
more concisely as:

max(Ax + B, 0)^m

Note that we will always consider m ≤ n in the rest of the text. This is because we want the
approximating function to have a lower degree than the original.
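Spelling out the solve for A and B from the two conditions above:

\[ (A \cdot 1 + B)^m = 1 \Rightarrow A + B = 1 \]
\[ A \cdot 256^{-1/n} + B = 0 \Rightarrow A\left(1 - 256^{-1/n}\right) = 1 \]
\[ A = \frac{1}{1 - 256^{-1/n}}, \qquad B = 1 - A \]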
This technique can now be used with real examples. Figure 5 shows the result of approximating
x^n with max(Ax + B, 0)^m for n = 16 and m = 4. In this case, A and B are computed as
described earlier and their values are A = 3.4142 and B = -2.4142. The graph displays both the
original and approximated curves. The normalized error function is also plotted in order to show
how the error is distributed to the left and right of the point where both curves cross.
A first analysis shows that the approximation gives good results for these values of A, B, n and
m. However, if we look more closely at the graph we can notice that the error is not distributed
equally on each side of the crossing point. This leads us to think that the maximal
approximation error could be lowered by adjusting A and B in order to move the crossing point.
In fact, for arbitrary values of n and m, the technique we've described to select A and B doesn't
give any guarantee on the maximal approximation error. In practice, however, it may be suited
to many applications.
In order to optimize the approximation error, one should seek values of A and B for which the
maximal error on the left and right side are equal. We solved this problem using the numerical
approach presented in Listing 6 and described next.
First we need a function that, given approximate values for A and B, is able to compute the
maximal error on the left and right side of the crossing point. To do so, we simply find the zero
of the derivative of the error function on the left and right sides and evaluate the error function
at these points. This is accomplished by the function EvaluateMaxError.
Then, we perform a binary search, changing our guess values for A and B in order to move the
crossing point. When the left error is greater than the right error, the crossing point is moved to
the left, otherwise it is moved to the right. This is accomplished by the function FindAB.
This algorithm guarantees that we will pick values of A and B that minimize the maximal
approximation error for any n and m, provided that 1 ≤ m ≤ n.
Listing 6. C code to determine optimal scale and offset for some approximation
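Since the listing is not reproduced here, the following is a compact, sampling-based sketch of
the same procedure. It exploits the fact that both curves meet at x = 1 (so A + B = 1) and at
the crossing point c (so Ac + B = c^(n/m)), then binary-searches c until the peak errors on the
two sides balance; the article's version finds the peaks via the derivative of the error function
instead of by sampling:

#include <cmath>

void FindAB( double n, double m, double& A, double& B )
{
    double lo = 0.0, hi = 1.0;
    for( int iter = 0; iter < 50; ++iter )
    {
        double c = 0.5 * ( lo + hi );   // current guess for the crossing point

        // A and B such that the curves cross at c and agree at x = 1.
        A = ( 1.0 - std::pow( c, n / m ) ) / ( 1.0 - c );
        B = 1.0 - A;

        // Peak absolute error on each side of c, found by dense sampling.
        double errLeft = 0.0, errRight = 0.0;
        for( double x = 0.0; x <= 1.0; x += 0.0001 )
        {
            double approx = std::pow( std::fmax( A * x + B, 0.0 ), m );
            double e = std::fabs( approx - std::pow( x, n ) );
            if( x < c ) { if( e > errLeft )  errLeft  = e; }
            else        { if( e > errRight ) errRight = e; }
        }

        if( errLeft > errRight ) hi = c;   // move the crossing point left
        else                     lo = c;   // move the crossing point right
    }
}

For n = 16 and m = 4 this lands near the A = 3.525, B = -2.525 quoted in the next paragraph.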
Using this algorithm with the previous example, where n = 16 and m = 4, yields A = 3.525 and
B = -2.525. This is illustrated in Figure 6. It can be seen that the error is equally distributed to
the left and right of the crossing point, indicating that the maximal error has been minimized.
It should be noted that, by selecting A and B through the algorithm of Listing 6, we lose the
property that max(Ax + B, 0)^m is equal to 0 only for values of x^n < 1/256. However, this
doesn't hurt our approximation since the maximal error has been lowered.
Table 1 shows the maximal approximation error for typical values of n and m. It also shows the
optimal values of A and B for these values.
Table 1. Scale, offset and maximal error
for typical approximations.
Approximating a power function on the pixel shader requires us to translate the preceding
mathematical reasoning into the pixel shader assembly language. Doing so requires us to
compute the function max(Ax + B, 0)^m through the set of available microcode instructions. We
also need a way to specify the variables present in this equation, namely A, B, m and x.
We can rule out a number of these variables easily: the input variable x will simply be stored in
a general-purpose register and the exponent m will be decided in advance. For variables A and
B we will consider two scenarios. At first, they will be fixed ahead of time and their content will
be stored in constant registers. In the second scenario, we will show how A and B can be
modified dynamically on a per pixel basis.
Constant Exponent
Let's first study the case where A and B do not change per pixel. In this scenario, A is placed in
the constant register c0 while B is placed in c1. This means that the exponent n being
approximated is constant as long as c0 and c1 remain unchanged.
Now we need to compute max(Ax + B, 0)^m. First, the max(…, 0) function is taken care of using
the _sat modifier available on the pixel shader. Then, we pick m as a power of 2 selected to
approximate the target exponent n with enough precision. We then perform a scale and offset
with mad, followed by log2 m self-multiplications with mul. The result is the pixel shader of
Listing 7, where the power function is applied to each element of the vector r0. It should be
noted that log2 m + 1 is equal to the number of pixel shader stages required. Therefore, in an
actual shader, the number of free instructions could limit m.
ps.1.0
mad_sat r0, r0, c0, c1 ; r0 = max( Ax + B, 0 ), with A in c0 and B in c1
mul r0, r0, r0 ; r0 = max( Ax + B, 0 )^2
.
. ; repeat the mul instruction (log2 m) times in total
.
mul r0, r0, r0 ; r0 = max( Ax + B, 0 )^m
There is an important problem with the previous pixel shader. In fact, since all constant
registers must be between -1 and 1, we have a very limited range of values for A and B. Table
1 shows that, for typical values of n and m, A is always greater than 1. Therefore, in practice,
the proposed pixel shader is invalid.
To limit ourselves to scale and offset values in the allowed range, we first rewrite the
approximation function as follows:

max(Ax + B, 0)^m = k max(A'x + B', 0)^m, where A = A'k^(1/m) and B = B'k^(1/m)

From these last two relations we can see that, with A' and B' between -1 and 1, we can obtain
values of A and B between -k^(1/m) and k^(1/m). Given that k is greater than 1, this translates
to an increased range for A and B.
The pixel shader lets us compute k max(A'x + B', 0)^m with k greater than 1 through its multiply
instruction modifiers _x2 and _x4. If we take the program of Listing 7 and apply such modifiers
to some or all of the mad and mul instructions, we will get a k greater than 1.
We go in the opposite direction and find the maximal values for A and B given k and m, as
performed by the function MaxAB. This result can then be converted into a maximal value for n,
as computed by MaxN.
// Table containing multiply instruction modifier for each instruction (1, 2 or 4)
int Multiplier[] = { 4, 4, 4, 4, 4, 4, 4, 4 };
// Compute values of A' and B' given A and B and a multiplier table
// LogM: log of m in base 2 (number of instructions - 1)
void ComputeApBp( int LogM, double A, double B, double &APrime, double &BPrime )
{
double Temp = 1.0/MaxAB( LogM ); // Note that k^(-1/m) = 1/MaxAB
APrime = A * Temp;
BPrime = B * Temp;
}
// Compute maximum absolute values for A and B given some m and a multiplier table
// LogM: log of m in base 2 (number of instructions - 1)
double MaxAB( int LogM )
{
double m = pow( 2.0, LogM ); // Find the value of m
double K = ComputeK( LogM ); // Compute K
return pow( K, 1.0/m );
}
// Compute the maximal exponent n attainable for a given m and multiplier table.
// (The source listing is truncated here; this is a plausible reconstruction:
// binary-search for the largest n whose optimal scale A still fits in range.)
double MaxN( int LogM )
{
double MaxScale = MaxAB( LogM );
double n0 = 1.0;
double n1 = 10000.0;
double n;
double A;
double B;
do
{
n = (n0 + n1)/2.0;
FindAB( n, pow( 2.0, LogM ), A, B );
if( fabs( A ) > MaxScale )
n1 = n; // A out of range: n too large
else
n0 = n;
} while( n1 - n0 > 0.01 );
return n;
}
Listing 8. C code for computing corrected scale and offset A' and B'
It can be seen that the maximum value of n is obtained when the modifier _x4 is used for each
instruction in the code. Given that the values of A' and B' are stored in constant registers c0
and c1 respectively, the pixel shader in Listing 9 performs this approximation.
ps.1.0
Table 2 shows the maximal range for A and B and the maximal exponents n that can be
obtained with the previous shader for various values of m.
Naturally, if for a given m we want to limit ourselves to exponents smaller than the maximal n
listed in this table, we can remove some _x4 modifiers or replace them by _x2. When doing
this, we figure the new maximal n by using the MaxN function with an updated Multiplier
array.
Not using _x4 modifiers at each instruction is often a good idea since it can help reduce the
numerical imprecision often present in pixel shaders. This lack of precision is mostly noticeable
for small values of n since they translate to values of c0 and c1 close to zero. Such small
numbers may suffer from an internal fixed-point representation and yield visual artifacts.
Once we have picked these values, we simply take a texture image of the desired n and
translate each texel to its corresponding value of A. The texels are then translated to A'
through a multiplication by k^(-1/m). This time, however, instead of storing the result in a
constant register, we update the texture. The program in Listing 10 executes the process.
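Listing 10 is likewise not reproduced here; a sketch of the per-texel conversion might look as
follows, where DecodeExponent is a hypothetical texel-to-n mapping and FindAB and MaxAB are
the functions from the earlier listings:

#include <cmath>
#include <cstdint>
#include <vector>

double DecodeExponent( std::uint8_t texel );                 // e.g. n = texel / 2.0
void   FindAB( double n, double m, double& A, double& B );   // Listing 6
double MaxAB( int LogM );                                    // Listing 8

// Rewrite an 8-bit exponent channel so each texel holds A' instead of n.
void TranslateTexture( std::vector<std::uint8_t>& texels, int LogM )
{
    double m     = std::pow( 2.0, LogM );
    double kInvM = 1.0 / MaxAB( LogM );              // k^(-1/m)
    for( std::uint8_t& t : texels )
    {
        double A, B;
        FindAB( DecodeExponent( t ), m, A, B );      // optimal scale for this n
        t = std::uint8_t( A * kInvM * 255.0 + 0.5 ); // store A' in the texel
    }
}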
Once such a texture has been generated and placed in texture stage 0, we can easily extract A'
inside a pixel shader. We can also extract B' by recalling that B = 1 - A. Since B' is the result of
multiplying B by k^(-1/m), we can write B' = (1 - A)k^(-1/m) = k^(-1/m) - A'. Since m is fixed,
we store k^(-1/m) in constant register c0 and perform a simple subtraction to extract B'.
The pixel shader of Listing 11 approximates a power function with n varying per pixel. This
shader uses modifiers _x4 for each instruction; the texture and the constant register c0 should
therefore be generated accordingly.
ps.1.0
This shader seems to show that one extra instruction is required to handle an exponent n
varying per pixel. However, this is only true if we cannot spare additional texture space. In the
case where a texture or a texture component is still available, we could precompute the value
of B' and store it there. However, we believe that textures are often a more limited resource
than pixel shader instructions, which is why we suggest using the approach presented above.
As a last remark, we can note that the power function often only needs to be computed on a
scalar. Therefore, provided that the input value x is placed in the alpha channel, the pixel
shader can co-issue instructions. In such a case, we can also limit our usage of constant
registers by using only the alpha channel of the constants. This technique is applied in the pixel
shaders presented in the rest of the text.
Applications
The nature of the power function makes it very interesting for a number of applications in
computer graphics. Any effect that should be visible at some point and fade-out more or less
rapidly for neighboring points can benefit from such a function. In this section we will present
two techniques that require a per pixel power function: smooth conditionals and specular Phong
shading.
The cmp instruction loads dest with src1 or src2 depending on whether src0 is positive or
not. If we consider this instruction with values of src1 = 1 and src2 = 0, we obtain the step
function illustrated in Figure 7.
To overcome the abruptness of this transition, we can use the function illustrated in Figure 8.
We call such a function a smooth conditional because it smoothly and rapidly goes from one
value to the other as some threshold is crossed.
This formula can be computed on the pixel shader using the code of Listing 12. This code is
similar to that of Listing 9. The difference is that it uses two extra cmp instructions, one for
the absolute value and one for the condition x > 0. Also, note that the last mul has a _x2
multiply instruction modifier although it should be counted as a 4 in the Multiplier table
when generating A' and B'. This is done in order to account for the multiplicative factor in the
term 0.5(1 - |x|)^n appearing in the formula.
ps.1.2
We can use the above code to build a smooth conditional selection between two arbitrary values
src1 and src2. To do so, we simply add a lrp instruction at the end of the previous pixel
shader. This linear interpolation is made to map 0 to src1 and 1 to src2. The result is a
function that mimics the standard cmp in a smooth fashion.
The presented code requires a lot of instructions; however, we can co-issue them if we want the
smooth conditional to act only on the alpha channel. Moreover, the simplified asymmetric shape
of Figure 9 often constitutes a good enough smooth conditional for x > 0. Given that the input x
is saturated (greater than or equal to 0), this function can be expressed as 1 - (1 - x)^n.
This is computed with minimal change to the code of Listing 9. We simply need to use the invert
source modifier at the input and output.
Other variations on the smooth conditional pixel shader code allow for various other arithmetic
tests. In fact, the simple power function x^n can be considered as the smooth conditional for
x > 1.
So let's see how we can build a pixel shader that equals 1 for pixels within some radius r of a
point P and smoothly goes to 0 for pixels outside this radius. First, for each vertex, we'll require
a vector R that joins point P to the vertex. This vector needs to be scaled so that its length
equals 1 for a vertex that is exactly at distance r from P. We place the vector R in texture
coordinate set 0. We will use this vector directly, so no texture needs to be bound to stage 0.
We then select the approximation exponent m for the power function used in the smooth
conditional. In the presented example, we take m = 4. The multiply instruction modifier _x4
was used with each instruction. Since we use a fixed exponent n for the shader, we place A' and
B' in constant registers c0 and c1.
The pixel shader of Listing 13 uses this technique to apply a simple diffuse lighting equation to
all pixels within some radius of a given point. The diffuse color is placed in interpolated color v0,
texture 1 contains the decal texture to apply, and texture 2 holds an extra diffuse light map. A
constant ambient factor is stored in constant register c2. More complex effects could use the
same pixel shader trick in order to limit themselves to a volume. Also, more complex volumes
could be devised by applying some per vertex or per pixel process on P.
ps.1.0
; c0 ; A'
; c1 ; B'
; c2 ; Ambient factor
; c7 = 1, 1, 1, 1 ; Uniform white, used in shader
; Pixel ops
dp3_sat r0, t0, t0 ; r0 = saturate( R.R ) = |R|^2 clamped to [0,1]
Phong Shading
One of the major problems of per pixel lighting resides in computing the specular component of
the final color. This component is due to the reflection of the light itself on a shiny material. It is
easy to see that the reflection of a point light on a perfectly shiny sphere is a single point.
Unfortunately, very few surfaces are perfectly shiny. To account for non-perfect reflections,
sometimes called dull reflection, Phong (and Warnock before him) introduced a very popular
model. This model relies on the use of a power function where the exponent depends on the
shininess of the material.
Pixel shader algorithms to perform Phong shading have been described before. However, they
performed the per pixel power function using the usual techniques described at the beginning,
and therefore suffered from the problems of such techniques. This often leads to multi-pass
algorithms or reduced control over the shininess exponent.
We will now show how, using the power function algorithm exposed earlier, we can perform
realistic per pixel Phong shading including a normal map, a color diffuse texture map, a color
specular texture map, and a shininess map.
Before we can express the shading equation, we need to detail the variables involved. First, the
shading depends upon the light direction, described by the unit vector L. We also need to know
the surface normal at the shading point, noted N. Finally, we need some vector with which to
compute the specular component. As is often the case for per pixel shading, we do not
directly use the Phong equation. Instead we take the Blinn half-vector H, that is, the unit
vector falling halfway between the view direction V and the light direction L. Figure 10
illustrates these vectors.
When using normal mapping, we distinguish between the perturbed normal used for lighting,
noted N' and the real surface normal noted N.
The scalar values that are needed for the computation are the ambient, diffuse and specular
coefficients noted respectively MAmbc, MDiffc, and MSpecc for the material, and LAmbc, LDiffc,
and LSpecc for the light. Here the index c is R, G or B to indicate one of the color components.
First, we suppose that the light ambient, diffuse and specular coefficients do not vary from one
pixel to the next. This means that the algorithm proposed cannot handle slide-projectors or
other kinds of textured lights.
We also suppose that the material ambient coefficients do not vary arbitrarily from pixel to
pixel. Instead, these are linked to the material diffuse coefficients using the equation MAmbc =
KAmbc *MDiffc, with KAmbc constant within the shader. The equation states that the ambient
color is always the same as the diffuse color up to some constant. This is not a very limiting
assumption since this relationship between material ambient and diffuse coefficients is often
witnessed in practice.
The three coefficients MDiffc and exponent n can vary from pixel to pixel and therefore need to
be expressed in a four-component, 2D texture map. Recall that we do not store n directly,
instead we place the corresponding value A' in the texture's fourth component. Coefficients
MSpecc can also vary at each pixel and can be placed in a separate three-component, 2D
texture. These coefficients act as a color gloss map, effectively an extension of the traditional
single-component specular gloss map.
The perturbed normal N' is expressed as a tangent-space normal map. We therefore use a 2D
texture map containing color-encoded normalized vectors that can be accessed using a single
2D texture look-up. We refer the reader to other real-time shading texts to learn more on
tangent space normal maps.
To effectively use a tangent-space normal map in our lighting equation, we need to have a
normal map representation of the light vector L and halfway vector H. As discussed earlier, the
halfway vector needs to be renormalized per pixel; otherwise important visual artifacts will
occur when the power function is applied. We therefore interpolate H through a texture
coordinate and use a cube map look-up to renormalize it. A renormalization cube map, as
illustrated in Figure 11, contains a color-encoded vector in each texel corresponding to the
unitary vector pointing in the direction of that texel.
Since the light vector is only used in linear computations, not renormalizing it has almost no
impact on the visual result. Therefore, we skip per pixel renormalization in order to make the
best use of our texture resources. This means that we can store the light vector L in an
interpolated color.
It should be noted that, since interpolated colors do not support negative values, we place the
sign corrected light vector in color 0. This can be written as L' = 0.5(L + (1,1,1) ).
We need to have access to KAmbc and LAmbc that are constant from one pixel to the next. Due
to the nature of the equation, we can precompute the product of these coefficients. Therefore,
we store the resulting vector (KAmbR*LAmbR, KAmbG*LAmbG, KAmbB*LAmbB) in the first three
components of constant register c0.
We also need the values of constant coefficients LDiffc and LSpecc. These are stored in the first
three components of registers c1 and c2, respectively.
Finally, we need to pick the approximation exponent m used for the power function. In our
example, we use m = 8. We also use _x4 as the multiply instruction modifier for each
instruction of the power function approximation. Since we wish to use a value of n varying per
pixel, we must precompute k^(-1/m) = 0.074325. We store this value in the alpha channel of
constant register c0. Looking at Table 2, we find that the maximal n that can be achieved is
116.34. Also, using the function TranslateTexture of Listing 10 we can convert per texel
values of n into values of A' to be placed in the alpha component of texture 0.
The pixel shader that computes per pixel Phong shading then becomes:
ps.1.1
; v1 RGB: light vector (L') (sign corrected)
; c0 RGB: ambient light (Kamb*Lamb), A: 1/MaxAB(logM) = 0.074325
; c1 RGB: diff light (Ldiff), A: 0
; c2 RGB: spec light (Lspec), A: 0
Summary
We have presented a method of approximating a non-integer power function on a pixel shader
by using as few texture stages as possible and gracefully degrading in accuracy depending on
the desired exponent and available number of blend stages. Furthermore, the technique can be
used for single or multiple channels, thus adapting nicely to individual shader requirements.
The method has several advantages over traditional exponentiation techniques that use either a
texture look-up or a series of sequential multiplications. Texture look-ups on intermediate
results are only possible in pixel shader versions 1.4 and greater, and even then require two
phases. Sequential
multiplications need a large number of stages to compute high power-of-two exponents and
even more for non-power-of-two. Additionally, since the multiplications are inherently uniform
during the entire shader, they do not allow for a smooth variation in power.
A couple of applications were suggested, and Phong shading in particular is covered in the
results below. We believe such a useful technique can be applied to many other algorithms that
require a power function, especially considering that it can be abstracted to any effect requiring
a sharp yet smooth transition, such as a spotlight's cone falloff.
The per pixel variation of the exponent, which is a handy extension to the basic principle, can
provide important visual cues for surfaces whose specularity varies, such as for a material
including both metallic and organic features. Its main disadvantages are that m constrains the
lower bound of the specular exponent n, as explained in the mathematical details section, and
that one component of a texture must be used to encode the exponent. The latter, however, is
expected of any technique that varies the exponent per pixel.
The shading code of Listing 14 was applied to red spheres with different properties, such as
faceted per pixel exponent maps, wrinkled normal maps, and orange specular maps. The
results can be seen in Figure 12. Due to the many iterative multiplications, there is a large
accumulation of error that manifests itself as banding. Generally speaking, the greater the
exponent, the more evident the banding; however, it is mostly noticeable on smooth surfaces,
such as those expressed with uniform normal maps. The banding artifacts are less significant
when using normal maps that contain some perturbation, because the specular falloff is quite
sharp due to the abrupt changes in the normal. Visual artifacts are therefore reduced as detail
in the normal map increases.
Banding can also result from reduced instruction counts. If additional instructions are available,
we recommend using them to maximize precision. Note that since the Phong shading algorithm
presented only requires exponentiation of a single component, the instructions easily fit into the
scalar pipeline of the pixel shader, which reduces the number of dedicated stages. The shader
code of Listing 14 consumes two stages purely for exponentiation purposes, but instructions in
the vector pipeline can be co-issued at these stages if desired.
For example, the specular texture can be removed if the diffuse and specular materials are
represented by a single texture. We can then use a cube map at that stage to represent the
surrounding environment, even encoding a Fresnel term in the remaining color register. The
math computations can easily be accommodated within the two remaining vector instructions.
Finally, the images in Figure 12 were rendered on hardware with 8 bits of fractional precision.
Other hardware is available which has more than 8 bits of fractional precision and will suffer
less from the banding artifacts.
We hope you found this trick helpful and that you will find many more uses for an
approximating power function in your shading endeavors.
Figure 12. Per pixel Phong shading on spheres.
Clockwise from upper left, uniform normal and
exponent maps, uniform normal map and faceted
exponent map, noisy normal and exponent maps
with uniform orange specular material, and noisy
normal and exponent maps with uniform white
specular material.
By Michal Bacik
Gamasutra
July 17, 2002
URL: http://www.gamasutra.com/features/20020717/bacik_01.htm
Rendering engines used in today's game titles utilize various techniques for hidden surface
removal (HSR), with different techniques suitable for different game genres. For example,
action games played in closed-in locations such as rooms, caves, and tunnels need an engine
allowing fast rendering of a few hundred polygons, a high frame rate, and a high level of
detail to impress players. Conversely, a game that takes place outdoors requires quite a
different approach. Let's discuss appropriate approaches for the latter. Not too long ago, games
ran in software mode only, without the help of 3D accelerators. With the CPU doing all the
work, engines rendered as few pixels as possible, typically with BSP-based scenes and BSP
rendering.
With the advent of 3D accelerators and the invention of the depth buffer (or Z-buffer), the strict
sorting of polygons slowly faded out of engines, and software engineers started trying out
different techniques. Game designers wanted to move their worlds outdoors, and new graphics
hardware made such designs possible. As graphics hardware power increases, however, so do
requirements for game content and high-quality graphics, creating room to waste processing
power through inefficient use of computing resources. Let's discuss one technique for hidden
surface removal usable in 3D engines, developed while creating some of the games I've worked
on. The technique, utilizing object occlusion, is for outdoor rendering. The fundamental entities
working for us will be so-called occluders, and we'll come to them soon. First, some rendering
theory. Figure 1 shows a rendering pipeline using the technique presented here. This is a
higher-level look at the rendering process, without going into details about actual polygon
rendering.
One of the first steps toward optimized rendering is keeping objects in a scene organized. Most
commercial modeling packages and 3D engines use a scene hierarchy, and for good reason:
used smartly, it allows for fast rejection of entire hierarchy branches based on bounding
volume tests, and may also accelerate collision testing. A scene hierarchy tree is a collection of
objects (visuals as well as nonvisuals) organized in a tree structure: a scene has its root
object, which may have child objects linked to it, and these child objects may have other
objects linked to them in turn. The objects may be visuals (characters, terrain, items in the
game) as well as nonvisual objects (such as 3D-positioned lights, sounds, dummies, and other
helper objects). A minimal sketch of such a node appears below.
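Here is a hedged sketch of such a node and the branch-rejection traversal. All type and
function names are illustrative, and the bounding volume type is the one defined in the next
section (forward-declared here).

#include <vector>

struct S_bounding_volume;   // defined in the next section

struct S_scene_node{
   S_scene_node *parent;                     // NULL for the scene root
   std::vector<S_scene_node*> children;      // linked child objects
   S_bounding_volume *volume;                // encloses the node and its children
   bool is_visual;                           // visuals vs. lights/sounds/dummies
};

// Fast rejection of entire hierarchy branches: if a node's volume fails the
// visibility test, none of its children need to be visited at all.
void CollectVisible(S_scene_node *node,
                    bool (*is_visible)(const S_bounding_volume*),
                    std::vector<S_scene_node*> &out)
{
   if(!is_visible(node->volume))
      return;                                // cull the whole branch
   if(node->is_visual)
      out.push_back(node);
   for(unsigned i = 0; i < node->children.size(); i++)
      CollectVisible(node->children[i], is_visible, out);
}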
Bounding Volumes
A bounding volume is a simple geometrical object roughly representing the volume of a real
object’s geometry. It’s as small as possible while still enclosing all vertices of the object. The
most suitable geometric objects for bounding volumes are spheres and boxes. For the
techniques presented in this article, I recommend using both types and defining a structure
representing the bounding volume:
struct S_bounding_volume{
   struct{
      struct{
         float x,y,z;
      }min,max;         // the two extreme corner points of the box
   }box;                // axis-aligned box in local coordinates
   struct{
      struct{
         float x,y,z;
      }pos;             // center of the sphere
      float radius;
   }sphere;             // bounding sphere in local coordinates
};
You’re probably already using an HSR technique utilizing bounding boxes or spheres, so why
use a combination of both? The answer is simple: speed. While bounding spheres allow for very
fast collision detection using a simple distance test, the volume it encloses is often much
greater than the actual object it represents (Figure 2). Too often, then, we consider an object to
be on-screen when none of its vertices actually would be.
Figure 2: Visual object with bounding box and sphere. Notice that
the sphere is a bit larger than the box.
On the other hand, a bounding box is a closer match to the shape of an object, but tests with
boxes are slower. Usually we do a two-pass collision/intersection test, using the bounding sphere
in a first pass and the oriented bounding box (OBB) in a second pass, as in the sketch below.
Because the first test rejects most invisible or non-clipped objects, the chances are high that the
second test will never be executed. At the same time, performing the bounding box test yields
more accurate results, leading to fewer objects being rendered.
This mixed bounding volume is also suitable for other purposes, such as collision detection or
physics.
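A hedged sketch of the two-pass test described above; the frustum type and the two
intersection helpers are illustrative assumptions, not code from the article.

struct S_view_frustum;   // hypothetical frustum type, defined elsewhere
bool SphereIntersectsFrustum(const S_bounding_volume &bv, const S_view_frustum &f);
bool BoxIntersectsFrustum(const S_bounding_volume &bv, const S_view_frustum &f);

bool IsObjectVisible(const S_bounding_volume &bv, const S_view_frustum &frustum)
{
   // Pass 1: the cheap sphere test rejects most objects outright.
   if(!SphereIntersectsFrustum(bv, frustum))
      return false;
   // Pass 2: the tighter but slower OBB test runs only for the survivors.
   return BoxIntersectsFrustum(bv, frustum);
}

The sphere test is a few compares per plane, while the OBB test needs eight transformed corner
points, which is why ordering them this way pays off.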
Node Volumes
Every visual object in a scene hierarchy should have an asso-ciated bounding volume, so that
with just a few math oper-ations, we’re able to say if each object is visible or not. As I
mentioned previously, it wouldn’t be very efficient to test all the scene objects in each rendered
frame. For example, in a racing game with 20 cars, you’d end up testing all the vehicle’s objects
(wheels, doors, driver, driver’s fingers, and on and on) 20 times, which could easily make for
upwards of 25 objects on each car multiplied by 20 cars. Why perform 25 tests when a single
test for each car can accomplish the same result?
With complex models or models used in a scene multiple times, finding a way to skip the
processing of an entire model or models if they’re out of view would be highly desirable. For this
reason, we’ll introduce a node bounding volume. This is the same bounding structure defined
above, but it doesn’t enclose vertices; rather it encloses all bounding volumes of a group of
objects, the child objects of the node (objects, models, and so on).
Assuming you have static geometry, calculating a visual object's extent is done once before
rendering starts. The bounding box is axis-aligned (AABB) in the local coordinates of an
object, defined by the two extreme corner points of the box. Any time a bounding box is used in
computations, it must be transformed into world coordinates using the object's transformation
matrix, which turns it into an oriented bounding box (OBB). Because an OBB cannot be specified
by just two corner points, we expand the two corner points of the AABB into the eight corner
points of the OBB, and transform all these points to world coordinates with the object's
transformation matrix (the same one that will be used to transform the object's vertices to
world coordinates); see the sketch below. The bounding sphere is also in local coordinates, and
its position must be transformed and its radius scaled to world coordinates before being used
in computations.
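A sketch of the AABB-to-OBB expansion; the matrix type and TransformPoint are hypothetical
stand-ins for your engine's matrix-times-point routine, and points are plain float[3] here.

struct S_matrix;   // your 4x4 transformation matrix type
void TransformPoint(const S_matrix &m, const float in[3], float out[3]);

void ExpandAABBToWorldOBB(const float mn[3], const float mx[3],
                          const S_matrix &world, float out[8][3])
{
   for(int i = 0; i < 8; i++){
      float corner[3];
      corner[0] = (i & 1) ? mx[0] : mn[0];   // pick min or max on each axis
      corner[1] = (i & 2) ? mx[1] : mn[1];
      corner[2] = (i & 4) ? mx[2] : mn[2];
      TransformPoint(world, corner, out[i]); // into world coordinates
   }
}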
The situation with node bounding volumes is a bit more difficult. Because the positions of
objects may change in real time, this bounding volume must be recomputed at run time
whenever any object in the group moves. The best method is the lazy evaluation programming
technique; in other words, computing the value only when it's needed. You may implement a
system that invalidates a node's bounding volume when a child's matrix changes due to
position, rotation, or scale, as sketched below. Such a system is harder to implement and
debug, but it's critical for fast 3D culling, for both rendering and collision testing.
By measurements I’ve made in our 3D system, the dynamic bounding volume update takes no
more than 1 to 2 percent of total CPU time when the game is running.
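A minimal sketch of such a lazy invalidation scheme with a dirty flag; all names are
illustrative, and the child-merging routine is assumed to exist elsewhere.

struct S_hierarchy_node{
   S_hierarchy_node *parent;
   S_bounding_volume volume;      // valid only when 'dirty' is false
   bool dirty;

   void Invalidate(){
      // Walk up; stop early, because ancestors of a dirty node are dirty too.
      for(S_hierarchy_node *n = this; n && !n->dirty; n = n->parent)
         n->dirty = true;
   }
   const S_bounding_volume &GetVolume(){
      if(dirty){
         RecomputeFromChildren();  // hypothetical: merge the children's volumes
         dirty = false;
      }
      return volume;
   }
   void RecomputeFromChildren();   // assumed implemented elsewhere
};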
Because it’s very easy to detect collision against it, a convex hull is another basic geometry
object used in occlu-sion testing. During occlusion testing, we’ll detect a collision of a 3D point
with a hull and a sphere with a hull. Since we must detect how the bounding volume of an
object collides with viewing volumes (screen frustum and occlusion frustum), hidden-surface
removal has much in common with collision detection.
A hull is defined as a set of planes that forms the hull, with their normals pointing away from
the hull. Any point in space is a part of the hull if it lies behind all the planes forming the hull.
For our purposes, we’ll also use an open hull, which is a hull that represents an open 3D
volume.
All the information we need to compute the convex hull is a set of 3D points. During the
computation, we'll remove redundant points, which lie inside the hull and not on its skeleton.
We'll also need to compute the edge faces of the convex hull; these faces are not necessarily
triangles, as they are in 3D meshes.
We'll utilize the planes of these edge faces for our occlusion computations: for example, a fast
check of whether a 3D point is inside a convex hull. If a point in space is behind all the edge
faces of the hull (assuming the planes' normals point out from the hull), then it is inside the
hull (Listing 1).
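The article's Listing 1 is not reproduced here, but an illustrative version of the test is short.
A plane is stored as a unit normal plus a d term, with the plane equation nx*x + ny*y + nz*z + d = 0
and normals pointing out of the hull.

struct S_plane{
   float nx, ny, nz;   // unit normal, pointing out of the hull
   float d;
};

bool IsPointInsideHull(const float p[3], const S_plane *planes, int num_planes)
{
   for(int i = 0; i < num_planes; i++){
      float dist = planes[i].nx*p[0] + planes[i].ny*p[1] + planes[i].nz*p[2]
                 + planes[i].d;
      if(dist > 0.0f)
         return false;   // in front of some plane: outside the hull
   }
   return true;          // behind all planes: inside the hull
}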
Speaking from my own experience, writing code for building convex hulls is quite a difficult
task. Even when you implement the code properly, you'll encounter problems with float rounding
errors (using doubles won't solve anything). Regardless of which algorithm you choose, with
some point sets you'll end up with hulls that are invalid after final validity checks are
performed. Having multiple points placed close together on a plane is a common source of
problems.
After struggling for months with code that computed it wrong, and then spending another month
writing my own convex-hull computation code, I finally switched to Qhull, a freeware package
that implements the quickhull algorithm. It's available at www.geom.umn.edu/software/qhull.
Although the Qhull library is a robust, platform-independent package that can do many
additional tasks, we only need it to compute the convex hull using our structures. It
handles rounding problems by joggling points: if a computation fails, it shifts points randomly by
a small value and recomputes until it gets a proper hull.
If you decide to use this or any other available package, be prepared to spend a day or two
reading its documentation and writing code to call it; you’ll save a month otherwise spent
writing and debugging your own system.
The final result we need after computation is a set of filtered points that form a skeleton of the
hull, and a set of faces. Following is an example of what we get (faces are kept as indices in the
point set):
struct S_vector{
   float x,y,z;
};

struct S_face{
   int num_points;
   unsigned short *indices;   // indices into the hull's point set
};
Now we use the help of the C++ STL vector class for storing our vertices and faces:
std::vector<S_vector> hull_points;
std::vector<S_face> hull_faces;
Note that when inserting into a std::vector, the copy constructor of the class being inserted is
called. Make sure you have implemented a copy constructor (and matching destructor) for
S_face so that the memory for indices is properly allocated and freed; one hedged sketch
follows.
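A fleshed-out replacement for the S_face shown above, giving it proper copy semantics so that
std::vector's internal copying doesn't double-free the indices array. This is one reasonable way
to do it, not code from the article.

#include <cstring>   // std::memcpy

struct S_face{
   int num_points;
   unsigned short *indices;

   S_face(): num_points(0), indices(0) {}
   S_face(const S_face &f): num_points(f.num_points), indices(0){
      if(num_points){
         indices = new unsigned short[num_points];
         std::memcpy(indices, f.indices, num_points * sizeof(unsigned short));
      }
   }
   S_face &operator =(const S_face &f){
      if(this != &f){
         delete[] indices;
         num_points = f.num_points;
         indices = num_points ? new unsigned short[num_points] : 0;
         if(num_points)
            std::memcpy(indices, f.indices, num_points * sizeof(unsigned short));
      }
      return *this;
   }
   ~S_face(){ delete[] indices; }
};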
A viewing frustum is a 3D volume. For practical reasons, we can treat it as a convex hull,
simplifying further computations. The viewing frustum is typically a truncated pyramid (if a
projection transformation is used) or a box (if an orthogonal transformation is used). This article
assumes that a projection transformation is used, so the camera's position will often form
one of the points of the frustum's hull.
Occluders in the real world may be thought of as objects that occlude (or obstruct) your view
of other objects behind them (Figure 3). Take some examples of occluders: a building, a hill, or
a car; all these things occlude your view of objects physically located behind them. Transparent
or translucent objects are not good view occluders, because light rays pass through their
material, so we'll ignore transparent objects as occluders in 3D rendering.
Objects in the real world consist of atoms and molecules, and pretty much every atom can
either be occluded by another atom or not (in which case a viewer can see it). In computer
graphics, however, objects are built from vertices and polygons, and these vertices and
polygons are usually grouped into primitives, which are rendered together in order to achieve
good graphics throughput. Our task consists of rejecting as many of these primitives as possible
in the early (preprocessing) phase of our pipeline, without affecting the viewer's experience by
rejecting objects that should be rendered.
In practice, this means finding which objects are fully occluded by other objects and rejecting
those occluded objects from any further processing. A solid object, sufficiently big to be worth
the additional computations associated with occlusion testing, is an ideal occluder.
In an ideal 3D engine, we could detect occlusion of even the smallest primitive behind any
object in a scene. In reality, however, we must find an algorithm that allows fast detection
of occlusion for a sufficient number of potentially visible primitives. For that reason, we'll
simplify the occlusion volumes to convex hulls.
Convex hulls allow a sufficient approximation of a 3D object in most cases. When it is not
possible to represent the shape of an object with a single convex hull, you can use several hulls
to accomplish the task. The occlusion hull doesn't need to copy the exact shape of the visual
object it works with. In many cases, an occluder may consist of many fewer faces than the
visual primitive itself, roughly copying the shape of the visual (Figures 4 and 5). The rule to
keep in mind here is that an occluder's shape shouldn't be bigger than the shape of the visuals
it represents; otherwise your engine will end up rejecting primitives that should be rendered,
resulting in ugly graphical artifacts.
Figure 4: A house, suitable for occluding other
objects in a 3D scene.
Figure 5: The occluder object (drawn in green
wireframe), representing the simplified shape of the
house.
In determining HSR, occluders should be processed first. Because the position of the camera
(viewer) changes constantly in 3D games, so does the occlusion frustum, the 3D volume cast
by each occluder. Our task is to compute the occlusion volumes at the beginning of rendering
from a particular camera view. (If you render multiple views in a single frame, for a mirror for
example, this step must be done for each rendered view.) After this preprocessing step, you
should have collected all occluders on the screen, against which you'll test other potentially
visible primitives. Some optimization tips: minimize the number of occluders you include in the
test list by determining whether a particular occluder is itself occluded by another occluder;
Figure 6 shows this possibility. Also, don't consider occluders that can potentially hide only a
small number of primitives; you should reject occluders that occupy a small area of screen
space.
Let’s have a closer look at the information needed in order to detect whether an object is
occluded. Looking closer at the occlusion volume, we see that occlusion volume is actually
another kind of convex hull, expanded from the viewpoint into infinity. The occlusion volume is
built from all of the occluder’s polygons facing the camera, and from contour edges (as seen
from the camera) expanded away from the camera (Figure 7).
Actually, this volume is open — there’s no back plane that would cap the volume — because the
occluder hides everything behind it into infinity. And any plane we save will speed up fur-ther
computations.
To build contours from a convex hull, we use a simple algorithm utilizing the fact that each
edge in a convex hull connects exactly two faces. The algorithm is this:
1. Iterate through all polygons, and detect whether a polygon faces the viewer. (To detect
whether a polygon faces the viewer, use the dot product of the polygon's normal and the
direction to any of the polygon's vertices. When this is less than 0, the polygon faces the
viewer.)
2. If the polygon faces the viewer, do the following for each of its edges: if the edge is already
in the edge list, remove it from the list; otherwise, add it to the list.
After this, we should have collected all the edges forming the occluder's contour, as seen from
the viewer's position. Once you've got it, it's time to build the occlusion frustum itself, as shown
in Figure 7 (note that this figure shows a 2D view of the situation). The frustum is a set of
planes defining the occluded volume. The property of this occlusion volume is that any point
lying behind all planes of the volume is inside it, and thus is occluded. So in order to define an
occlusion volume, we just need the set of planes forming it. Looking closer, we can see that the
frustum is made of all the occluder's polygons facing the viewer, plus new planes constructed
from the contour edges and the viewer's position. A sketch of both steps follows.
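This hedged sketch combines the contour-toggling algorithm above with the plane construction.
The occluder representation, Dot, Sub, and PlaneFromPoints are illustrative assumptions; in
particular, PlaneFromPoints must choose its winding so the resulting normal points out of the
occlusion volume.

#include <vector>

struct S_edge{ unsigned short v0, v1; };

struct S_occluder{
   std::vector<S_vector> points;      // hull skeleton points
   std::vector<S_face>   faces;       // hull edge faces
   std::vector<S_plane>  face_planes; // one plane per face, normals out
};

float    Dot(const S_vector &a, const S_vector &b);
S_vector Sub(const S_vector &a, const S_vector &b);
S_plane  PlaneFromPoints(const S_vector&, const S_vector&, const S_vector&);

void BuildOcclusionFrustum(const S_occluder &occ, const S_vector &viewer,
                           std::vector<S_plane> &out_planes)
{
   std::vector<S_edge> contour;
   for(unsigned f = 0; f < occ.faces.size(); f++){
      const S_face &face = occ.faces[f];
      const S_plane &pl = occ.face_planes[f];
      // Step 1: facing test - dot the face normal with the direction from
      // the viewer to any face vertex; negative means front-facing.
      S_vector n = { pl.nx, pl.ny, pl.nz };
      if(Dot(n, Sub(occ.points[face.indices[0]], viewer)) >= 0.0f)
         continue;
      out_planes.push_back(pl);   // keep the facing plane
      // Step 2: toggle each of the face's edges in the contour list; an
      // edge shared by two facing polygons appears twice and cancels out.
      for(int e = 0; e < face.num_points; e++){
         S_edge edge = { face.indices[e],
                         face.indices[(e + 1) % face.num_points] };
         bool removed = false;
         for(unsigned i = 0; i < contour.size(); i++){
            bool same = (contour[i].v0 == edge.v0 && contour[i].v1 == edge.v1)
                     || (contour[i].v0 == edge.v1 && contour[i].v1 == edge.v0);
            if(same){
               contour.erase(contour.begin() + i);
               removed = true;
               break;
            }
         }
         if(!removed)
            contour.push_back(edge);
      }
   }
   // One new plane per contour edge, through the edge and the viewer.
   for(unsigned i = 0; i < contour.size(); i++)
      out_planes.push_back(PlaneFromPoints(viewer,
                                           occ.points[contour[i].v0],
                                           occ.points[contour[i].v1]));
}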
If you’ve gotten this far and it’s all working for you, there’s one useful optimization to
implement at this point. It lies in minimizing the number of facing planes (which will speed up
intersection detection). You may achieve this by collapsing all the facing planes into a single
plane, with a normal made of the weighted sum of all the facing planes. Each participating
normal is weighted by the area of its polygon. Finally, the length of the computed normal is
made unit-length. The d part of this plane is computed using the farthest contour point.
Occlusion testing will work well without this optimization, but implementing it will speed up
further computations without loss of accuracy.
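A sketch of that collapse, reusing the S_vector, S_plane, and Dot names from the sketches
above; the per-face area array and the choice of d below are assumptions. Picking d from the
contour point deepest along the normal keeps the collapsed volume inside the true occluded
region, which is the conservative choice.

#include <math.h>

S_plane CollapseFacingPlanes(const std::vector<S_plane> &facing,
                             const std::vector<float> &areas,
                             const std::vector<S_vector> &contour_points)
{
   S_vector n = { 0, 0, 0 };
   for(unsigned i = 0; i < facing.size(); i++){
      n.x += facing[i].nx * areas[i];   // area-weighted sum of normals
      n.y += facing[i].ny * areas[i];
      n.z += facing[i].nz * areas[i];
   }
   float len = sqrtf(n.x*n.x + n.y*n.y + n.z*n.z);
   n.x /= len; n.y /= len; n.z /= len;  // make the normal unit length
   // d from the contour point deepest along the normal direction.
   float dmin = Dot(n, contour_points[0]);
   for(unsigned i = 1; i < contour_points.size(); i++){
      float d = Dot(n, contour_points[i]);
      if(d < dmin)
         dmin = d;
   }
   S_plane result = { n.x, n.y, n.z, -dmin };
   return result;
}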
Detecting Occlusion
To detect whether an object is occluded, we utilize the object's bounding volume. To find out if
the object is inside the occlusion frustum, we first make a simple test to check whether its
bounding sphere is inside the frustum. Figure 8 shows possible situations that may arise; there
we see that only sphere C passed the test and is fully occluded. Listing 3 shows the function
that may be used for such a computation. This process detects whether a bounding sphere is
occluded. It's fast, but not as accurate as detection using bounding boxes. With this technique,
some objects whose bounding sphere fails the test are classified as visible even though their
bounding box is still fully occluded, so we would end up rendering them needlessly. Figure 9
illustrates this possibility.
Detecting whether the bounding box is inside the occlusion frustum is another very simple task:
detect whether all eight corner points of the bounding box are inside the frustum (Listing 2).
Note that we use oriented bounding boxes in world coordinates, so we must transform local
AABBs to world OBBs (as explained previously). If any vertex is outside the volume, the box is
not occluded. This test can take eight times as many dot products as the sphere test, so it is
less efficient. Ideally you would use it only when you detect that the center of the bounding
sphere is inside the occlusion frustum but the sphere is still not occluded. This minimizes the
chances of wasting time checking box-versus-frustum collisions while still getting more accurate
occlusion tests, resulting in fewer objects being rendered. Both tests are sketched below.
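Hedged sketches of both occlusion tests against one occlusion frustum. IsPointInsideHull and
Dot are the helpers shown earlier; the corner array is the output of the AABB-to-OBB expansion
sketched before.

bool IsSphereOccluded(const S_vector &center, float radius,
                      const S_plane *planes, int num_planes)
{
   for(int i = 0; i < num_planes; i++){
      S_vector n = { planes[i].nx, planes[i].ny, planes[i].nz };
      float dist = Dot(n, center) + planes[i].d;
      if(dist > -radius)      // the sphere pokes out past this plane
         return false;
   }
   return true;               // whole sphere behind every plane: occluded
}

bool IsBoxOccluded(const float corners[8][3],
                   const S_plane *planes, int num_planes)
{
   for(int i = 0; i < 8; i++)
      if(!IsPointInsideHull(corners[i], planes, num_planes))
         return false;        // any corner outside means not occluded
   return true;
}

As the text suggests, IsBoxOccluded is worth calling only when the sphere test fails but the
sphere's center is inside the frustum.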
Editing Support
Computing the occlusion volume and detecting whether an object is occluded is half of the work
that needs to be done. The other task is finding a way to edit occluders comfortably in an
evolving 3D scene during development. It may be done in several ways; I'll discuss some of
them here.
Integrating an occluder editor into your game editor. This is the harder way to do it, but it's
preferable to editing in a modeling package. The geometry inside occluders may change, and
occluders must match the geometry in order to be effective, so an "edit and see" approach is
the best bet here.
Because an occluder is simply a convex hull, once you've implemented and tuned your convex
hull code, you can call it with a set of points and you've got it.
Figure 10 shows occlusion techniques in a real game project, including numbers of rendered
triangles. Note that the terrain is specially modeled so that occlusion is efficient; it contains
high hills that are suitable for occluding objects behind them. But the possibilities for occluder
utilization are much wider: cities with plenty of buildings, fences (solid, not transparent), and
so on.
Finally, fine-tune editing support by visually displaying occluders in edit mode while designing
your missions, and by allowing the occluder's shape to be modified in the editor.
Take It Outside
This article provides some insight into rendering possibilities for outdoor scenes, and shows
some new directions for optimizations. The model described here has worked, and continues
to work, for games I have been working on (used with other HSR techniques, such as sectors
and portals) and has undergone many changes to get to this point. As we continue to improve
our methods for rendering all kinds of environments in games, soon there will be no
boundaries on what kinds of worlds we can offer players.
Excerpted from the Second Edition of Real-Time Rendering (AK Peters, 2002).
URL: http://www.gamasutra.com/features/20020715/mollerhaines_01.htm
Graphics accelerators are just beginning to include support for curved surfaces as basic
primitives. A problem with these early implementations is a lack of generality. For example, the
shadow volume method needs to use the silhouette edge of a model in order to generate
projected quadrilaterals. Computing such silhouette edges from the curved surfaces currently
has to be done on the software side. Even with such limitations, the potential quality and speed
improvements offered by curved surface descriptions make them useful today, and future
graphics hardware promises to be more powerful and flexible.
The advantage of using curved surfaces is at least fourfold: (1) they have a more compact
representation than a set of polygons, (2) they provide scalable geometric primitives, (3) they
provide smoother and more continuous primitives than planar polygons, and (4) animation and
collision detection may become simpler and faster.
There are a number of advantages of a compact curve representation for real-time rendering.
First, there is a savings in memory for model storage (and so some gain in memory cache
efficiency). This is especially useful for game consoles, which typically have little memory
compared to a PC. Transforming curved surfaces generally involves fewer matrix multiplications
than transforming a mesh representing the surface. If the graphics hardware can accept such
curved surface descriptions directly, the amount of data the host CPU has to send to the
graphics hardware is usually much less than sending a polygon mesh.
An N-patch is a triangular Bézier surface, which can replace each triangle in a triangle mesh. N-
patches have the interesting property that a model with few polygons can be made more
convincing and realistic. The individual polygons are treated as curved surfaces, creating
more vertices on the surface. The result of the higher vertex density is better lighting of the
surface and more realistic silhouette edges.
This article starts with a brief description of Bézier triangles. These form the basis for the N-
patches, which are discussed in more detail in the rest of the article.
Bézier Triangles
Even though the triangle often is considered a simpler geometric primitive than the rectangle,
this is not the case when it comes to Bézier surfaces: Bézier triangles are not as straightforward
as Bézier patches. However, because Bézier triangles are used in forming N-patches, we first
discuss these.
The control points are located in a triangular grid, as shown in Figure 1.
The degree of the Bézier triangle is n, which implies that there are n+1 control points per
side. These control points are denoted p0ijk and sometimes abbreviated to pijk. Note that
i+j+k = n, and i,j,k >= 0 for all control points. Thus, the total number of control points is
(n+1)(n+2)/2.
It should come as no surprise that Bézier triangles also are based on repeated interpolation.
However, due to the triangular shape of the domain, barycentric coordinates must be used for
the interpolation. Recall that a point within a triangle ∆p0p1p2 can be described as
p(u,v) = (1-u-v)p0 + u p1 + v p2,
where (u,v) are the barycentric coordinates. For points inside the triangle the following must
hold: u >= 0, v >= 0, and 1-u-v >= 0 (equivalently, u+v <= 1). Based on this, the de Casteljau
algorithm for Bézier triangles is:
de Casteljau [triangles]:
The final point on the Bézier triangle at (u,v) is pn000(u,v). The Bézier triangle in Bernstein form
is:
Bernstein [triangles]:
The Bernstein polynomials now depend on both u and v, and are therefore computed
differently, as shown below:
Whenever i, j, or k is less than 0 or greater than n, the Bernstein polynomial is set to zero:
Bnijk(u,v) = 0. The partial derivatives are [1]:
Derivatives [triangles]:
Some unsurprising properties of Bézier triangles are that they interpolate (pass through) the
three corner control points, and that each boundary is a Bézier curve described by the control
points on that boundary. Also, the surface lies in the convex hull of the control points. Finally,
rotating the control points and then generating points on the Bézier triangle is the same as
generating points on the Bézier triangle and then rotating these. A Bézier triangle is shown in
Figure 2.
N-Patches
Given an input triangle mesh with normals at each vertex, the goal of the N-patches scheme by
Vlachos et al. [2] is to construct a better looking surface on a triangle basis. The term "N-
patches" is short for "Normal-Patches," and these patches are also called PN triangles. This
scheme attempts to improve the triangle mesh's shading and silhouette by creating a curved
surface to replace each triangle. Hardware can tessellate each surface on the fly, because the
tessellation is generated from each triangle's own points and normals, with no neighbor
information needed. API changes are minimal; all that is needed is a flag telling whether to
generate N-patches, and a level of tessellation. See Figure 3 for an example. The algorithm
presented here
builds upon work by van Overveld and Wyvill [3].
Assume we have a triangle with vertices p300, p030, and p003 with normals n200, n020, and n002.
The basic idea is to use this information to create a cubic Bézier triangle for each original
triangle, and generate as many triangles as we wish from the Bézier triangle.
To shorten notation, w = 1-u-v will be used. A cubic Bézier triangle (see Figure 1) is given by:
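(The displayed equation is not reproduced here.) As an illustration, the cubic Bézier triangle in
Bernstein form can be evaluated in code as below, with a minimal vector type assumed and the
control points in the article's p_ijk naming (i+j+k = 3). The trinomial weights sum to
(u+v+w)^3 = 1, which is an easy sanity check.

struct Vec3{ float x, y, z; };

static Vec3 Mad(Vec3 acc, const Vec3 &p, float s){
   acc.x += p.x*s; acc.y += p.y*s; acc.z += p.z*s;
   return acc;
}

// p[] order: p300 p030 p003 p210 p201 p120 p021 p102 p012 p111
Vec3 EvalCubicBezierTriangle(float u, float v, const Vec3 p[10])
{
   float w = 1.0f - u - v;
   Vec3 r = { 0, 0, 0 };
   r = Mad(r, p[0], u*u*u);     // p300
   r = Mad(r, p[1], v*v*v);     // p030
   r = Mad(r, p[2], w*w*w);     // p003
   r = Mad(r, p[3], 3*u*u*v);   // p210
   r = Mad(r, p[4], 3*u*u*w);   // p201
   r = Mad(r, p[5], 3*u*v*v);   // p120
   r = Mad(r, p[6], 3*v*v*w);   // p021
   r = Mad(r, p[7], 3*u*w*w);   // p102
   r = Mad(r, p[8], 3*v*w*w);   // p012
   r = Mad(r, p[9], 6*u*v*w);   // p111
   return r;
}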
To ensure C0 continuity at the borders between two N-patch triangles, the control points on the
edge can be determined from the corner control points and the normals at the respective
control point (assuming that normals are shared between adjacent triangles). Also, to get
reasonable behavior of the surface at the control points, the normals there should be normals of
the surface in the equation above. Therefore, the following strategy is adopted to compute the
six different control points for the borders. Say that we want to compute p210 using the control
points p300, p030, and the normal n200 at p300. Simply take the point 2/3 p300 + 1/3 p030 and
project it in the direction of the normal, n200, onto the tangent plane defined by p300 and n200
[2][5][1]. See Figure 4.
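A sketch of that construction, reusing the Vec3 type above and assuming n200 is unit length:

Vec3 ComputeP210(const Vec3 &p300, const Vec3 &p030, const Vec3 &n200)
{
   // q = 2/3 p300 + 1/3 p030, the point one third of the way along the edge.
   Vec3 q = { (2*p300.x + p030.x)/3,
              (2*p300.y + p030.y)/3,
              (2*p300.z + p030.z)/3 };
   // Signed distance from q to the tangent plane through p300.
   float dist = (q.x - p300.x)*n200.x
              + (q.y - p300.y)*n200.y
              + (q.z - p300.z)*n200.z;
   // Project q along -n200 onto that plane.
   Vec3 r = { q.x - dist*n200.x, q.y - dist*n200.y, q.z - dist*n200.z };
   return r;
}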
The other border control points can be computed similarly, so it only remains to compute the
interior control point, p111. This is done as shown in the equation that follows, and this choice
follows a quadratic polynomial [5][1]:
Instead of using the previous Bézier triangle derivatives equation to compute the two tangents
on the surface, and subsequently the normal, Vlachos et al. [2] choose to interpolate the
normal using a quadratic scheme as shown below:
This can be thought of as a Bézier triangle of degree two, where the control points are six
different normals. In the equation above, the choice of the degree, i.e., quadratic, is quite
natural since the derivatives are of one degree lower than the actual Bézier triangle, and
because linear interpolation of the normals cannot describe an inflection. See Figure 5.
To be able to use the previous equation, the normal control points n110, n101, and n011 need to
be computed. One intuitive, but flawed, solution is to use the average of n200 and n020 (the
normals at the vertices of the original triangle) to compute n110. However, when n200 = n020,
the problem shown at the lower left in Figure 5 will once again be encountered. Instead, n110 is
constructed by taking the average of n200 and n020 and then reflecting this normal in the plane
π, which is shown in Figure 6. This plane has a normal parallel to the difference between the
endpoints p300 and p030. The plane π can be taken to pass through the origin, since direction
vectors are being reflected and these are independent of position on the plane. Also, note that
each normal should be normalized.
van Overveld and Wyvill originally used a factor 3/2 instead of the 2 in the equation above.
Which value is best is hard to judge from looking at images, but using 2 gives the nice
interpretation of a true reflection in the plane. Lee and Jen analyze artifacts involved in normal
interpolation, and suggest solutions [4].
At this point, all Bézier points of the cubic Bézier triangle and all the normal vectors for
quadratic interpolation have been computed. It only remains to create triangles on the Bézier
triangle so these can be rendered. Advantages of this approach are that the surface gets a
better silhouette and shape relatively cheaply, and that only minor modifications must be made
to existing code to make this work. All that is needed is that tessellation should be done
(instead of rendering as usual), down to some Level of Detail (LOD). A hardware
implementation is pretty straightforward.
One way to specify LODs is the following. The original triangle data is LOD 0. Then the LOD
number increases with the number of newly introduced vertices on a triangle edge. So LOD 1
introduces one new vertex per edge, and so creates four subtriangles on the Bézier triangle,
and LOD 2 introduces two new vertices per edge, generating nine triangles. In general, LOD n
generates (n+1)² triangles. To prevent cracking between Bézier triangles, each triangle in the
mesh must be tessellated with the same LOD. This is a big disadvantage since a tiny triangle
will be tessellated as much as a large triangle. Adaptive tessellation and fractional tessellation
are possible, but not yet supported. Creases are hard to control, and often one needs to insert
extra triangles near the desired crease. The continuity between Bézier triangles is only C0, but
it still looks acceptable in many cases, mainly because the normals are continuous across
triangles, so that a set of N-patches mimics a G1 surface. Note that to get good-looking
texturing, C1 continuity is required across borders between triangles (or patches). Also worth
knowing is that cracks will appear if two adjacent triangles do not share the same normals.
N-patches are supported by the DirectX 8 API and through extensions in OpenGL. Version 8.0 of
DirectX has support for the interpolation of normals, but only with linear interpolation. Version
8.1 also allows quadratic interpolation of normals. There is a performance cost in normal
interpolation; quadratic interpolation is more expensive than linear. Besides the standard N-
patch interpolation (cubic Bézier triangles), version 8.1 also allows linear interpolation of vertex
positions. This means that a triangle is tessellated with many smaller coplanar triangles with
interpolated normals across each. ATI accelerates N-patches in hardware, which they call
TRUFORM, beginning with their 8000 series of chips. N-patches are also used in the
displacement mapping primitive proposed by Matrox.
Acknowledgements
We would like to thank Alex Vlachos for his help consulting on parts of this article.
References
[1] Farin, Gerald, Curves and Surfaces for Computer Aided Geometric Design--A Practical
Guide, Fourth Edition (First Edition, 1988), Academic Press Inc., 1996.
[2] Vlachos, Alex, Jörg Peters, Chas Boyd, and Jason L. Mitchell, "Curved PN Triangles," ACM
Symposium on Interactive 3D Graphics 2001, pp. 159-166, 2001.
http://alex.vlachos.com/graphics/CurvedPNTriangles.pdf
[3] van Overveld, C.V.A.M., and B. Wyvill, "An Algorithm for Polygon Subdivision Based on
Vertex Normals," Computer Graphics International '97, pp. 3-12, June 1997.
[4] Lee, Yuan-Chung, and Chein-Wei Jen, "Improved Quadratic Normal Vector Interpolation for
Realistic Shading," The Visual Computer, vol. 17, no. 6, pp. 337-352, 2001.
[5] Farin, Gerald, "Triangular Bernstein-Bézier Patches," Computer Aided Geometric Design,
vol. 3, no. 2, pp. 83-127, 1986.
[6] van Overveld, C.V.A.M., and B. Wyvill, "Phong Normal Interpolation Revisited," ACM
Transaction on Graphics, vol. 16, no. 4, pp. 379-419, October 1997.
By Sean O'Neil
Gamasutra
July 12, 2002
URL: http://www.gamasutra.com/features/20020712/oneil_01.htm
In parts one and two of this series, I explained how to dynamically generate and render
planetary bodies at real-time speeds using a function based on fractal Brownian motion (fBm)
paired with a spherical ROAM algorithm. This article will concentrate on how to scale that up to
a star system or even an entire galaxy. It will also discuss some of the problems you will run
into with scale and frame of reference, and different ways to solve them.
The main problem with trying to model and render a really large game world is precision. A 32-
bit float has a maximum of 6 significant digits of accuracy, and a 64-bit double has a maximum
of 15. To put this into the correct frame of reference, if the smallest unit you care about
keeping track of is a millimeter, you start to lose accuracy around 1,000 km with floats and
around 1 trillion km with doubles.
Given the fact that the Earth's radius is close to 6,378 km, a 32-bit float isn't even enough to
model and render one Earth-sized planet accurately. But losing precision at the millimeter, and
possibly centimeter level, with the vertices in a planet's model is not a significant concern. You
will run into a number of much bigger problems trying to model and render such a large game
world. One possible solution is to use 64-bit doubles everywhere, but this is a slow and rather
clumsy way to solve these problems.
When I first rendered my planet centered at the origin of my 3D map, I noticed two problems
right away. The first was that placing the far clipping plane out at a decent distance made my
Z-buffer useless. The second problem was that at a certain distance, the planet would disappear
regardless of what I set the far clipping plane to. The second problem seemed to be driver or
card-specific because each video card I tested it on ran into the problem at different distances.
Both problems had something to do with very large numbers being used in the transformation
matrices.
I solved both of these problems by scaling down the size and the distance of planetary bodies
by modifying the model matrix. Using a defined constant for the desired far clipping plane,
which I'll call FCP for now, I exponentially scale down the distance so that everything past
FCP/2 (out to infinity) is scaled down to fall between FCP/2 and FCP. To make the size of the
planetary body appear accurate, all you have to do is scale the size by the same factor you
scale the distance. Once the routine was written, I just brought the far clipping plane in until
the Z-buffer precision seemed to be sufficient. Because distances are scaled exponentially, the
proper Z order is maintained in the Z-buffer.
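As a concrete illustration, here is one way such a mapping could look. The function below is an
assumption for illustration, not the author's exact formula: it leaves distances up to FCP/2
alone and squashes [FCP/2, infinity) into [FCP/2, FCP), and because it is strictly monotonic it
preserves Z order.

#include <math.h>

const double FCP = 1000.0;   // desired far clipping plane (illustrative value)

double CompressDistance(double d)
{
   const double half = FCP * 0.5;
   if(d <= half)
      return d;                                 // near range left untouched
   return FCP - half * exp(-(d - half) / half); // asymptotically approaches FCP
}

// Scale the body by the same factor as its distance so its apparent
// angular size is unchanged.
double CompressionScale(double d)
{
   return d > 0.0 ? CompressDistance(d) / d : 1.0;
}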
Next I tried placing a star at the center of the 3D map and placing the planet and camera out to
Earth's orbital distance in the X direction. I immediately ran into rendering problems and
positioning problems, though it was hard to tell that it was multiple problems until I fixed the
rendering problems. The rendering problems caused all objects in the scene to shake and
occasionally disappear whenever the camera moved or turned. Again the rendering problems
showed up differently on each video card I tested it on, and again they had something to do
with very large numbers being used in the transformation matrices.
Perhaps the most common way to use OpenGL's model/view matrix is to push the camera's
view matrix onto the stack and multiply it by each object's model matrix during rendering. The
problem with the traditional model and view matrices in the test case outlined above is that
both have a very large translation in the X direction. A 32-bit float starts to lose precision
around 1000 km, and Earth's orbit is around 149,600,000 km. Even though the camera is close
to the planet and the numbers should cancel each other out, too much precision is lost during
the calculations to make the resulting matrix accurate.
Is it time to resort to doubles yet? Not yet. This problem can be fixed very easily without using
doubles by changing how the model and view matrices are calculated. Start out by pretending
the camera is at the origin when you calculate your view matrix. If you use an existing function
like gluLookAt() to generate your view matrix, just pass it (0, 0, 0) for the eye position and
adjust your center position. Then calculate each model matrix relative to the camera's actual
position by subtracting the camera's position from the model's position. The result is two
matrices with very small numbers when the camera is close to the model, which makes the
problem go away completely. A precision problem still exists with objects at a great distance
from the camera, but at that distance the precision loss isn't noticeable.
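A hedged sketch of this camera-relative setup with OpenGL; the vector type and function names
are illustrative. Positions are kept in doubles, the subtraction happens in double precision, and
only the small camera-relative offset is handed to GL as floats.

#include <GL/gl.h>
#include <GL/glu.h>

struct DVec3{ double x, y, z; };

void SetupCameraRelativeView(const DVec3 &lookDir, const DVec3 &up)
{
   // Pretend the camera sits at the origin: eye = (0,0,0), center = lookDir.
   gluLookAt(0.0, 0.0, 0.0,
             lookDir.x, lookDir.y, lookDir.z,
             up.x, up.y, up.z);
}

void PushModelMatrix(const DVec3 &modelPos, const DVec3 &camPos)
{
   // The difference is small whenever the object is near the camera,
   // so casting it to float loses nothing noticeable.
   glPushMatrix();
   glTranslatef((float)(modelPos.x - camPos.x),
                (float)(modelPos.y - camPos.y),
                (float)(modelPos.z - camPos.z));
}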
After all rendering problems have been fixed, you run into precision problems with object
positions. Using floats, you can't model positions accurately once you get out past 1,000 km
from the origin. The most obvious symptom appears when you try to move the camera (or any
other object) when it's far away from the origin. When a position contains really large numbers,
a relatively small velocity will be completely dropped as a rounding error. Sometimes it will be
dropped in 1 axis, sometimes in 2, and sometimes in all 3. When the velocity gets high enough
along a specific axis, the position will start to "jump" in noticeably discrete amounts along that
axis. The end result is that both the direction and magnitude of your velocity vector end up
being ignored to a certain extent.
Is it time to resort to doubles yet? Yes. I don't think there's any way around it with object
position. There's no number magic you can work that will give you extra digits of precision
without cost. TANSTAAFL. Luckily, you only need doubles for object positions. Everything else
can still be represented with floats, and almost every math operation you perform will still be a
single-precision operation. The only time you need double-precision operations is when you're
updating an object's position or comparing the positions of two objects. And with 15 digits of
precision, you get better precision way out at 1000 times Pluto's orbit than you get with floats
dealing with one planet at the origin.
Scaling beyond a single star system is a tough problem. A double may get you safely out to
1,000 times Pluto's orbit, which is just under two-thirds of a light-year, but you really can't take
it much farther. Since we don't currently have any built-in data types larger than a double, you
have to resort to something custom. I've
seen a number of implementations that will work here, but something fast is needed. I've seen
custom 128-bit fixed-point numbers created using 2 __int64 values. I've seen 4-bit BCD (Binary
Coded Decimal) routines used to achieve unlimited precision. I'm sure if you looked you could
even find 128-bit floating-point emulation routines out there.
A common problem with all the schemes I've mentioned so far is performance. Generally
speaking, software is much slower than hardware. This means that if you're not using a native
data type, all of these custom routines will run much more slowly than double-precision
operations. I prefer to solve this problem by using different frames of reference at different
scales. The top level would be the galaxy level, with the galaxy centered at the origin and with
1 unit being equal to 1 light-year. The next level would be the star system level, with the star
centered at the origin and 1 unit being equal to 1 kilometer.
Because the distance between stars is so vast, you really don't need to mix the two frames of
reference. If you consider the fact that anything traveling between stars at sub-light speeds
would never get there during the player's lifetime, then you can choose your frame of reference
based on whether an object is traveling above or below the speed of light. When an object
jumps to FTL (Faster Than Light) travel, you can immediately switch to the galaxy frame of
reference. When an object drops back to sub-light speed, you can immediately switch to the
star system frame of reference. If a star is nearby, you can choose that to be the new origin.
Otherwise, you can make the object's initial position the origin. It is also possible to keep the
player from stopping between star systems by forcing them to select a destination star system,
then leading the camera there any way you want.
I've already explained how to expand your Z-buffer's precision by scaling planets down both by
distance and by size. However, this won't solve all your Z-buffer problems. You'll find that in
scaling large numbers down, small changes in distance are lost. A moon that should be behind
a planet may be rendered in front of it, or vice versa. Even when you're close to a planet, you
can still have triangles break through the Z-buffer because the far clipping plane is too far
away. Bringing it in closer just makes those other problems worse.
Since you can't change the hardware in your video card, and since it's unlikely that chipset
designers will provide a 64-bit Z-Buffer, this is a tough problem to solve. I have found that
using impostor rendering makes this problem much easier to manage. To create an impostor,
you render an object all by itself to the back buffer (or to a separate pixel buffer), then copy (or
bind) that object into a texture map. Then the object is rendered as a single rectangle using the
generated texture map. Because most objects look different from different viewpoints, the
texture map must be updated as the camera moves around. In essence, an impostor is just a
billboard that is generated in real-time as needed.
At this point many of you will be wondering how this helps you with Z-Buffer precision. The
answer lies in the fact that you must use a special projection matrix when rendering your
impostor texture, and this projection matrix has its own front and back clipping planes. This
projection matrix creates a view frustum that fits tightly around your object on all 6 sides,
which gives you the best Z-Buffer precision you can have for that object alone. Once that
texture map is generated for a planet, you really don't need to worry about Z-Buffer precision.
Because impostor texture maps are partially transparent, you need to render them in reverse Z
order. This means that you can turn the Z-Buffer off completely when drawing the impostored
planets.
Impostors not only improve Z-Buffer precision for the objects being rendered as impostors,
they also offer great performance improvements. Instead of rendering every planet in your
scene every frame, you only need to render two triangles per planet on most frames. Every
now and then you will have to update an impostor texture, but even then you will rarely need
to update more than one planet's impostor in any given frame, and most frames won't need
anything updated at all. Since you're not rendering all those triangles, you also don't need to
check their ROAM priorities or update their triangle meshes.
Unfortunately, nothing comes without a price. There are a few problems that can crop up when
using impostors. The first problem is that differences between the resolution of the impostor
and the screen resolution can cause aliasing artifacts. This can be minimized by choosing an
appropriate texture size based on the amount of view angle or screen space taken up by the
object. Changing the texture resolution for an object will cause a visual shift that looks like the
video card switching between mip-map levels. Because you can control the resolution/size
trade-off, this problem is usually acceptable. However, this problem can become really bad
when the camera gets so close that the object's edges extend beyond the edges of the screen.
Impostors can also cause problems related to the Z-Buffer. Because you're taking a 3D object
and rendering it as a 2D rectangle, you are changing the Z-Buffer values that the object would
normally generate. Worse yet, you are changing the Z-Buffer values for the entire rectangle,
including the transparent portions of it. This can cause some really bad problems when an
impostor gets too close to other objects. The rectangle can end up hiding objects that should be
visible. It can even chop objects in half that lie in the rectangle's plane.
Luckily, these problems aren't too bad when dealing with planets. Inter-planetary distances are
so large that we really only have to worry about these problems when the camera gets close to
a planet. Even then we don't have to worry about objects being drawn on the surface of the
planet, or in the planet's atmosphere, as these can all be rendered directly into the impostor if
necessary. Still, at some point using the impostor will become more trouble than it's worth. The
easiest way to deal with this is to switch from impostor rendering to normal rendering when the
camera gets within a certain distance of the planet. In my demo I switch to normal rendering
when a planet takes up 90 degrees or more of the field of view.
I have one last thing to mention about impostors. They are also used for rendering clouds,
forests, cities, and several other large-scale details that you might want to render on a planet.
But care must be taken in how you manage them, or you will quickly find yourself out of video
memory. In my demo, I choose impostor resolutions from 512x512 all the way down to 8x8
based on the planet's distance to the camera. If I start to use impostors more extensively, I will
need to create a texture cache for them.
This demo is getting pretty large, and I don't have time to explain everything I've added to it
since the last article. Since the main addition I've made since the last article is impostors, I'll
try to explain that piece as well as I can.
CImpostor
Let's start by analyzing the members of the CImpostor class. If you haven't looked at the
source code from my previous articles, C3DObject contains position and orientation information
about the object itself. It also contains a bounding radius for the object, which I recently added
for a number of things like view frustum culling, collision detection, and impostor rendering. For
impostors it is used to determine how much screen space the object is taking up, which is then
used to determine the resolution of the billboard texture as well as the projection matrix for
rendering into that texture. A bounding box or convex hull may easily be used instead of a
bounding radius, but for planets a bounding radius gives the best fit.
// Helper methods
void GetImpostorViewMatrix(C3DObject *pCamera, CMatrix &mView);
float GetImpostorScreenSpace(float fDistance);
short GetImpostorResolution(float fScreenSpace);
};
Overall the concept is not too complex, but the math to fit the billboard tightly around the
object and to generate the right projection matrix is not trivial. Keep in mind that we need the
projection matrix to respect perspective projection or it will not look right, especially when
switching between impostor rendering and normal rendering. Here is some pseudo-code with
comments for InitImpostorRender(), which should explain the bulk of the math.
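That listing is not reproduced here; what follows is a hedged reconstruction of the math it
describes for a bounding sphere, fitting a square frustum tightly around the object. All names
are illustrative, and the near plane only stays positive because the demo switches to normal
rendering before the camera gets that close.

#include <math.h>
#include <GL/gl.h>
#include <GL/glu.h>

void InitImpostorRenderSketch(const float camPos[3], const float objPos[3],
                              float boundingRadius)
{
   float dx = objPos[0]-camPos[0], dy = objPos[1]-camPos[1], dz = objPos[2]-camPos[2];
   float dist = sqrtf(dx*dx + dy*dy + dz*dz);

   // Half-angle subtended by the bounding sphere; the frustum hugs it tightly.
   float halfAngle = asinf(boundingRadius / dist);
   float fovDegrees = 2.0f * halfAngle * 57.29578f;   // radians to degrees

   glMatrixMode(GL_PROJECTION);
   glLoadIdentity();
   // Square frustum (aspect 1), near/far planes touching the sphere for the
   // best possible Z-buffer precision for this object alone.
   gluPerspective(fovDegrees, 1.0f, dist - boundingRadius, dist + boundingRadius);

   glMatrixMode(GL_MODELVIEW);
   glLoadIdentity();
   // Look straight at the object's center from the camera position.
   gluLookAt(camPos[0], camPos[1], camPos[2],
             objPos[0], objPos[1], objPos[2],
             0.0f, 1.0f, 0.0f);   // any non-degenerate up vector will do
}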
Though the bounding sphere requires slightly more complicated geometry, it actually requires
less code and is more efficient. Because the view frustum and billboard are perfect squares,
they fit nicely into standard square textures. Bounding boxes will have a rectangular view
frustum and billboard, and to fit it into a texture efficiently (i.e. without a lot of unused space),
the up vector chosen can be much more important. I would not really recommend using
bounding boxes because even though they can use the texture more efficiently, they can only
help in one of two dimensions, and most textures are square anyway. If you plan to use the
rectangular texture extension to try to optimize video memory usage, it may be worth playing
around with.
CPBuffer
This class is just a simple wrapper class around the WGL_ARB_pbuffer extension. The code is
straightforward enough that there's no need to explain it in the article, but some of the
concepts behind it need to be explained. First of all, a pbuffer is an off-screen video buffer that
you can create with its own rendering context. It's like having an extra back buffer to render to,
but it can be any size or pixel format you want. It is not tied to the size and format of the
rendering window. When you're finished rendering to this pbuffer, you can copy its contents
into a texture object. If your video card supports the WGL_ARB_render_texture extension, you
can even bind a texture object directly to the pbuffer (which saves you the overhead of copying
it).
Not all video cards support the WGL_ARB_pbuffer extension, and even fewer seem to support
the WGL_ARB_render_texture extension. Luckily, it is easy enough to organize your code to use
the back buffer when these extensions are not supported. In fact, my demo is written to work
this way. Most current video cards should support copying from the back buffer to a texture.
Unfortunately, rendering to the back buffer is usually slower and less convenient than using a
pbuffer. This is because you have to call glClear() before you render each impostor, and the
glClear() call is slower on a large back buffer than on a small pbuffer. It is also convenient
to be able to update an impostor after you've started rendering the main frame, which you
can't do with the back buffer if you need to clear it for an impostor.
There is one more requirement that might trip you up. Because an impostor texture needs to be
partially transparent, your video card must support "destination alpha". This means that it must
be able to store an alpha channel in the back buffer (or pbuffer) that you will be rendering into.
Without that alpha channel, copying your buffer into a texture with an alpha channel will set all
alpha values to 1.0, making your texture completely opaque. When rendering impostor
textures, you must also make sure that the alpha component of the clear color is 0.
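For example, on Win32 the two pieces involved might look like the snippet below; it is
illustrative, not taken from the demo.

#include <windows.h>
#include <GL/gl.h>

void RequestDestinationAlpha(PIXELFORMATDESCRIPTOR &pfd)
{
   pfd.cColorBits = 32;
   pfd.cAlphaBits = 8;   // without this, copying to a texture yields alpha = 1.0
}

void ClearForImpostor()
{
   glClearColor(0.0f, 0.0f, 0.0f, 0.0f);   // alpha of the clear color is 0
   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
}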
It is possible to store many small textures inside one large one. Many games merge all their
smaller textures into one large one to avoid unnecessary context switching in the video card.
For impostors, you could do the same in one of two ways. First, you could render impostors into
pbuffers of a certain size, then copy the pbuffer to a specific location of a larger texture.
Second, you could create a large pbuffer and use glViewport() to render into a small part of
the pbuffer. The second idea might allow you to use WGL_ARB_render_texture efficiently. One
possible drawback is that glClear() seems to run slower if you're not clearing the full buffer,
and it may actually be quicker to copy than to bind for smaller impostor textures. I suppose the
only way to find out for sure is to take the time to test it.
Final Notes
We now have a demo capable of rendering an entire solar system fairly efficiently. Planets that
aren't close to the camera aren't updated or fully rendered every frame, and inter-planetary
distances should not cause any scale or depth problems. Believe it or not, the only depth
problem you may have now is when you get too close to a planet to render it as an impostor,
but the horizon is still pretty far away. You can alleviate this problem by changing the distance
at which you switch to normal rendering, changing the function that scales distance and size,
and possibly by dynamically moving the near and far clipping planes based on the horizon
distance. The last option will give the best Z-Buffer precision when the camera is close to the
ground, which is when you are likely to need it most.
I've also added some classes to organize planets and moons into a solar system, along with
code to load the solar system from an INI file. The planets don't move at this point, but it
shouldn't be too hard to add orbits and motion. The demo project comes with an INI file
defining 4 planets with a single moon orbiting the third planet. For now I divide orbital distances
by 10 when loading the planets to make them easier to see from each other.
At this time, I don't have any plans to write another article for a while. If you have any
questions, comments, or ideas, feel free to drop me an email (see the Author's Bio link above).
If you're working on something similar, or using my code in a project of your own, I'd also be
interested in hearing from you.
By Scott Johnson
Gamasutra
May 17, 2002
URL: http://www.gamasutra.com/features/20020510/johnson_01.htm
Matrix transforms are a ubiquitous aspect of 3D game programming, yet it is surprising that
game programmers do not often use a rigorous method for creating them or a common way of
discussing them. Practitioners in the field of robotics mastered them long ago, but those
methods haven't made their way into daily practice among game programmers. The many
symptoms include models that import the wrong way and characters that rotate left when
they are told to rotate right. So after a review of matrix conventions and notation, we'll
introduce a useful naming scheme, a shorthand notation for transforms, and tips for debugging
them that will let you create concatenated matrix transforms correctly in much less time.
Matrices represent transforms by storing the vectors that represent one reference frame in
another reference frame. Figure 1 shows two 2D reference frames offset by a vector T and
rotated relative to each other. To represent frame one in the space of frame zero, we need the translation vector T, and the unit axis vectors X1 and Y1 expressed in the zero frame.
We know that we need to store vectors in a matrix but now we have to decide how. We can
either store them in a square matrix as rows or as columns. Each convention is shown below
with the vectors expanded into their x and y components.
Each stores the same information so the question of which one is better will not be discussed.
The difference only matters when you use them in a multiplication. Matrix multiplication is a set
of dot products between the rows of the left matrix and columns of the right matrix. Figure 4
below shows the multiplication of two 3x3 matrices, A and B.
The first element in the product A times B is the row (a00, a01, a02) dotted with the column
(b00, b10, b20). The dot product is valid because the row and the column each have three
components. This dictates how a row vector and a column vector are each multiplied by a
matrix.
In each convention, the vectors are represented consistently as rows or columns as one might
expect but it is important to realize that the order changes. Again, we must switch the order
because the rows on the left must be the same size as the columns on the right in order for the
matrix multiplication to be defined.
You can convert between row and column matrices by taking the matrix transpose of either
matrix. Here we show that the transpose of a column matrix is a row matrix.
In the first section we defined a matrix transform (figures 2 and 3) from reference frame 1 to
reference frame 0 by expressing the vectors of frame 1 in frame 0. Let's name it M1to0 to make
the reference frames it transforms between explicit. When we start to introduce new reference
frames, as in figure 5, this name will be very handy.
Figure 5: Introducing a third reference frame (2) and a point P2 in that frame.
These frames could represent the successive joints of a robot arm or an animation skeleton.
Suppose the problem is to find P2 in the space of the zero frame. We'll call this point P0. We
can now write out the answer to this problem using our naming scheme for matrices, keeping in
mind the order of multiplication between row vectors and matrices and column vectors and
matrices.
Column Convention
P1 = M2to1 * P2
P0 = M1to0 * P1
P0 = M1to0 * M2to1 * P2
We have been consistent with the way column vectors are multiplied with matrices by keeping
the column vectors to the right of the transform matrices.
Row Convention
P1 = P2 * M2to1
P0 = P1 * M1to0
P0 = P2 * M2to1 * M1to0
We have been consistent with the way row vectors are multiplied with matrices by keeping the
row vectors to the left of the transform matrices.
So the problem has been reduced to finding the transform matrices, and already we have
accomplished a lot. We established a convention for naming points in space by the reference
frame that they are in (P0, P1). We named matrices for the reference frames that they
transform between (M1to0, M2to1). And finally, we leveraged the naming scheme to write out a
mathematical expression for the correct answer. There is no ambiguity regarding the order of
the matrices or which matrices we need to find.
Figure 6 shows the translation vectors between the frames. With the new information in the
figure, we can plug into the matrices from figures 1 and 2 to get the needed transform
matrices.
Column Convention
P0 = M1to0 * M2to1 * P2
Row Convention
P0 = P2 * M2to1 * M1to0
Thus we have solved the problem of finding point P0 given P2.
If we reversed the problem and needed to find point P2 given point P0 we could solve it using
the same method. We would quickly find that we need the matrices M0to1 and M1to2 and we
can get them using matrix inversion.
M0to1 = (M1to0)^-1
M1to2 = (M2to1)^-1
Again, we write the equation for P2 given P0, M1to0, and M2to1 by allowing the naming scheme
to guide the order of the matrix concatenation.
Column Convention
P2 = M1to2 * M0to1 * P0
P2 = (M2to1)^-1 * (M1to0)^-1 * P0
Row Convention
P2 = P0 * M0to1 * M1to2
P2 = P0 * (M1to0)^-1 * (M2to1)^-1
Another way to write those equations is by multiplying the matrices first. Matrix multiplication is not commutative (meaning you can't switch the order of the factors) but it is associative (meaning you can regroup the factors with parentheses). We can take the row equation:
P2 = P0 * M0to1 * M1to2
And group the matrices together to illustrate the naming scheme for concatenated matrices.
P2 = P0 * (M0to1 * M1to2)
P2 = P0 * M0to2
So when multiplying matrices together using this naming scheme you just chain the reference
frame names together.
These matrix derivations make excellent comments in the code that can save the person who
reads your code lots of time.
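For example, the derivation and the naming scheme might appear in code like this, assuming a hypothetical column-convention Matrix class whose operator* concatenates transforms:

// Derivation (column convention), following the naming scheme:
//   P1 = M2to1 * P2
//   P0 = M1to0 * P1
//   => P0 = (M1to0 * M2to1) * P2 = M2to0 * P2
Matrix M2to0 = M1to0 * M2to1;
Vector P0 = M2to0 * P2;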
The multiplication of two 4x4 matrices is even bigger. It is already a large bulky expression with
just two matrices. No one ever gained any insight into matrix concatenation of transform
matrices by looking at the product expressed by each component. Instead we'll substitute
algebraic variables for the sections of a transform in order to come up with a much more
intuitive notation.
The upper left 3x3 portion is a rotation and the far right column forms the translation. Let's
simplify the matrix by making some definitions.
Multiplying these 2x2 block matrices is easy enough to do by hand; it is just four dot products between the rows on the left and the columns on the right. In the coming notation, many of the multiplications will be with one or zero, so that will make it even easier.
Up to this point, we haven't dealt with scale but it is easy enough to add.
This new notation allows us to study the effects of combining rotation, translation, and scale by
combining building blocks for each one. Figure 7 defines a 2x2 rotation matrix that is really a
representation of a 4x4 transform matrix. Likewise, Figure 8 defines a 2x2 scale matrix that
represents a 4x4 transform matrix.
This notation is not concerned with whether R has rows or columns in it so the R matrix (Figure
7) is the same in both row and column conventions. S is a diagonal matrix so its 2x2 matrix
(Figure 8) is the same in both row and column conventions. The 2x2 matrix for translation must
change based on the row/column convention to reflect the location of the translation in the full
4x4 transform.
Now we have the building blocks and we can start combining them. Let's start with a simple
translation and rotation, change the order of multiplication and see what we can learn from it.
Column Convention
With translation on the left and rotation on the right we get the familiar M1to0 matrix,
represented as a 2x2.
Switching the factors yields an entirely different result. The rotation, R, is the same, but the
translation portion of the right hand side shows that R has rotated the translation.
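Since the original figures are not reproduced here, the two column-convention products can be written out in 2x2 block form as follows (a reconstruction from the surrounding text, with I the identity rotation):

$$
\begin{pmatrix} I & T \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} R & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix}
,\qquad
\begin{pmatrix} R & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} I & T \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} R & RT \\ 0 & 1 \end{pmatrix}
$$

The first product is the familiar M1to0; in the second, the translation block has become RT.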
Row Convention
In order to get the familiar M1to0 row matrix, we need to put rotation on the left of the
translation.
Now that the differences in the notation between row and column conventions have been
shown, we'll only show the column convention to avoid repeating the same point.
The column transform for figure 6 is shown below. The change is that we have to distinguish
between the different rotations and translations by naming them differently with subscripts.
Now we experiment with scale. If we tack a scale matrix factor on the right of the product we
get:
Right away you can see that the scale does not affect the translation (upper right portion of the
product) at all because S doesn't appear in it. This makes sense because with columns, the full
transform equation with points P0 and P5 included would look like this,
and it is just as though P5 was scaled and then the rest of the transform occurred afterwards.
The given point was named P5 because each matrix is considered a transform from one space
to another.
If instead we tack the scale matrix factor on the left of the product, then every term in the result is scaled, as you might expect. There are countless combinations to explore. The notation makes it easier to form a complex transform from intuitive simple pieces.
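In the same 2x2 block form, the two placements of the scale factor work out as follows (again reconstructed from the text):

$$
\begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} S & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} RS & T \\ 0 & 1 \end{pmatrix}
,\qquad
\begin{pmatrix} S & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} SR & ST \\ 0 & 1 \end{pmatrix}
$$

On the right, S never touches the translation block; on the left, every block picks up the scale.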
It is easy to multiply 2x2 matrices by hand but it gets very tedious to repeat. Instead, you can
enter any of the above symbolic expressions into Mathematica, MathCad, or Maple V and the
product is computed for you. Math programs take some effort to learn but your investment will
be paid back many times over.
Interpreting Concatenated Matrix Transforms
Transforms are described in steps made up of translations, scales, and rotations. There is
sometimes confusion though about which step is first. The problem is that there are two valid
ways of interpreting a transform. You can think of a transform as progressing from right to left
with a point, P, being transformed from distant reference frames towards the zero frame. One
might describe the following matrix transform as "P4 is rotated by R2, translated by T2, rotated
by R1 and then translated by T1."
One can also describe the transform as a series of changes applied from left to right. Each
change is applied to a reference frame. It would then be described as, "Starting with the zero
frame, the axes are translated by T1, rotated by R1, translated by T2 and then finally rotated
by R2." The former description mentions a rotation by R2 as the first step. The latter description
mentions a translation by T1 as coming first so it can be confusing.
The right to left interpretation is obviously valid because you just start at the right and multiply
your column vector by the right most matrix. At each step you get a column vector in another
reference frame. The other interpretation is valid because you can imagine combining matrices
from left to right. After each multiplication, you have a product matrix that can be partitioned
into axis vectors and a translation, just like in Figure 2.
If you run into a discrepancy with someone about the way to read a matrix, write it out and
discuss the pieces of the transform. The matrix math is the same regardless of the way it is
read. You might each be talking about the same matrix but in two different ways.
C++ has been so widely accepted by game developers that by now everyone that wants a
matrix class already has one. Chances are your thoughts on whether row or column matrices
are better are irrelevant because the company's (or team's) matrix class already exists and you
have to use it. The task now is to make sure that you learn the company's matrix conventions.
This includes the way the matrix elements are stored, and the decision to form row or column
matrices. You could ask another developer or you could take a look at the way a matrix is
multiplied with a vector in the matrix class implementation. Look at the dot product performed
to reach the first element in the matrix product. If the vector is dotted with the top row of the
matrix, the vector is a column. If the vector is dotted with the left most column, then the vector
is a row. Next do a sanity check with some other functions. For instance, if there is a function
that converts a quaternion to a matrix, check that it is following the same convention. Look up
the conversion in a reference and check that the reference author agrees with the author of
your class. After you are sure of the class conventions you won't ever have to question what
they are again.
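As a concrete illustration, the two conventions produce recognizably different multiply functions. The Matrix and Vector types below are hypothetical; the point is only where the dot products come from:

// Column convention: the result components are dot products of the
// vector with the ROWS of the matrix.
Vector MulColumn(const Matrix& m, const Vector& v)
{
    Vector r;
    r.x = m.e[0][0]*v.x + m.e[0][1]*v.y + m.e[0][2]*v.z; // top row . v
    r.y = m.e[1][0]*v.x + m.e[1][1]*v.y + m.e[1][2]*v.z;
    r.z = m.e[2][0]*v.x + m.e[2][1]*v.y + m.e[2][2]*v.z;
    return r;
}

// Row convention: the result components are dot products of the
// vector with the COLUMNS of the matrix.
Vector MulRow(const Vector& v, const Matrix& m)
{
    Vector r;
    r.x = v.x*m.e[0][0] + v.y*m.e[1][0] + v.z*m.e[2][0]; // v . left column
    r.y = v.x*m.e[0][1] + v.y*m.e[1][1] + v.z*m.e[2][1];
    r.z = v.x*m.e[0][2] + v.y*m.e[1][2] + v.z*m.e[2][2];
    return r;
}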
There is a bad but accepted method of creating matrix transforms amongst many game
programmers that goes like this. Make an initial guess of what the transform expression might
be and type it in. Try it out and see if it works. If it doesn't work, transpose and swap matrices
in the expression until it works. This is exactly what not to do.
Instead, you should write out the expression for your matrix transform and know that it is right.
You know it is right because you know your matrix conventions, and you used the above matrix
naming scheme to create the expression. Of course there will be times when you have the
correct expression but it doesn't work when you try it in code. When that happens you have to
check that the matrices you created actually match their names and you have to check the
matrices that were passed in from other sources as well. It can still be difficult but at least you
will be progressing towards the right answer by isolating the problem.
The reason it is so important not to mechanically transpose or swap your matrices is that it is
easy to get lost in all the possible transposes. We've seen that the difference between row and
column matrices is a transpose. Unscaled rotation matrices have the property that their inverse
is their transpose. So if you blindly invert a matrix you can be introducing a transpose. With
enough swapping and transposing, you can get back to where you started because of the
matrix identity (A * B)^T = B^T * A^T, which swaps the factors as it transposes them.
It is easy to get lost in all the transposes after only a few hacks. Another difficulty is that two
transposes undo each other.
The iterative hacking of the matrix expression is supposed to stop when the result looks right
but you may have two errors. This is why mysterious transposes live on in some code bases.
After a while it would require a time consuming rigorous audit of too much code to fix. The best
way to avoid those situations is to make the matrices correctly the first time.
Conclusion
We've covered several helpful ways to make creating transforms easier. Name vectors and
points with the reference frame they are in. Name matrices by the reference frames that they
transform between. Use the matrix names to guide the way they can be combined. Use the
simplified 2x2 versions of the transforms to visualize and plan out your desired transform. And
lastly, don't ever hack your transforms by swapping matrices or transposing them. If you follow
these rules and get your fellow programmers to follow these rules, working with transforms
becomes much easier.
The naming schemes, matrix concatenation, and the 2x2 transform notation were all covered in Prof. Lynn Conway's undergraduate Robotics course at the University of Michigan. Our course text covered the same material in a more rigorous manner: Robotics for Engineers, Yoram Koren, McGraw-Hill, 1985, pp. 88-101. Unfortunately this book is out of print; Amazon occasionally has a used copy.
URL: http://www.gamasutra.com/features/20020424/evans_01.htm
This article starts with a theoretical case for what we call activity-orientated structure, arguing
that activities are an important part of human life, and one that will become increasingly
important in future simulations including human-like agents in believable contexts. Second,
since there are so many different activities, it is important they can be added easily to the
system and realized in the game world as actual (albeit non-physical) objects, rather than (say)
abstractions in the minds of agents.
Implementational issues are also explored, focusing on the way activities should be added to
the system. We argue that activities should be defined in a high-level language, which is
compiled into C++. An example piece of activity-content, defined in this high-level language, is
presented.
- Social activities are part of human life, and thus are usefully 'included' (in some sense) in simulations of human-like agents.
- Social activities are an important part of human life; thus their inclusion in simulations is correspondingly important to the believability of the simulation.
- Social activities are the most important part of human life, that which distinguishes us from mere brutes. Thus, without addressing the activity concept in our simulations, we will be limited to simulating brutes.
Social activities are part of human life - agents who do not understand them can
appear dumb. This minimal claim is most easily illustrated by considering a series of examples
where an agent's lack of 'understanding' of activities and their relations can be cited as a failing
of that agent.
- A chess computer. Although it is very good at chess, a chess computer is blind to the
world around. It will never get bored of playing chess, and want to play something else. It
does not understand, in other words, that playing chess is merely one activity amongst
many, which satisfies some desires but not others. The chess computer doesn't
understand the place of chess within the social flux - it doesn't understand that chess is a
game played for recreation or competition.
- A problem with Black & White. In this game, a creature may be making friends with another when he decides to interrupt this to go to the toilet. What is stupid about this
behaviour is that the agent had no understanding of the consequences (for the activity of
making friends) of stopping in mid-conversation to relieve himself. The same problem can
arise in The Sims (a very entertaining piece of software): Peter's character was chatting
up a lady, but then got tired, and in the middle of his chat he went off to have a bath.
(The fact that this happens in both programs shows it is a moderately deep problem, and
not a consequence of one particular implementation).
- Agents' understanding of Ownership. Nowadays, many games include agents owning objects. But this "ownership" is implemented in the simplest possible way. The following comments were written about SHRDLU, but they apply equally to modern computer games: "SHRDLU
cannot be said to understand the meaning of "own" in any but a sophistic sense.
SHRDLU's test of whether something is owned is simply whether it is tagged "owned".
There is no intensional test of ownership, hence SHRDLU knows what it owns, but doesn't
understand what it is to own something. SHRDLU would understand what it meant to own
a box if it could, say, test its ownership by recalling how it had gained possession of the
box, or by checking its possession of a receipt in payment for it; could respond differently
to requests to move a box it owned from requests to move one it didn't own; and, in
general, could perform those tests and actions that are generally associated with the
determination and exercise of ownership in our law and culture." [Herbert A. Simon, "Artificial Intelligence Systems that Understand", IJCAI-77 Proceedings, p.1064; quoted in Dreyfus p.13] Agents (in current computer games) do not understand ownership
because they do not understand the social activity in which ownership is embedded:
ownership is a concept which belongs to the social activity of Enforcing Ownership, an
activity which involves agents monitoring who owns what, and punishing others who mess
with other people's things.
Social Activities add colour to our lives, lives which are otherwise coarse and
materialist. From a certain perspective, our lives can seem empty. If all that there is in the
world is other objects and agents, what is there for us to do except manipulate those objects
and agents? All that we can want is to acquire as many objects as possible, and have as much
influence over other people as possible.
This coarse and materialistic view of human nature is based on the assumption that all that
exists is other objects and agents. Getting away from this dark picture involves admitting the
existence of a variety of social activities. These activities elevate us because they give us new
things to want. The lives of the Black and White creatures and the Sims characters are
unquestionably materialistic because their desires are materialistic.
The sophisticated skills we prize about ourselves, our ability to reflect, communicate, care about
others, are dependent on our ability to participate in various social activities. Social activities,
rather than just being one of the many things the mind thinks about, are actually the things
which make sophisticated thought possible. It is because we can participate in sophisticated
social activities that we can have the sophisticated thoughts that we prize. Philosophers have
given many examples of aspects of sophisticated personhood which are conditional on
participating in social activities.
- Understanding of language. Wittgenstein's language-games are examples in which someone knows what something means because he understands the activity he is in. Investigations §2: "The language is meant to serve for communication between a builder
A and an assistant B. A is building with building-stones: there are blocks, pillars, slabs
and beams. B has to pass the stones, and that in the order in which A needs them. For
this purpose they use a language consisting of the words "block", "pillar", "slab", "beam".
A calls them out; B brings the stone which he has learnt to bring at such-and-such a call."
If someone understands what "pillar" means, he does so in virtue of understanding its
role in the activity in which it is embedded.
- Compassion towards others. Wittgenstein asks "Why can a dog feel fear but not remorse? Would it be right to say "Because he can't talk"?" [Zettel §518] The reason a dog can't
feel remorse is that he cannot participate in the moral community in such a way as to
recognise or manifest remorse. It is because he cannot participate in this activity that he
cannot enter into these feelings.
- Feelings. "The concept of pain is characterised by its particular function in our life. Pain has this position in our life, has these connections." [Zettel §532]
"The concept of pain is bound up not just with characteristic pain-behaviour in
circumstances of injury or illness, but also with pity and commiseration, fear and anxiety,
cruelty and mercy." [B&H Vol3 p.68]
- Structure and Organisation of Knowledge. "Heidegger pointed out that the outer horizon
or background of cultural practices was the condition of the possibility of determining
relevant facts and features and thus prerequisite for structuring the inner
horizon" [Dreyfus 36] In other words, we must not represent facts in one big pool, but
must organise them structured by which social activities they are relevant to. This is the
way for agents to get efficient access to the things which are relevant in a particular
situation. "The basic insight dominates these discussions that the situation is organized
from the start in terms of human needs and propensities which give the facts
meaning" [Dreyfus p.262]
Notice that some of these activities, like a meal, are short-lived, but others are extremely long-
term: a family lasts as long as there is anyone in it. Some activities can only exist within the
context of a parent activity - a Goodbye activity, for example, cannot exist on its own. These
activities which must have parents are called sub-activities.
These activities group people together, and these groupings can overlap:
In this dramatically charged example of activity-overlap, Arthur and Belinda are married. Arthur
and Charlie are friends, but Belinda and Charlie are flirting.
Since there are so many different sorts of activity, we shouldn't think in terms of "adding
activities to our simulation" being a one-off operation. Rather, once our simulation has become
activity-enabled, activities have become a kind of content. When building a simulation of a
society of agents, there are many different levels at which we should easily be able to add new
content - there is no magic formula or technique to generate plausible agents, there is just lots
of different behaviour to simulate. Each bit of behaviour needs to be easily-addable; behaviour
needs to be addable at lots of different levels (e.g. objects, animations, actions, and goals). If
our simulations are to address activities, as we have argued they must, then activities become
one of these levels. The sophistication of a simulation is a function of the level at which we can
easily add content.
There is one superficially impressive argument against this model, viz the fact that, in the real
world, it seems that activities do not exist, at least in this sense. If they exist at all, they exist
in our minds alone. "Ought we not, therefore, simply build the agent minds in an appropriate
way, and hope that they 'discover' or 'learn' activities?"
Now that we have explained why we think there should be 'activity-things' in the game-world,
let us drop the shudder quotes and silly name, and call them simply group activities, or (where
no confusion with the naive meaning of the word arises) simply activities.
Remember that an agent does not blindly follow the commands of an activity. Instead, an
activity requests that an agent does something (and communicates to him the consequences of
failing to follow this request). An agent who is in many overlapping activities at once will have a
number of requests pending at any time, and must choose between them. He decides which to
follow by looking at the consequences of accepting or rejecting each request, and chooses the
option which best satisfies his current goals.
"OK", says the objector, "your agents have some autonomy because they can choose between
the various requests they receive. But they are not properly autonomous because the range of
choices is dictated from outside (by the activities), they do not themselves decide what the
range of choices is".
This objection misses the point made earlier, by Wittgenstein and others, that participation in
an activity enables us to make choices we couldn't otherwise make. Unless we are participating
in the game of chess, we cannot choose between castling kingside and queenside. This choice is
only available to us once we have entered into the chess-playing activity. Social-activities, in
other words, are not constrictive but enabling. As another example, a couple cannot get
married except in a societal and cultural context. Without culture, while they can visit a civic
building, swap rings, say appropriate words, sign in a book and take lots of photos, they have
not got married despite the fact that (in the particular culture and society that the authors
come from) these are the physical actions that a couple do perform if they get married.
Participation opens up new possibilities.
Already these requirements seem almost incompatible: if activities must be efficient and
debuggable, they should be written in C++ so we can use the existing optimisation and
debugging tools, but if they are written in C++ it won't be very easy for content authors to add
a new activity, because it will require a programmer, and it will be very difficult for the
structure of the activity to be exposed to the agents: it might necessitate duplication of
information, once to do things, and once to tell the agents what will be done. If, on the other
hand, an activity is defined in a text file in a nice easy-to-use language, then content authors
will be able to add new activities easily, and the agents will be able to use the information, but
we lose the optimisation and debugging tools which C++ gives us. What to do?
Implementation: Overview
A new activity is defined in a text file. When the activity file is parsed, a new C++ activity class is generated in .cpp and header files. This activity inherits from the base Activity class:
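The original class listing is not reproduced here; below is a minimal sketch of what such a base class might look like. Every member is our assumption, not the authors' actual interface:

#include <vector>

class Agent;    // participants in the activity
class Message;  // inter-activity communication (see Messages, below)

// Hypothetical base class for generated activities. The three virtual
// methods mirror the three capabilities described in the text: needs
// conditions, requests to agents, and message handling.
class Activity
{
public:
    virtual ~Activity() {}
    virtual void CheckNeeds() = 0;     // evaluate needs conditions; may switch state
    virtual void IssueRequests() = 0;  // ask agents to perform actions
    virtual void HandleMessage(const Message& msg) = 0;
protected:
    int m_state;                       // current node in the state tree
    std::vector<Agent*> m_participants;
};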
The alternative to this might be some C preprocessor based technology (Game Programming
Gems Vol. 1, Section 3.1, "A Finite State Machine Class"). This would have the advantages of being simpler (one would not need to write a parser), and the code the debugger steps through would be precisely what the user wrote rather than something generated from it. However, this would unacceptably straitjacket us - the
preprocessor is a very weak tool. Moreover, writing a parser is not very hard using lex and yacc
(note also that the hard part of writing a compiler from scratch, the semantics, can be partly
avoided as we build our language on C++ and can leave some of the semantics to the C++
compiler). Moreover, on certain systems, via use of techniques like #line preprocessor
directives, the debugger can be made to take the user directly to the appropriate line of the
activity file, if the system breaks in an activity.
Each particular state can have its own parameters, its own preconditions, its own message
handling, its own local variables with initialisations, its own actions, and its own sub-states.
Needs
Needs conditions specify what needs to be true for the activity to remain in this state, and what state to switch into if circumstances change. If we are in a particular state, the needs conditions of all parent states are checked before our own state's. If we have the state tree:
S1:
    needs1
    S2:
    {
        needs2
        ...
    }
    S3:
    ...
and we are in state S2, then needs1 will be checked, followed by needs2.
Requests
The only way an activity can affect the game-world is by issuing a request to an agent. Request statements ask agents if they wouldn't mind performing actions, and wait to find out whether they actually bother to perform them. For example:
// needs conditions
requests
    agent1.action1  !⇒ Rejection(1)
→ Finished
{
    // actions to perform while waiting
}
The first time this state is entered, the requests are passed to the agents concerned. From then on, we wait to see if any agent rejects the request; if so, the rejection message is passed. If all agents finish their requested actions, the state moves to Finished.
Messages
Messages are a way for activities to communicate without having to know each other's implementation details. Activities respond to messages using message-handlers. A message can
have various parameters which are passed to the handler. If a message is passed to an activity,
the handler function is called immediately. The handler may or may not change the state of the
activity. Messages can be defined at any level of the state hierarchy -- handlers lower down the
chain take priority over more general handlers.
Example Activity
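The example listing itself is not reproduced here. Purely as an illustration of the flavor of the language, an activity definition might combine the needs, requests, and message fragments shown above along these lines; the syntax details below are our guess, not the authors' actual grammar:

activity Greeting(Agent host, Agent guest)
{
    Chatting:
    {
        needs InSameRoom(host, guest)
        requests
            host.SayHello(guest)  !⇒ Rejection(1)
        → Finished
        {
            // actions to perform while waiting
        }
    }
    Finished:
        ...
    Abandoned:
        ...
}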
In this article, we have tried to motivate the introduction of social activities as the next obvious
level at which to add content, and we have outlined a working system which makes it very easy
to add a new activity (without having to worry about all the book-keeping needed to integrate
our new activity with all the others). In our prototype, we already have a large number of
activities running simultaneously: various games (both turn-taking games and games with
simultaneous-turns), conversations, meal-times, courtship activities, with a moral community
activity running in the background.
This work has been inspired by philosophers (Wittgenstein and Dreyfus) who are apparently
critical of the very possibility of AI. It is pleasantly ironic that their work, which they might see
as precluding the possibility of AI, will result in the next generation of social agents.
References
By David Brickhill
Gamasutra
April 10, 2002
URL: http://www.gamasutra.com/features/20020410/brickhill_01.htm
This paper proposes a general solution for a problem that will probably arise in the near future,
perhaps with the launch of PlayStation 3 and its counterparts. Presupposed is the exacerbation
of a current trend in console hardware, where hardware transformation, lighting, and triangle
drawing outpace the hardware's capacity to store and move data. In other words, in the future,
transforming vertices and drawing triangles will be virtually free in comparison with storing
model data and reading it from main RAM.
For brevity the paper focuses on a solution for character models. On the one hand, character
models pose more problems than fixed environment geometry in regard to controlling the
animation of vertices with bones. On the other hand, culling and view-dependent level-of-detail
need not be considered.
The summary of the problem we propose to solve is as follows: render a very dense animated
skinned character mesh in "the future," when transforming vertices and drawing triangles will
be cheap, and when loading from RAM will be expensive. (For programmers of the PS2, this
"future" will sound more like the present!)
The contribution of this paper is to collect together several already-published conceptions and
propose their application to the problem at hand. Some details are given here, but the reader is
encouraged to consult the academic papers in the References for more detail.
Later sections discuss the details of the problem, outline several techniques whose integration
yields a potential solution, and briefly discuss a possible future without textures.
With the onset of the next "new thing" in game hardware, consumer expectation could very well
be an amount of detail equivalent to one triangle per pixel. If screen resolution continues to
double with each generation, 1024x768 will be the typical screen size. If, when close to the
view, a character model occupies roughly 50% of the screen pixels, and given that the model
has half its triangles facing away from the view, then the total triangle count will be nearly
800,000 triangles per character model.
Will transformation and drawing performance really be up to this task? To answer that question,
let's look at the advance we saw from PS1 to PS2. The best PS1 character engines seen by the
author could render about 20 400-triangle characters at 30Hz, with only single-bone control and
no dynamic lighting. The author has seen character engines on PS2 with performance of 31
3400-triangle characters at 60Hz, with up to 3 bones per vertex and with dynamic lighting on
all vertices. Even ignoring the added bone control and the dynamic lighting, we have an
increase by a factor of more than 25.
So let's assume that transformation and drawing performance will increase again by a factor of
25. In terms of total capacity for a character renderer, PS3 will then have the ability to
transform, light, and draw 31*3400*25 > 2.6 million triangles per frame at 60Hz.
Now consider the storage side, under the following assumptions about how such a mesh would be stored with current techniques:
- Indexed triangle strips. Each strip vertex has a 32-bit index and 32-bit texture coordinates (16 bits each for U and V).
- Each vertex is 32-bit floating-point.
- The mesh has a 2-to-1 triangle-to-vertex ratio.
- Stripping performance is 8 strip vertices per strip.
With these assumptions, current techniques will result in the following storage needs for each
character:
So for the characters in our 10-character scene, we would need to store 83MB of data and mess
with it each frame.
While predicting the future is dangerous, it is the author's opinion that future hardware will be
able to easily transform and draw meshes of the above size but that it will not be able to store
and access that much data. The predicted implications are that rendering techniques that store
and move only small amounts of data will be necessary in order to meet consumer expectations
of visual quality. Those rendering techniques will have to include smooth LOD control.
Loop Subdivision Surfaces provide a solution to the problems predicted above by having the
following advantages:
- The program need store only a coarse base mesh, from which an algorithm procedurally generates a smooth surface.
- The vertices of the base mesh can be animated with bone control, still allowing the rendered mesh to be created procedurally.
- Smooth LOD control is implicit in the method as a result of blending between two levels of subdivision.
- Sharp edges can be included with continuous sharpness control.
- Base meshes of any topology can be used.
The Basics
This section gives a very brief explanation of how Loop Subdivision works. For a full exposition
of subdivision surfaces, see [7].
Begin with a base mesh. The base mesh data structure must have triangles, edges, and
vertices. The vertices store the actual geometry and color content for the model. All features
(vertices, edges, triangles) must store information on the connectivity to other features. The
exact format of this connectivity depends on the rendering method. To help explain the Loop
scheme below, we will use a naïve structure for vertices as follows. For simplicity the structure
stores only geometric position and not color information.
struct VERT
{
    VERT* parents[2];   // Verts on birth edge.
    VERT* across[2];    // Verts opposite birth edge.
    ARRAY<VERT*> adjs;  // Vertices sharing an edge, in CCW order.
    VECTOR pos;
    VECTOR normal;
};
The "parents" are the vertices of the "birth" edge from which the vertex was originally created
in the splitting step. The "across" are the vertices opposite the birth edge. The "adjs" are the
vertices sharing an edge with the vertex. In the subdivision algorithm, "parents" and "across"
are used only in the first average when the vertex is created from a split. The "adjs" are used
only in subsequent averages. Figure 1 illustrates the VERT structure. Note that the "adjs" must
be stored in counter-clockwise order for the tangent computation to work properly.
Rendering a subdivided mesh consists of a recursive process with two steps at each level of recursion. The first step is the splitting step, where each triangle of the mesh is split into four triangles at the midpoints of its edges. The second step is the averaging step, where the position of each vertex in the split mesh is perturbed according to a weighted average of neighboring vertices.
For Loop Subdivision there are four cases for the averaging rules. At each level of subdivision,
there are two kinds of vertices: the vertices that were already there ("old"), and the vertices
that were just created by the split ("new"). For both old and new vertices, boundary vertices, or
vertices that lie touching an edge with only one adjacent triangle, are handled differently from
interior vertices.
After the entire mesh is subdivided and averaged, then the result can be passed back into the
algorithm and subdivided again. Repeating this process produces nice smooth meshes.
The logic for averaging a single vertex of the mesh is given in the code below.
void ComputeAverage(VERT* vert) {
if(Vertex_Created_By_Split(vert)) {
VERT* par0 = vert->parents[0];
VERT* par1 = vert->parents[1];
VERT* acc0 = vert->across[0];
VERT* acc1 = vert->across[1];
if(Vertex_On_Boundary(vert)) {
vert->pos = (par0->pos+par1->pos)/2;
}
else {
vert->pos = (3.0/8.0)*par0->pos+
(3.0/8.0)*par1->pos+
(1.0/8.0)*acc0->pos+
(1.0/8.0)*acc1->pos;
}
}
else {
if(Vertex_On_Boundary(vert)) {
vert->pos = (6.0/8.0)*vert->pos;
for(int i=0;i<vert->adjs.Size();i++) {
// Exactly 2 should be on boundary.
VERT* a = vert->adjs[i];
if(Vertex_On_Boundary(a)) {
vert->pos += (1.0/8.0)*a->pos;
}
}
}
else {
int n = vert->adjs.Size();
float temp = 3 + 2*cos(2.0*PI/n);
float alpha = (40.0 - temp*temp) / 64.0;
vert->pos = (1-alpha)*vert->pos;
alpha /= n;
for(int i=0;i<n;i++) {
vert->pos += alpha * vert->adjs[i]->pos;
}
}
}
}
Visual representations of these rules, the so-called "averaging masks," can be seen in virtually every paper on subdivision surfaces.
Edge Sharpness
What if you want some sharp edges? Edge sharpness is an area where subdivision surfaces
have an advantage over Bezier patches and NURBs, because it is easy to control the degree of
sharpness at an edge.
To make an edge look sharp, we simply skip the averaging step for any vertices on the edge.
The program can set the degree of sharpness by how many levels of subdivision it waits before
applying the averaging step again. See [3] for details and examples.
Normal Offsets
While a subdivision surface will be nice looking and very smooth, there are always needs for
finer detail. The program can add this fine detail with normal offsets, or "displacements" as [4]
calls them.
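// Tail of the normal computation: tang_a and tang_b are the two
// surface tangents at the vertex.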
vert->normal = OuterProduct(tang_a,tang_b);
}
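The body of that routine is not reproduced above. A sketch of the standard Loop tangent computation it would perform for an interior vertex, reusing the VERT structure from earlier (the function name is ours, and VECTOR arithmetic is assumed):

void ComputeNormal(VERT* vert)
{
    // Loop tangent masks: weight the ring of adjacent vertices by
    // cosine and sine around the ring to produce two surface tangents.
    int n = vert->adjs.Size();
    VECTOR tang_a(0,0,0);
    VECTOR tang_b(0,0,0);
    for(int i=0;i<n;i++) {
        tang_a += (float)cos(2.0*PI*i/n) * vert->adjs[i]->pos;
        tang_b += (float)sin(2.0*PI*i/n) * vert->adjs[i]->pos;
    }
    // The normal is the cross product of the two tangents.
    vert->normal = OuterProduct(tang_a,tang_b);
}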
Once the normal at a vertex is computed, its final position is just a scalar displacement value
multiplied by that normal. Note that for smooth shading, the normal at the displaced vertex
may be needed. This displaced vertex normal is not the same as the subdivision surface
normal, and its computation is a bit trickier. See [4] for more detail.
Color Offsets
Just as geometric coordinates can be stored as part of the vertex data, additional "scalar fields,"
as [3] calls them, can be included for vertex colors. Just as normal offsets can be added to
displace the vertices, color offsets can be added to the subdivided color values.
Color offsets create a framework for a character model that doesn't require textures. Given that
our supposed triangle size is one triangle per pixel anyway, textures start to seem impractical,
especially on character models where there is little opportunity to repeat or tile textures.
When used along with the wavelet compression described below, color offsets applied to the results of subdividing the color scalar fields provide a practical and more flexible alternative to texture mapping.
Real-Time Rendering?
Naively implementing Loop Subdivision as a recursive process is prohibitively slow and would
not be appropriate for a game display engine. In recent years, however, numerous techniques
([1], [2], [6]) for rendering with the Loop scheme have been proposed and implemented.
Wavelet Compression
If we want the added data of normal displacements and color offsets, we're going to have to
compress it somehow for the data size to be practical. Because of their similarity regarding
weighted averages, wavelet compression techniques are well suited to the task.
Without getting into the mathematics of wavelets, the basic idea behind their use for compression is that data can be reduced in size, with minimal loss of fidelity, if it is first transformed into a set of averages and a set of details rather than kept in its original form. In an approach similar to creating a subdivision surface, a program averages two or more data points (vertices, if you will), storing not just the average but also the difference between the data points and the average. The process is then repeated, this time only on the just-created averages. When complete, the data stream will be the same size as the original, beginning with one overall average followed by the difference, or "detail," points.
If the averaging scheme has certain properties, the resulting data will be such that if the
smallest of the detail points are turned to zero, and if the transformation is inverted, the result
of the inverted process will be very close to the original.
Compression is achieved after turning small values to zero and then storing the data in a runs-
of-zeros scheme.
Haar Wavelets
The simplest of the wavelet schemes is the Haar wavelet scheme, defined over a 1-dimensional set of points. The following code makes a single pass over a set of floating-point numbers, performing a transformation according to the Haar scheme. Each successive pair of numbers is simply summed and then multiplied by a scalar, 1.0/sqrt(2). The scalar multiplication maintains normalization, discussed later.
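The original listing is missing here; below is a minimal sketch of such a pass, assuming an out-of-place transform through a caller-supplied scratch buffer and an even count:

#include <math.h>
#include <string.h>

// One Haar pass: pairwise "averages" go to the front half of the array,
// pairwise "details" go to the back half, both scaled by 1/sqrt(2).
void HaarPass(float* data, int count, float* scratch)
{
    const float s = 1.0f / (float)sqrt(2.0);
    for(int i = 0; i < count/2; i++) {
        scratch[i]           = (data[2*i] + data[2*i+1]) * s; // average
        scratch[count/2 + i] = (data[2*i] - data[2*i+1]) * s; // detail
    }
    memcpy(data, scratch, count * sizeof(float));
}

Each subsequent pass would then run on only the front half (the averages) produced by the previous pass.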
The table below gives example data acted upon by the routine above. Note the following:
- In each pass, the "averages" are placed at the beginning of the data, and the "details" are placed at the end.
- Each pass acts on half the amount of data of the previous pass.
- The first entry in Pass 4, 53.5, is the sum of the original data divided by sqrt(2)^4, or in other words, a scaled overall average of the original data.
- In practice, the data streams must be fairly large in order to see any dramatic reduction by throwing out small values.
Another important attribute of a wavelet scheme is normalization. Recall that in the routine of the Haar wavelets section, each average is weighted by a factor of 2/sqrt(2), implemented by summing the terms and dividing by sqrt(2). This is done so that successive averaging and differencing do not inappropriately scale the later passes. Normalization ensures that the relative size of elements after the final pass gives an accurate indication of how important they are to the data, so that error tolerances have their usual meaning.
There are actually two purposes in applying a wavelet-like solution to compressing the color
offsets and normal displacements of our subdivision surface. The first is compression. The
second is to allow editing of the model at different levels of resolution.
Note that while advocating the techniques of [5], this paper proposes them for a slightly
different use. While sharing the topological aspects of [5], the actual data we want to compress
is slightly different. Whereas [5] aims to allow a mesh to be parameterized as a subdivision
surface and stored using wavelets, we propose using the same technique only to compress
normal displacements and color offsets, not an arbitrary original mesh.
Matrix Construction
At each level of subdivision, there are two matrices that need construction. The first is the
analysis matrix, which transforms the color offsets and normal displacements from their values
in the higher level to weighted averages and detail coefficients in their lower level. The second
matrix is the synthesis matrix, which performs the opposite task as the analysis matrix, and in
fact is just the matrix inverse of the analysis matrix.
Full details of how to construct these matrices are beyond the scope of this paper, and they
have already been laid out with examples in [5]. The serious reader should definitely read [5]
before building a system combining subdivision surfaces and wavelets.
To summarize matrix construction of [5], the synthesis matrix is constructed first, and the
analysis matrix is computed as the synthesis matrix's inverse. Suppose that in the current level
of subdivision we are going from M vertices up to N (greater than M) vertices. In block form,
the synthesis matrix is given by [P Q], where P is an N x M matrix and Q is an N x (N-M)
matrix. Construction of P is very simple, just consisting of rows whose columns contain the
same coefficients that the subdivision surface scheme uses to average vertices when going to
the next subdivision level (see the code in section 3.1). The complexity is in the construction of
the Q matrix. Patient reading of [5] will yield an understanding of how to compute Q for any
subdivision scheme, not just the Loop scheme.
Sparse Matrices
Actually implementing the technique with naïve N-by-N matrices is of course not recommended,
because the size of these matrices would be ridiculously large. Instead, either the process
should simply be coded in direct fashion, or at a minimum sparse matrix classes should be
employed.
In this form, O are the rows corresponding to the vertices already a part of the lower level of
subdivision, N are the rows corresponding to the vertices created by the splits, and a is a
special "magic" matrix derived in [5]. Although not explicitly clear, a section in [5] regarding
the whether O is invertible for primal schemes (Loop is a primal scheme) implies that O is
invertible for Loop. The text in [5] states that most primal schemes have invertible O, with
Catmull-Clark as the only noted exception. Therefore we aim to show here only that if O is
invertible, then so is [P Q].
It can be shown easily through block matrix multiplication that the inverse of [P Q] as given
above is as follows:
Part of the author's view of the future is that eventually we will stop using those annoying
textures. Compressing them, transferring them about, dealing with caching, etc., etc., are
horrid nuisances that become obviated once meshes get very dense, especially for characters
where there is little opportunity to repeat textures anyway. The author hopes for a time when
character models are simply painted on their surfaces by an artist.
Naturally the storage of all that color data would be prohibitive for an 800,000-triangle mesh,
but by efficiently implementing the techniques outlined here and detailed in the references,
perhaps a texture-less world can become a reality. Barring that, [2] gives a practical method
for incorporating "texture meshes" into the scheme, whereby the connectivity of the texture
coordinates is handled separately but in similar fashion to the geometry.
Conclusion
Industrial use of subdivision surfaces in real-time has yet to gain much of a following, at least
within the game business. Graphic quality in games is still only on the cusp of requiring curved
surface techniques at all. Furthermore, while [1], [2], and [6] implement viable methods for
rendering subdivision surfaces in real-time, the problems of efficiently combining the techniques
outlined in this paper remain open. In particular, synthesizing the color offsets and normal
displacements from their compressed form in a manner compatible with small-cache graphics
hardware will be very challenging. Hopefully a unified, efficient implementation scheme will be
available before the onset of the next "Next Gen" hardware.
References
[3] DeRose, "Subdivision Surfaces in Character Animation". Proceedings of the 25th annual
conference on Computer Graphics, 1998, Pages 85 - 94.
[4] Hoppe, "Displaced Subdivision Surfaces". Computer Graphics Proceedings, SIGGRAPH 2000,
2000.
[5] Lounsbery, "Multiresolution Analysis for Surfaces of Arbitrary Topological Type". ACM
Transactions on Graphics, Vol. 16, No. 1, January 1997, Pages 34-73.
[6] Pulli, "Fast Rendering of Subdivision Surfaces". The art and interdisciplinary programs of
SIGGRAPH '96 on SIGGRAPH '96 visual proceedings, 1996, Page 144.
[7] Zorin, "Subdivision for Modeling and Animation". SIGGRAPH '98 Course Notes, 1998.
http://www.multires.caltech.edu/teaching/courses/subdivision
By Patrick Smith
Gamasutra
April 5, 2002
URL: http://www.gamasutra.com/features/20020405/smith_01.htm
One of the fundamental goals of an AI system is to avoid making the unit appear "dumb." At
the root of this challenge lies one of the hardest problems to overcome efficiently and
believably: pathfinding. Today, 3D graphics and sound technologies begin to reach a level of
realism that is easily destroyed by seeing an AI unit walk face-first into a solid wall, slide along
the perimeter and ultimately get stuck on an errant polygon. Traditional technologies that
worked well for games a few years ago fall apart when faced with the complexity of today's 3D
environments.
This paper addresses the pitfalls of attempting to pathfind the arbitrary world we call "polygon
soup." It covers automatic data generation and compression, a run-time distributed algorithm,
organic post-process modifiers, and solutions for tricky situations such as doors, ladders, and
elevators.
There are numerous good algorithms for determining a path given a connectivity graph. For
simple 2D games, the connectivity graph was a side effect of the tiling system used to create the map. For 3D games, especially those constructed from arbitrary geometry, no such simple solution exists. So the problem remains: where do we get good data?
Imagine importing the outline of a stained glass window into your favorite paint program. The
image consists of large, irregularly shaped regions of pure white space. Your task: color each of
these regions to complete the stained glass window. Now, which tool is best for the job, the
pencil (which would require you to color each and every pixel independently) or the paint
bucket (which can recursively flood a whole region with color)? Should be an easy decision,
right? So why is it that many level designers are forced to manually populate their levels with
numerous pathfind nodes, sectors, grids, zones, planes, subdivisions, portals or whatever
whoseewhatsit is popular that day? Now, visualize an automated system that could discover all
possible traversable locations in your most complicated environment with a single click of the
mouse. Can you picture it? Great! Let's build it.
To solve this Herculean task we'll borrow a simple idea from our paint program metaphor,
namely, the recursive flood fill. The only tools we'll need are a good collision detection system
and the knowledge of a single point in the polygon soup where a unit can stand. The algorithm
works as follows:
Use the collision detection system to determine if the unit can exist at the start point. If the
area is valid, add that point to a list. This will be our task list. Now, loop over all the entries in
the task list (which is currently one). For each entry in the list, simulate the unit taking a step
in each of the four cardinal directions (north, east, south, and west). If one of these points
passes the collision check, add it to the task list. Retain the connection information between
these two points (point A can walk to point B).
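A minimal sketch of that loop is below. CanStandAt(), CardinalOffset(), and the cell bookkeeping helpers are stand-ins for the engine's actual collision and storage systems, and VECTOR arithmetic is assumed:

#include <deque>

struct CELL
{
    VECTOR pos;
    CELL*  neighbors[4]; // N, E, S, W; NULL where not traversable
};

void FloodFill(const VECTOR& start)
{
    std::deque<CELL*> tasks; // the task list
    if(CanStandAt(start))    // collision check at the seed point
        tasks.push_back(GetOrCreateCell(start));
    while(!tasks.empty()) {
        CELL* cell = tasks.front();
        tasks.pop_front();
        for(int dir = 0; dir < 4; dir++) {         // step N, E, S, W
            VECTOR next = cell->pos + CardinalOffset(dir);
            if(CanStandAt(next)) {                 // simulated step succeeded
                bool isNew = !CellExistsAt(next);
                CELL* adj = GetOrCreateCell(next);
                cell->neighbors[dir] = adj;        // point A can walk to point B
                if(isNew)
                    tasks.push_back(adj);          // continue filling from here
            }
        }
    }
}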
There, you're done, once this algorithm finishes you'll have a complete connectivity graph of the
current game level. Of course there are always details…
At this point you should begin to see the familiar form the data is taking, a grid! However, this
isn't the simple 2D grid you may be imagining. If you ignore the elevation of each box, the grid
is uniform; however, because we simulate walking from one grid cell to the next, it can take us up or down slopes, across bridges, and over and under overpasses. This means that for any given coordinate on the "plane" there can exist more than one grid cell (consider elevation). A
good example of this is a spiral staircase. The algorithm will fill up and around each turn of the
staircase as if it were on a flat plane, however if you stood at the top and cast a ray down the
course of the staircase you would encounter a grid cell at every turn of the stairwell.
The reason such failures occur derives from the fact that the grid is axis-aligned (non-rotated). If a door or other passageway is rotated, and during game play the object fits through the door because it rotates as well, then a non-rotated pathfind cell can still collide with the door.
Detail 3: It's not what you know, but who you know.
At a minimum, each cell of the grid only needs to know which neighbors are traversable. This
can be accomplished via a simple data structure that has four pointers, one to each of its
neighbors. If a pointer is NULL, that direction cannot be traversed.
class CPathCell
{
.
.
.
CPathCell * NorthCell;
CPathCell * SouthCell;
CPathCell * EastCell;
CPathCell * WestCell;
};
This simple data structure completes a connectivity graph - we now have enough information to
pathfind between any two cells in the level.
Keep in mind that this is a preprocess step, we're simply generating the data. Before we can
use this data it must be compressed into a format that is feasible for run-time application.
It is extremely satisfying to give an AI the command, "Goto (x,y,z)", and watch as the unit
performs a complex series of steps. For example, given a single destination, an AI could walk to
the front door of a building, open the door, walk to the elevator, wait for the elevator to arrive,
take the elevator to the roof, exit the elevator, walk to the ladder at the base of the tower, and
climb to the top of the tower.
Even more satisfying is to have this information about the "teleport" mechanisms automatically
detected during the data generation step. To arrive at this goal, we'll need to know the location
of each door, elevator, ladder, or other teleport mechanism, and its enter and exit zones. The
algorithm now works as follows:
During flood fill, grab an unprocessed cell from the task list. Perform the simulation as before, if
the cell passes, check to see if it is completely contained within the enter zone of one of the
teleport mechanisms. If this cell is indeed inside an enter zone, create a new cell in the center
of the exit zone and perform a collision check to ensure it is safe to stand there. If the new cell
passes this test, add it back into the task list - this will ensure the simulation continues on the
other side of the mechanism.
Avoid the temptation to store the connection between the "enter" cell and the "exit" cell. This
extra data will unnecessarily bloat the size of each one of the cells in your 10 million cell flood
fill. The connection information can easily be determined after the flood fill is complete. This
issue is addressed later when discussing data compression.
Now that we have a nice simple algorithm for generating the connectivity graph of a level, it is
time to start inspecting the data. Let's say each cell requires approximately 50 bytes of data,
and we've generated somewhere on the order of 10 million cells, that means we have 500MB of
pathfind data. When problem solving, the more data the better, but half a gigabyte is just plain
ridiculous! It's time to compress the data down to something more manageable.
Let's examine a simple scenario. If cell A connects to cell B, and cell B connects to cell A, then
do we really need two cells? No we do not; we can combine the two cells into a sector and
throw away the original cells. We define a sector to be a convex polyhedron of freely
traversable space. Given our particular dataset and purpose, a box is the most effective convex
polyhedron.
Cell A connects to B and C, cell C connects to A and D, cell D connects to C and B, cell B
connects to D and A. Here, we can combine all four cells into a single sector and discard the
original cells. To generalize, we can combine any rectangular set of contiguous, connected cells
into a single sector. Ultimately, we'll want to compress all the cells into sectors. The sector then
becomes the basic data type used during runtime to solve a path.
Once a sector is generated, it is necessary to retain the connection information its composite
cells originally contained. To accomplish this, retain the edge cells and discard the interior cells
of the sector. These edge cells share connection information with cells in other sectors, in
effect, linking sectors together.
To retain a sector's connection data, keep the edge cells.
After all the sectors have been built, you can then combine a sector's edge cells that share a
common destination sector into a portal. We define a portal to be a connection between
sectors; i.e. to get from sector A to sector B walk from your current position in sector A through
the AB portal. The portal's physical shape is the bounding volume of its composite edge cells.
Portals can be one-way or two-way. If the edge cells in sector A are connected to the edge cells
in sector B, and the same cells in sector B are connected to the sector A cells, then the portal is
two-way. Otherwise the portal is one-way.
Once the portal is created, it is added to the portal-list of the sector that contained the edge
cells. If this is a two-way portal, it is added to the portal-lists of both sectors.
Earlier we discussed how to flood fill across teleport mechanisms such as doors, elevators and
ladders. Now that we've converted all our cells into sectors and portals, it's time to incorporate
information about these mechanisms so that our AI can use them.
To build a portal for a given mechanism, first intersect its entrance and exit zones with the
existing pathfind sectors. If the intersection test yields at least one sector for the entrance and
exit zones then we can create a portal between the two. The entrance portal's bounding box is
the volume created by intersecting the entrance zone with the pathfind sector. The exit portal's
bounding box should follow the same formula.
Store in the portal any specific information about the mechanism that the AI will need in order
to operate it. For example, if a door is locked, save the type of key that's needed to
open the lock. When the path solver is attempting to resolve the path, it can ignore the portal if
the AI is not carrying the required key.
Anatomy of a Sector
Once a sector is generated it contains two important pieces of information, a bounding box and
a list of portals. The bounding box is used at run-time to determine where, in the pathfind
world, an AI currently is. The portal-list is the connection information that the path solver uses
to calculate how an object gets from its current position to its destination position.
class CPathfindSector
{
    // ... (other sector data omitted) ...

    CAABox BoundingBox;                     // where the sector sits in the world
    CVector<CPathfindPortal *> PortalList;  // connections to neighboring sectors
};
As you can see, the sector is really just a spatial portal linkage; most of the interesting data is
contained in the portal.
Anatomy of a Portal
The portal contains as much data as is necessary to get the AI from one sector to the next. At a
minimum this data is a reference to the destination sector (possibly two sectors for a two-way
portal), and its bounding box. A portal may also contain information about a teleport
mechanism or an action to perform.
class CPathfindPortal
{
    // ... (other portal data omitted) ...

    CPathfindSector * DestSector1;          // destination sector
    CPathfindSector * DestSector2;          // second destination, for two-way portals
    int MechanismID;                        // which door/elevator/ladder, if any
    MECHANISM_TYPE MechanismType;
    CPathfindPortal * MechanismExitPortal;  // portal where the mechanism exits
    ACTION_TYPE ActionType;                 // action needed to use the portal
};
As you can see, all the connection information is stored in the portal, which is therefore the
"key" to solving a path.
Bob, the digital super spy, needs to avert international tragedy at the UN building.
Unfortunately, he's currently washing dirty socks in his basement. To make matters worse,
everywhere he looks he sees an ocean of polygons swimming before his eyes. How can he
possibly hope to reach his goal before time runs out?
Luckily for Bob, we can use his (X,Y,Z) location to look up the pathfind sector he's washing socks
in. This sector contains a list of portals he can use to get to other sectors, which contain their
own portal lists, which can be used to get to yet more sectors, and so on and so forth.
At this point any number of best-path algorithms can be applied to get Bob to the UN building.
We found that a modified A-star algorithm solves the problem nicely. A simple implementation would
use the accumulated distance between portals as the traversal cost, and the straight-line
distance to the goal as the heuristic cost.
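As a sketch of how these two costs might look in code (the CPathNode bookkeeping and helper
names are assumptions; a real solver would also manage open and closed lists):

#include <cmath>

struct CVector3 { float x, y, z; };

static float Distance(const CVector3& a, const CVector3& b)
{
    float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// A* bookkeeping for one portal on the search frontier.
struct CPathNode
{
    CVector3 portalCenter;
    float    g;                            // accumulated distance walked so far
    float    h;                            // straight-line distance to the goal
    float    F() const { return g + h; }   // priority used to pick the next node
};

// Cost of stepping from 'node' through the next portal.
static float Traversal_Cost(const CPathNode& node, const CVector3& nextPortal)
{
    return node.g + Distance(node.portalCenter, nextPortal);
}

static float Heuristic_Cost(const CVector3& portal, const CVector3& goal)
{
    return Distance(portal, goal);
}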
Now pretend that Bob and twenty of his closest buddies need to get to the UN building at the
same time. Twenty different requests for complex paths can well exceed the amount of CPU
your game devotes to AI each frame. Luckily for us, it is not very noticeable if we distribute the
processing of these paths over a few frames. For example, if Bob sits and thinks for half a
second before getting up and walking out the door, is anyone likely to notice? On the flip side,
half a second (500 milliseconds, ~1 billion clock cycles) is more than sufficient for a
pathfinding system to solve a few simultaneous paths.
To distribute the path solve itself over multiple frames, consider the following pseudo-code.
void Distributed_Path_Solve (void)
{
    // Process sectors until this frame's time slice elapses.
    while (Time_Slice_Remaining ())
    {
        CPathfindSector *sector = Get_Next_Sector_To_Process ();
        if (sector == NULL)
        {
            Handle_No_Path ();
        }
        else if (sector == DestinationSector)
        {
            Handle_Path_Found ();
        }
        else
        {
            CPathfindPortal *portal = NULL;
            while ((portal = sector->Get_Next_Portal (portal)) != NULL)
            {
                Process_Portal (portal);
            }
        }
    }
}
This algorithm will process as many sectors as necessary until its time-slice has elapsed.
Using the sector/portal system, a solved path is represented by an ordered series of portals.
This path will follow the form: walk to portal AB, turn to face portal BT, walk to portal BT, turn
to face portal TU, walk to portal TU, and so on. Unless your AI is a robot, it is desirable for the
unit to behave as organically as possible; bee-lining to each portal does not make for a very
organic path.
To make our path appear a little more natural we'd like to use splines. In layman's terms,
splines are basically curved lines that follow a set of points. Different types of splines follow
their points in different ways. Some splines pass through their control points, whereas others
act like magnets and are attracted or repelled by their control points. To see the
advantage of splining the path, consider the following diagram.
Splining the final path yields a much more organic curve.
Looks great, right? Unfortunately, there is a problem. Since the spline doesn't know anything
about the sectors and portals that define the "safe" area for a unit to walk, it is likely that a
curve on the spline will leave these volumes causing the unit to walk into desk corners,
doorways, trash cans, or even meander off the edge of a cliff!
Luckily we can take advantage of a mathematical property of Bezier curves (a type of spline): if
the control points of the Bezier curve lie inside a convex volume, then the curve itself will lie inside the
volume. A Bezier curve is a cubic spline defined by a start point, an end point and a series of
control points. The control points act like magnets to "pull" the curve away from the line
segment defined by the start and end points.
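For reference, a cubic Bezier point can be evaluated with the standard Bernstein weights; this
minimal sketch assumes a simple CVector3 type:

struct CVector3 { float x, y, z; };

// Evaluate a cubic Bezier curve at parameter u in [0, 1].
// p0/p3 are the start and end points, p1/p2 the control points.
CVector3 Evaluate_Bezier(const CVector3& p0, const CVector3& p1,
                         const CVector3& p2, const CVector3& p3, float u)
{
    float s  = 1.0f - u;
    float b0 = s * s * s;          // weight of p0
    float b1 = 3.0f * s * s * u;   // weight of p1
    float b2 = 3.0f * s * u * u;   // weight of p2
    float b3 = u * u * u;          // weight of p3

    CVector3 q;
    q.x = b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x;
    q.y = b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y;
    q.z = b0 * p0.z + b1 * p1.z + b2 * p2.z + b3 * p3.z;
    return q;
}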
Given this, our organic post process algorithm would function as follows:
Build a Bezier curve from the points on the final path. Build a list of the sectors and portals that
the AI will pass through on its path. For each control point on the Bezier curve, clip it to the
sector volumes. To clip a control point, form a line from the control point to the corresponding
point on the un-splined path. Test this line segment to see if it passes through the side of any
of the sector boxes in the list. If it does, check to see if the area of intersection is a portal. If
the point intersects a sector wall, and it does not pass through a portal, then clip the control
point to the sector wall; otherwise leave the point alone - it is valid.
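A heavily simplified sketch of the clipping step, assuming an axis-aligned CAABox with Min/Max
corners; the portal test is omitted here since it depends on how portals are stored:

#include <algorithm>

struct CVector3 { float x, y, z; };
struct CAABox   { CVector3 Min, Max; };

// Clamp a spline control point back inside its sector's box. A full
// implementation would first test the segment from the control point to
// its un-splined path point, and skip the clamp when it crosses a portal.
CVector3 Clip_Control_Point(const CVector3& point, const CAABox& sector)
{
    CVector3 clipped;
    clipped.x = std::min(sector.Max.x, std::max(sector.Min.x, point.x));
    clipped.y = std::min(sector.Max.y, std::max(sector.Min.y, point.y));
    clipped.z = std::min(sector.Max.z, std::max(sector.Min.z, point.z));
    return clipped;
}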
This algorithm will allow the AI to follow a nicely curved path without ever leaving the safe
pathfind sectors.
In a busy city, it would be odd to see cars driving down the sidewalk and pedestrians
meandering through the streets. However, using our current pathfinding system this is exactly
what you'd see. The flood fill algorithm does not encode (nor is it aware of) the type of ground
covered, be it sidewalk, highway, or burning hot lava. A possible solution to this problem is to
store surface information in each cell of the flood fill and have your compression algorithm only
generate sectors containing cells of the same type. This would yield sidewalk sectors, highway
sectors and burning hot lava sectors. The run-time path solver could then utilize this
information when determining what sectors are valid for a given object (i.e. cars only pathfind
through street sectors).
Innate waypaths are another powerful solution that integrates well with our pathfind system. A
waypath is a manually created series of waypoints which form a path. This waypath can be rigid
or splined. Each waypoint on the waypath can encode specific information, such as crouch here,
speed up, slow down, or even jump to the next waypoint. To integrate this information into our
path solver, consider the following algorithm.
Run a normal pathfind flood fill to generate sectors and portals. For each innate waypath,
create a "dummy" sector (a sector without size or position), and add it to the system. For every
waypoint in the waypath, find which pathfind sector the waypoint intersects. Create a two-way
portal from the pathfind sector to the dummy waypath sector at the location of the waypoint.
Add this new portal to both the pathfind sector and the dummy waypath sector.
During the run-time path solve, the algorithm's heuristic should bias toward these waypath
sectors (i.e. multiply the cost of using a waypath sector by 0.75). This will cause AIs to "tend"
to follow innate waypaths.
Given this system, to enable vehicle AIs to drive along the right side of the street, a level
designer would simply draw a one-way, vehicle only, innate waypath along each side of the
road.
Currently, players are smarter than AIs: humans have the ability to combine abstract concepts
and make leaps of logic. This reasoning ability allows players to perform actions that AIs cannot
originate, but if we're lucky perhaps they can replicate. For instance, during a firefight on the
second floor of a building, the player reasons he cannot win. In desperation he shoots out the
window and, with a mighty leap, escapes onto the street. What does the AI do? Unless we give
it an option, the AI will pathfind down the stairs, open the front door and look dumbly down the
street, for the player is long gone. Wouldn't it be much more fun if the AI simply jumped out
the window after the player?
There are many different ways to approach this problem, so let's choose the simplest. We know
where the player jumped from and we know where the player landed. If both points are inside
the pathfind data, why not add a temporary one-way "jump" portal between these two sectors?
All we need to do is encode a little information about the jump such as orientation, velocity, and
time. Depending on the physics system, this may be enough for the AI to replicate the player's
mighty leap. We can even keep a FIFO "bucket" of these temporary portals so other AIs can
follow.
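One possible shape for that FIFO bucket, sketched in C++; the field names and capacity are
illustrative assumptions:

#include <cstddef>
#include <deque>

class CPathfindSector;

struct CVector3 { float x, y, z; };

struct CJumpInfo
{
    CVector3 orientation;    // facing when the player jumped
    CVector3 velocity;       // launch velocity to replicate
    float    time;           // how long the jump took
};

struct CTempJumpPortal
{
    CPathfindSector *from;   // sector the player jumped from
    CPathfindSector *to;     // sector the player landed in
    CJumpInfo        jump;
};

// Fixed-size FIFO: the oldest temporary portal is evicted first.
class CTempPortalBucket
{
public:
    void Add(const CTempJumpPortal& portal)
    {
        if (portals.size() >= MaxPortals)
            portals.pop_front();              // evict the oldest
        portals.push_back(portal);
    }
private:
    static const std::size_t MaxPortals = 8;  // arbitrary capacity
    std::deque<CTempJumpPortal> portals;
};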
Vehicles
Arbitrary vehicle pathfinding is an order of magnitude more difficult than traditional bipedal
pathfinding. Humans can turn on a dime, whereas vehicles have a turn radius. Humans follow
the same rules regardless of orientation, whereas vehicles act differently in drive and reverse.
Humans can stop on a dime, whereas vehicles skid. Humans do not tip over when running too
fast, whereas vehicles can roll.
A complete vehicle pathfinding solution is outside the scope of this paper; however, we will
present a brief overview that may work for some games.
Firstly, modify the flood fill algorithm to use the bounding volume of the largest vehicle
regardless of its orientation. During the run-time path solve, take the turn radius of the vehicle
into consideration when evaluating portals. In other words, we know which portal the vehicle is
coming from, so discard any destination portals that would cause the vehicle to turn sharper
than it is able.
Once the path is generated, it may be problematic for a vehicle to follow. This is because
orientation is important when dealing with vehicles; however our sector/portal system doesn't
implicitly handle this very well. Consider the following diagram.
In the sector/portal system, the vehicle will beeline from Sector A's entrance to Sector B.
This orients the vehicle in such a way that it will be impossible for the vehicle to make the
following turn into the portal for Sector C. However, if the vehicle would "arc" out into Sector A,
it would be possible to make both Sector B and Sector C. Unfortunately there is no way of
knowing this is required unless we search ahead on the path.
A simple solution to this problem, which works well with the system we've described so far, is to
spline the path. However, none of the classical "true" splines will work for this situation; which
is fine -- we'll create our own custom curve.
Our goal: create a continuous curved line that will cause the vehicle to arc around corners while
still obeying the vehicle's turn radius restrictions. Such a path can be created using only a
straight line and the vehicle's turning circle. Unlike "true" splines, this path is not a single
smooth function in the mathematical sense, but is composed of three distinct continuous parts
which, when placed end-to-end, form a continuous path. The three parts are: the exit curve from the previous
point, a straight line from the exit curve to the enter curve of the next point, and the enter
curve of the next point. Note: The starting and ending points only contain two parts (there is no
previous part for the starting point, nor is there a next point for the ending point). Consider the
following diagram:
To build this curve, overlay the vehicle's turning circle onto each node of the path. We will
assume the optimal center of this turn arc lies on the bisector of the angle formed by the
(prev_node - curr_node) and (next_node - curr_node) vectors. This causes the actual node point
to lie on the perimeter of the turning arc.
For each turning circle on the path, find the "in" and "out" tangent points. The "in" tangent
point is the closest point of tangency from the previous point to the turn arc. The "out" tangent
point is the closest point of tangency from the next point to the turn arc. Now, simply connect
the dots. The path is as follows: straight-line from starting point to the "in" tangent point on the
turn arc of the next path node; follow the turn arc to "out" tangent point; straight line from this
point to the "in" tangent point of the turn arc of the next path node, etc, etc.
Once finished, this algorithm yields a continuous curve that follows the turning restrictions of
the vehicle following the path.
The vehicle curve ensures the vehicle will not attempt any impossible turns.
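Finding those tangent points is straightforward trigonometry. A minimal 2D sketch (choosing
between the two returned points, by which side of the arc the path should follow, is left to the
caller):

#include <cmath>

struct CVector2 { float x, y; };

// Returns the two points where lines from 'p' are tangent to the circle
// (center 'c', radius 'r'). Assumes p lies outside the circle.
void Tangent_Points(const CVector2& p, const CVector2& c, float r,
                    CVector2& tangentA, CVector2& tangentB)
{
    float dx = p.x - c.x, dy = p.y - c.y;
    float d  = std::sqrt(dx * dx + dy * dy);

    // Direction from the center to the point, plus the half-angle the
    // tangent points subtend at the center: cos(alpha) = r / d.
    float base  = std::atan2(dy, dx);
    float alpha = std::acos(r / d);

    tangentA = { c.x + r * std::cos(base + alpha),
                 c.y + r * std::sin(base + alpha) };
    tangentB = { c.x + r * std::cos(base - alpha),
                 c.y + r * std::sin(base - alpha) };
}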
Keep in mind that this curve may cause the vehicle to drive outside the "safe" areas of the
path, thus potentially colliding with objects along the way.
Conclusion
By taking advantage of a simple flood fill algorithm, we can overcome the Herculean task of
automatically generating connectivity data through complex polygon soup. With a few tricks
and extensions we can easily incorporate doors, ladders, and elevators; spline the result of the
path solve to yield a more organic looking path; dynamically alter the pathfind data to allow the
AI to replicate a player's actions; incorporate innate waypaths to force a specific behavior from
the AI; and distribute the run-time path solve over multiple frames to balance the CPU load.
Special thanks to Eric Cosky for the original concept of the flood fill algorithm, a brilliant idea
that proves the worth of brainstorming with a friend before jumping into a complex problem.
Also, thanks to Colin Mclaughlan for suggesting the concept of dynamic temporary "jump"
portals.
By Mike Rayner
Gamasutra
March 21, 2002
URL: http://www.gamasutra.com/gdc2002/rayner/rayner_01.htm
This paper discusses the terrain system used in SSX. An algorithm for the tessellation of
polynomial surfaces is presented. The system allows for adaptive tessellation with continuous
level-of-detail while avoiding the introduction of cracks and seams between adjacent surfaces
with different geometric resolution. Surface lighting of the terrain is also discussed.
Introduction
A high order parametric surface representation is chosen for representing the terrain in the SSX
snowboarding game. The reasoning behind this decision is to represent the snow surface with
an elegant and concise mathematical representation for physics calculations, and to allow for an
arbitrary polygonal resolution for rendering the terrain while ensuring a small memory footprint
for the surface information.
Bézier patches are chosen for their fast run time evaluation, affine invariance, and convex hull
property (for quick occlusion and intersection tests). The terrain is represented as a grid of Bi-
cubic Bézier surfaces.
Each Bi-cubic Bézier surface or patch is defined by a set of control points. From these control
points we can generate any point on the surface. We tessellate the patch by connecting a grid
of evaluated surface points into triangles or triangle strips. The polygon primitives can then be
rendered by conventional 3D hardware. We can tessellate the patch to an arbitrary resolution at
run time. In order to achieve a high quality image with reasonable performance it is desirable to
evaluate the patches at different resolutions. A heuristic is required for determining what
resolution a patch should be evaluated at. For this implementation a simple distance based
Level of Detail heuristic is used. Evaluating neighbouring patches at different LOD will introduce
ugly gaps in the rendered surface. Special transition patches are generated to solve this
problem.
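As a sketch, a bi-cubic patch with a 4x4 net of control points can be evaluated at (u, v) by
applying the cubic Bernstein weights in each parameter direction; this is the direct-evaluation
option, and the types are stand-ins:

struct CVector3 { float x, y, z; };

// Cubic Bernstein weights B0..B3 at parameter t.
static void Bernstein(float t, float b[4])
{
    float s = 1.0f - t;
    b[0] = s * s * s;
    b[1] = 3.0f * s * s * t;
    b[2] = 3.0f * s * t * t;
    b[3] = t * t * t;
}

// Direct evaluation of a bi-cubic Bezier patch at (u, v).
CVector3 Evaluate_Patch(const CVector3 control[4][4], float u, float v)
{
    float bu[4], bv[4];
    Bernstein(u, bu);
    Bernstein(v, bv);

    CVector3 p = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
        {
            float w = bu[i] * bv[j];
            p.x += w * control[i][j].x;
            p.y += w * control[i][j].y;
            p.z += w * control[i][j].z;
        }
    return p;
}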
Traditional polygonal lighting models require a vertex-normal pair in order to generate a colour
for each vertex. The graphics hardware interpolates the vertex colours over the polygon
surface. With dynamic evaluation of a parametric surface the number of vertices and their
position will be changing. As a result a dynamically tessellated vertex-normal lighting system
will tend to pop and move which will detract from the quality of the moving image. In order to
avoid this one must decouple the geometric resolution from the light sample resolution. We use
light-maps. Light maps encode lighting as 2D textures. At run time the light map texture is
modulated with a surface texture of each polygon to create the final image. With this method
the surface lighting does not pop when the geometric resolution changes. We also have the
added benefit of calculating the lighting off-line using whatever lighting model we like. The
trade off is we lose the ability to dynamically change the surface lighting without recalculating
light maps at run time.
Background
A parametric curve is defined by three univariate functions, x = x(u), y = y(u), z = z(u). As u
varies from 0 to 1 the functions sweep out the curve. Similarly a parametric surface is defined
by three bivariate functions:

x = x(u,v), y = y(u,v), z = z(u,v)
Computer graphics generally sticks to degree 3, that is, cubic. For degrees greater than 3 there
is a trade-off between curve flexibility and descriptions that are more cumbersome to work with.
The cubic form looks like this:

Q(u) = p0 + p1*u + p2*u^2 + p3*u^3
The relationship between the shape of the curve and the coefficients pi is not very intuitive.
Instead of having to manipulate the coefficients directly, the polynomial form can be rearranged
into control points and basis functions, which provide a more intuitive connection to the shape
of the curve:

Q(u) = sum(i=0..3) pi * bi(u)    (equation 2)

The co-ordinates pi are called control points and bi(u) is the power basis. The cubic basis is a
collection of linearly independent polynomials bi(u) given by:

b0(u) = 1, b1(u) = u, b2(u) = u^2, b3(u) = u^3
For equation 2 the control points are (p0, p1, p2, and p3) and the basis polynomials are (1, u,
u^2, and u^3).
A bi-cubic parametric surface has the form of equation 3 and is traced out as the parameters
(u,v) take all possible values between 0 and 1. This is known as a patch. Free form surfaces are
modelled using nets of patches. Bi-cubic parametric patches are defined over a rectangular
domain in uv-space and the boundary curves of the patch are themselves cubic polynomial
curves. A point Q with the co-ordinates (x,y,z) in Cartesian space is represented by the
parameters (u,v) in parametric space. Using the same control point and basis function
representation we define the patch as:

Q(u,v) = sum(i=0..3) sum(j=0..3) pij * bi(u) * bj(v)    (equation 3)
The net of control points forms a polyhedron in Cartesian space and the position of the points in
this space controls the shape of the surface.
The basis used for rendering the patches in SSX is the Bézier basis. The Bézier basis polynomials
are called Bernstein polynomials and are defined by:

Bi,n(u) = C(n,i) * u^i * (1 - u)^(n - i)

where C(n,i) is the binomial coefficient:

C(n,i) = n! / (i! * (n - i)!)

Substituting the cubic (n = 3) Bernstein polynomials into the patch definition gives the Bézier
patch (equation 4):

Q(u,v) = sum(i=0..3) sum(j=0..3) pij * Bi,3(u) * Bj,3(v)
These curves show the influence that each control point has on the final curve form.
When u=0 the basis function B0,3 = 1 while the others are 0. When u=1 the basis function B3,3
= 1 while the others are 0. From this we know that when u=0, p0 will have the most influence
and when u=1, p3 will have the most influence. The control points p1 and p2 have the most
effect when u=1/3 and u=2/3 respectively. The manner in which the basis functions affect the
shape of the curve is the reason they are called blending functions. An alternate convention for
specifying a Bézier curve is the matrix convention:

Q(u) = [u^3 u^2 u 1] * M * [p0 p1 p2 p3]^T

where M is the Bézier basis matrix:

    | -1  3 -3  1 |
M = |  3 -6  3  0 |
    | -3  3  0  0 |
    |  1  0  0  0 |
To create an image that most accurately represents the terrain surface as described by the
artist we want to evaluate the curved surfaces to a high level of detail. If all patches are
tessellated to the same degree then we will generate just as much polygonal detail for far away
patches as we do for close up patches. The total number of polygons that the hardware is
required to render in most cases will limit us to a lower geometric resolution than we would like.
With an LOD system we render polygon detail only where it is required. The results are near
identical quality with fewer polygons to render. Unfortunately if all patches are not evaluated to
the same degree, seams and "tearing" will be seen between the patches. The result is quite
unacceptable for the quality expected of modern video games. The goal is to tessellate patches
to different degrees without generating seams.
The algorithm presented here ensures that no tears or seams are created between neighbouring
patches with different tessellation levels. The only constraint is that patches must have at least
C0 continuity with their neighbours. C0 continuity requires that neighbouring patches share
control points along the boundary connecting them. Note that C1 continuity is required for
smooth lighting and physics calculations between patches but is not required for the LOD
algorithm to work.
With these constraints in place the run time evaluation of the terrain will not need to be aware
of neighbouring information to produce a seamless tessellation of the entire terrain with
changing LOD. Most other algorithms that fix the "cracking" problem require neighbour
information and usually a second pass in order to correct errors. The evaluation of the Bézier
polynomial can be performed using various methods depending on the implementer's
preference. There are many methods of evaluating equation 4 such as direct evaluation,
forward differencing, and recursive subdivision. The evaluation method chosen to implement
this system is not important.
A run time heuristic is used to determine the tessellation level of each of the four boundary
curves of the Bézier patch. If each boundary curve requires the same level, then the entire
patch is uniformly tessellated to the same degree. If the 4 boundary levels differ, then a non-
uniform patch is generated. The number of edge vertices generated is directly a function of the
boundary edge heuristic. As such the number of vertices on the boundary of two patches is
always the same so long as C0 edge continuity exists. The "cracking" problem is reduced to
ensuring that a non-uniform patch does not create any inner cracks or seams while retaining
the number of edge vertices as determined by the boundary heuristic.
A non-uniform patch is treated as a uniform inner patch connected to the edge vertices with 4
triangle strips. Let the number of points per edge be specified clockwise as E0, E1, E2, and E3
respectively. The resolution of the inner patch is then defined as NxM where N = max(E0,E2)-2
and M = max(E1,E3)-2. This choice for NxM conservatively rounds up the degree of the inner
patch to highest boundary degree both vertically and horizontally. The 4 strips generated are
created from boundary edge vertices and the perimeter inner mesh vertices. S0=strip(E0,Nl),
S1=strip(E1,Mt), S2=strip(E2,Nr), S3=strip(E3,Mb). See Figure 2.
The LOD heuristic is chosen as a function of distance from the camera position Cpos. A minimum
LODmin and maximum LODmax boundary evaluation is chosen. A distance from the camera for
minimum Distnear and maximum Distfar level of detail is chosen. Only the end control points of
each edge p0 and p1 are used to determine the degree evaluation.
For speed, the dist function is actually chosen as the squared distance as opposed to the square
root. For a visible distance of 500 meters the near distance would be something like 25 meters
and the far, let's say, 250 meters. All boundary edges at 25 meters or closer will have maximum
detail. All boundary edges at 250 meters or farther will be minimum level of detail. If we choose
a good distance range relative to average patch edge length and only a few different LOD levels
then most patches will be uniformly tessellated with non-uniform patches existing as a row of
patches connecting different uniform levels of detail at a fixed radius to the camera.
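A minimal sketch of such a heuristic; the clamped linear mapping from squared distance to an
integer LOD, and the use of an edge midpoint, are assumptions rather than the paper's exact
formula:

#include <algorithm>

struct CVector3 { float x, y, z; };

static float Dist_Squared(const CVector3& a, const CVector3& b)
{
    float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    return dx * dx + dy * dy + dz * dz;
}

// Map an edge's distance from the camera to a tessellation level.
int Edge_LOD(const CVector3& edgeMidpoint, const CVector3& cameraPos,
             float distNear, float distFar, int lodMin, int lodMax)
{
    // Work with squared distances to avoid the square root.
    float d2    = Dist_Squared(edgeMidpoint, cameraPos);
    float near2 = distNear * distNear;
    float far2  = distFar * distFar;

    // 0 at distNear (or closer), 1 at distFar (or farther).
    float t = (d2 - near2) / (far2 - near2);
    t = std::min(1.0f, std::max(0.0f, t));

    // Closer edges get the higher level of detail.
    return lodMax - (int)(t * (lodMax - lodMin));
}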
More exotic heuristics can be chosen that take into account things like edge or patch curvature,
however one must be careful to ensure that the heuristic exhibits good hysteresis so that
patches do not fluctuate between different LODs giving away the effect.
The terrain system discussed in this paper has been implemented in SSX and SSX Tricky. The
system stores only patch control information, requiring a much smaller memory footprint than raw
vertex geometry. Through dynamic level of detail the number of polygons rendered by the
graphics hardware is reduced without noticeable quality loss. The system does not introduce
seams between patches of different tessellation levels by ensuring identical edge tessellation on
boundaries between patches that maintain C0 continuity. Neighbouring information is not
required at run time in order to avoid LOD seams.
For terrain systems that have limited memory storage (especially video game consoles) but still
require large environments and high geometric detail, Bézier patches provide very good
compression. The cost of rendering terrain geometry at constant resolution is more expensive
than necessary and does not scale well to lower powered graphics hardware. The system
presented here attempts to decrease the memory and polygon costs of rendering a complex
terrain. With the addition of a patch caching system the performance of this terrain architecture
would scale nicely across lower powered machines.
URL: http://www.gamasutra.com/features/20020118/vandenhuevel_01.htm
Isn't it wonderful to move through the virtual world in a 3D game? The way you can walk into
walls and slide along them as you freely turn -- without getting your rail gun stuck between
you and the wall? The way you and your opponent can run at each other, swords drawn,
actually stop when you reach each other, and can then back away without worrying about your
shield getting caught in his chain mail? The way you can circle strafe your opponent over
uneven ground without ever having to worry about tripping? In the real world you would have
to worry about this kind of stuff…
Tim Schroeder's article "Collision Detection Using Ray Casting," in the August 2001 issue of
Game Developer magazine, focused on detecting collisions between spheres and polygons. This
article complements the information presented there by explaining how to detect collisions
between two spheres and determine what they'll do after they collide. This is useful not only for
games like pool where accurate collision of spheres is key, but also in games where characters
and other mobile objects are bounded by spheres; there, these tests can be used to quickly
determine whether they have bumped into each other.
To make it easier to explain, all the examples will be first in 2D. A later section of this article
will explain how to apply the same algorithms to 3D. For the purposes of this article, a circle will
be represented by the point at its center and its radius.
Are two stationary circles A and B currently touching? The answer, as I'm sure you already
know, is very simple. Two circles are in contact with each other if and only if the distance
between their centers is less than or equal to the sum of their radii. So, find the distance
between the centers of the two circles using the equation:

Dist = sqrt((B.x - A.x)^2 + (B.y - A.y)^2)
Then add the radii of the two circles together. If the sum of the radii is greater than or equal to
Dist, then the circles are touching. Since multiplications are less computationally expensive
than square roots, you should speed up this code by not performing the square root when
calculating the distance, and instead square the sum of the radii. The code below shows a
sample implementation using this shortcut.
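A minimal C++ sketch of this test, assuming a simple Circle type:

struct Vector2 { float x, y; };
struct Circle  { Vector2 center; float radius; };

// True if two stationary circles touch or overlap. Compares the squared
// distance against the squared radius sum to avoid the square root.
bool Circles_Touch(const Circle& a, const Circle& b)
{
    float dx = b.center.x - a.center.x;
    float dy = b.center.y - a.center.y;
    float distSq = dx * dx + dy * dy;

    float radiusSum = a.radius + b.radius;
    return distSq <= radiusSum * radiusSum;
}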
For this problem you are given a circle A that is moving through a virtual world. Over a finite
period of time, which we'll call t, A moves a certain distance in a certain direction. We'll
represent this movement by a vector V. Also in this virtual world is a circle B that is not
moving. The problem is to figure out if A comes into contact with B while it is moving along V,
and if it does, by what amount do we have to shorten V so that A comes to rest just at the
point of contact with B? An illustration of the problem is shown in figure 2.
The first, and most obvious solution to this would be to use the stationary collision method
described above on the changing destination location of circle A. In other words, move circle A
to the end of the movement vector and then test its new position for collisions. If A is in
contact with another circle in its new location, you have two choices. You could do a recursive
back off, where you shorten the movement vector and try again until A and B are not touching.
Or you could simply not move A. These methods have several problems.
If you went with the recursive back off solution, you could potentially eat up a lot of CPU time
trying to make the collision appear accurate. Or, you could set a limit on the number of retries,
reducing the computations to an acceptable load, but then potentially leaving you with the same
problems as the other option…
You could not move the circle at all. It's really cheap to do, but nothing would ever seem to
really touch. In the game, circle A would move towards a static circle B over several frames
until its new position intersected with B. Then it would appear to just stop dead some amount
before it would have hit B, as if the static circle B had some kind of invisible force field built
around it. Assuming you are doing collision detection based on frame rate, the effects would be
more noticeable as the frame rate drops.
Finally, if A is moving fast, it is possible that A's final destination is on the other side of B, but
not touching it. You would have to perform additional checks to make sure A did not pass
through any other object.
A better approach would be to use the movement vector V to represent the circle as it moves
through the world. Now all the collision detection is simplified down to a vector tested against a
static circle. This approach does one test to see if V touches B. If it does not collide then you
are done testing.
This solution has problems as well. Consider the situation shown in figure 3. The movement
vector V from the center of A does not come into contact with anything. However, it only
checks the path traveled by the center of A for collision, and the bottom or top could still collide
with B even if the middle does not.
Figure 3
A possible solution to this problem is to test against two more vectors coming from the edges of
the moving circle parallel to the movement vector V. While this may fix the problem described
above, we can see in figure 4 that if we adjust the movement vector V to be the length of the
clipped second vector, the circles will still overlap. Also, if the moving circle is larger than the
static one, the static one might fit between the top and center vectors, or between the center
and bottom vectors, so the collision would not be detected at all. The moving circle would
appear to go right over the smaller static one. Obviously this is not the correct answer for
collision detection.
Figure 4
The first step to the right solution is to quickly determine that there can be no collision and
avoid doing any more tests. So first we figure out if A is going far enough that it could
conceivably hit B. That means that the movement vector must be at least as long as the
shortest distance between the circles, which would be a straight line passing through their
centers. So the movement vector must be at least the distance between the centers of the
circles minus the radius of each. If it is not, then there is no way that the circles will collide.
Note that this test does not take direction into account! Thus it does not tell us that A and B will
collide; it tells us that they won't. See Figure 5 for an illustration.
Our next early escape test is to determine if A is actually moving towards B. If it's not, then
obviously you don't have to worry about them colliding. To do this, we take advantage of our
friend, the Dot Product. First, find C, the vector from the center of A to the center of B.
Remember that a point minus a point is a vector, so C = B - A. Now get the dot product of C
and the movement vector, V: if the result is less than or equal to 0, then A is not moving
towards B, and no more testing needs to be done.
One more escape test: If the closest that A ever gets to B is more than the sum of their radii,
then they do not hit. Dot product to the rescue again! If theta is the angle between any two
vectors P and Q, then the dot product between P and Q is equivalent to:

P dot Q = |P| * |Q| * cos(theta)
In other words, the dot product of two vectors P and Q is equal to the cosine of the angle
between them times the length of P and the length of Q.
Also recall that the cosine of an angle is equal to the side of a right triangle adjacent to that
angle, divided by the hypotenuse of that same triangle. Therefore, the dot product of a vector P
and a normalized (ie: has a length of 1) vector Q is equal to the length of P times the cosine
between the two vectors. Which, in turn, is equal to the length of the vector P in the direction
of the normalized vector Q. This is shown in Figure 6.
Figure 6
With this in mind, let's go back to the problem at hand. We have our movement vector V, and
our vector from the center of circle A to the center of circle B, called vector C. We want to find
the point on V that is closest to the center of B. Intuitively, if we were to draw a line from this
point to the center of B, it would be at right angles to V. Therefore, we can use the dot product
as described above to find the distance from the center of A to that point. Compute the
normalized V (call it N) and then take the dot product of N and C. The result will be the floating
point number D, the distance between the center of A and the closest point on V to B. See
Figure 7 for a visual reference.
The length of C and D are the lengths of two sides of a right triangle. Thus, we can use the
Pythagorean Theorem (a^2 + b^2 = c^2) to find the length of the third side, represented in
green in figure 7. Square the length of C, and subtract the square of D from it, and call the
result F.
Now, to be accurate, the square root of F is the length from the center of B to the closest point
to B on V. However, performing square roots takes a lot of processor time. So we will perform
our early escape test without taking the square root of F. What we need to know is, as stated
before, do A and B touch when A is at the closest point to B along V? In other words, is sqrt(F)
<= A.radius + B.radius? But rather than take the square root of F, check F <= (A.radius +
B.radius)^2. If this comparison is false, then there is no collision and you can escape the
routine.
At this point we've exhausted our early escape tests, and there is still a chance that the two
circles will collide.
Figure 8 gives a visual explanation of the steps about to be described. The distance circle A can
move before colliding with B is right up until it is just touching the edge of circle B. At that
point, the distance between the centers of the circles is equal to the sum of the radii. Since we
already know the shortest distance from V to the center of B, aka sqrt(F), we have the lengths
of two sides of a right triangle. The third side is equal to D minus the distance A can travel
before it hits B. So again we can use the Pythagorean theorem: let T = (A.radius + B.radius)^2 -
F, so the third side has length sqrt(T). The distance A has to move along V to come into contact
with B is then D - sqrt(T).
One final check is needed. Remember when we tested the shortest distance between A and B,
but it did not take into account direction? Here's where that can come back and bite us.
Consider the situation illustrated in figure 9. Both arrows are the same length, but in slightly
different directions. This shows that yes, the movement vector is long enough to bring the two
circles close enough to touch, but the direction is such that they won't. So at this point in the
algorithm we need to do a reality check: if Distance is greater than the length of V, then there
is no collision.
Figure 9
If the final test is passed, we can normalize V and then multiply it by D - sqrt(T). Now circle A
will move until it is just touching circle B. The code implementing this algorithm is sketched
below.
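This C++ rendition follows the steps described above, reusing the Vector2 and Circle types from
the earlier sketch; it returns true and shortens the movement vector on collision.

#include <cmath>

// Moving circle A travels along 'move'; B is stationary. Returns true on
// collision and shortens 'move' so A comes to rest just touching B.
bool DynamicStaticCollision(const Circle& A, const Circle& B, Vector2& move)
{
    float sumRadii = A.radius + B.radius;

    // C: vector from A's center to B's center.
    float cx = B.center.x - A.center.x;
    float cy = B.center.y - A.center.y;
    float centerDistSq = cx * cx + cy * cy;

    // Escape 1: A cannot move far enough to reach B.
    float moveLen = std::sqrt(move.x * move.x + move.y * move.y);
    if (moveLen <= 0.0f || moveLen < std::sqrt(centerDistSq) - sumRadii)
        return false;

    // Escape 2: A is not moving towards B.
    float nx = move.x / moveLen, ny = move.y / moveLen;  // N: normalized V
    float D = nx * cx + ny * cy;                         // N dot C
    if (D <= 0.0f)
        return false;

    // Escape 3: even at closest approach the circles do not touch.
    float F = centerDistSq - D * D;  // squared distance from B to the line of V
    float sumRadiiSq = sumRadii * sumRadii;
    if (F >= sumRadiiSq)
        return false;

    // Distance along V at which first contact occurs.
    float T = sumRadiiSq - F;
    float distance = D - std::sqrt(T);

    // Reality check: V must actually reach that far in this direction.
    if (moveLen < distance)
        return false;

    // Shorten the movement vector so A stops just touching B.
    move.x = nx * distance;
    move.y = ny * distance;
    return true;
}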
This problem seems even more complicated than the previous one. Given two moving circles,
determine whether or not they collide. Looking at the problem in figure 10, we can see that just
because their paths cross does not mean that the circles will come into contact. One may have
moved out of the way in time. It also shows that, just because their paths do not cross does not
mean that they don't collide.
Thankfully, the solution to a very hard problem could not be simpler. What we are really
interested in is not their movement vectors, but their movement relative to each other. If we
translate circle A's movement such that B can be considered stationary, we can use the
Dynamic-Static solution described above!
First of all, subtract the movement vector of one circle from that of the other (or, you can
think of it as adding the reverse of one vector to the other). Then perform the dynamic-static
collision
algorithm on the circles and the new vector. If they collide, divide the length of the shortened
vector by the length of the one you originally passed into the function. The result should be a
floating-point number between 0 and 1. This represents when over the course of their
movement the circles collided. Multiply the original movement vectors by this number, and the
result is shortened movement vectors that take the circles up to the point where they touch for
the first time.
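Sketched in code, reusing the Vector2, Circle, and DynamicStaticCollision definitions from the
sketch above:

#include <cmath>

// Both circles move; on collision, both movement vectors are shortened to
// the moment of first contact.
bool DynamicDynamicCollision(const Circle& a, const Circle& b,
                             Vector2& moveA, Vector2& moveB)
{
    // A's movement relative to B; B can now be treated as stationary.
    Vector2 relative = { moveA.x - moveB.x, moveA.y - moveB.y };
    float originalLen = std::sqrt(relative.x * relative.x +
                                  relative.y * relative.y);

    if (!DynamicStaticCollision(a, b, relative))
        return false;

    // Fraction of the movement completed when the circles first touch.
    float shortenedLen = std::sqrt(relative.x * relative.x +
                                   relative.y * relative.y);
    float t = (originalLen > 0.0f) ? shortenedLen / originalLen : 0.0f;

    // Scale both original vectors up to the point of contact.
    moveA.x *= t;  moveA.y *= t;
    moveB.x *= t;  moveB.y *= t;
    return true;
}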
Extending the stationary test to 3D is trivial: compute the distance between the centers of the
two spheres, now including the z component. If that distance is less than the sum of the radii of
the two spheres, then they are in contact with each other.
Collision of a moving sphere with a stationary sphere
The Dynamic-Static collision algorithm works in 3D because the 3D scenario can be reduced to
2D. If you look at our solution for this problem in 2D, you'll notice that it is based around two
vectors (V and C) and a common point (the center of A). These two vectors define the
orientation of the plane in 3D, and the point provides a reference to where in space that plane
lies. Figures 11 and 12 show two spheres, the red one in motion and the blue one at rest. These
figures also show the movement vector of the red one and the vector between the centers of
the spheres. Notice the 2D plane that is passing through the objects in the scene in both
figures. This plane cuts right down the middle of the two spheres and along the two vectors,
clearly showing that these vectors are coplanar. Also notice that the plane cuts the spheres
perfectly in half, so that the areas of contact between the spheres and the plane are circles with
the same center and radius as the spheres. All of the information needed to use the dynamic-
static algorithm we described above is projected into a 2D space on the light blue plane.
There is a special case to be considered, although its resolution is trivial. If the vectors V and C
lie along the same line, then there are an infinite number of planes in which the problem can be
solved. This case should be checked for. Of course, if V and C are co-linear, that means that
sphere A is moving directly towards sphere B, and the question of whether they collide reduces
to only our first early escape test. Namely, "does A move far enough to hit B?", or in other
words, "Is V greater than (C - sumRadii)?"
There are no changes needed to the code above in order for it to work in the 3D case. The
changes that need to be done are in the classes that are assumed by the above code. For
example, instead of the Vector class having only x and y member variables, it should be
changed to include a third member variable, z.
Again, by translation of the problem to one sphere's frame of reference, this problem is reduced
to the Dynamic-Static 3D problem, which in turn scales down to the 2D case, as described
above. Figure 13 shows two spheres, each with its own movement vector, shown in green. The
orange vector is the opposite of B's movement vector, and the yellow movement vector from A (the
red ball) is the movement of A as observed from B's point of reference. The 2D plane
containing the yellow vector also contains the center of sphere B, and so it can be used to solve
the problem.
Now that you have determined that your circles collide, you want to have them bounce off of
each other in a realistic manner, taking into account their relative mass and speed. To solve
this, we are going to rely on some simple laws of physics, specifically the conservation of
momentum and the conservation of energy.
Look at figure 13 to get an idea of the problem and the variables that we are going to use. The
red circle is circle1, the blue one circle2. They each have a movement vector, movevec1 and
movevec2, and a mass, m1 and m2.
Conservation of Momentum states that the total momentum of the system before the collision is
equal to the total momentum in the system after the collision. If we represent the momentum
of a circle as P = M * V, where M is the circle's mass and V is its movement vector, then we
can derive the equation:

m1 * v1 + m2 * v2 = m1 * v1' + m2 * v2'
where v1' and v2' are the movement vectors of circle 1 and 2 respectively after the collision.
Since the second circle gains any momentum lost by the first, we can represent the difference
between the momentums of the balls before and after by the same vector, deltaP.
Now here is where the difference between reality and simulation comes into play. If these two
spheres were the rubber balls we all used in gym class in high school, when they hit they would
deform. This deformation would increase the area where the balls are touching, and some of
the energy would be lost in that deformation. Other amounts of it would be lost in spin. But in
this simulation, we are assuming the balls to be rigid, frictionless, perfect spheres. A common
real-world example of this type might be the steel balls hanging from a frame that collide with
each other to demonstrate action-reaction; because they are so rigid, very little of their
momentum is lost when they collide, and so when you set one ball swinging it takes some time
for them all to stop.
So in our simulation of perfect rigid spheres, the only transference of momentum can occur
along the single point of contact, as illustrated in figure 13. Therefore, we can break deltaP
into a unit vector N that points down the line of contact, and a scalar P representing the
magnitude of deltaP. So, if we apply this to the equations above, we can solve for the new
movement vectors of the circles and get:
So, if we can solve for P, we can calculate the new movement vectors.
Now look back at figure 13, and notice that v1 and v2 can be represented by the sum of two
vectors: one that is parallel to the line along which momentum is exchanged, and one that is
perpendicular to it. Using this information, we can represent v1, v1', v2, and v2' by:

v1 = a1 * N + b1 * Q        v2 = a2 * N + b2 * Q
v1' = a1' * N + b1' * Q     v2' = a2' * N + b2' * Q    (equation 3)
Where a1, a2, b1, and b2 are scalars, N is the same N as mentioned before, and Q is the
normalized vector perpendicular to the line along which momentum is exchanged and on the
same plane as N and the movement vector.
Substituting v1 in equation 1 with its representation from equation 3, and v2 in equation 2 with
its representation from equation 3, we get:

v1' = (a1 - P/m1) * N + b1 * Q
v2' = (a2 + P/m2) * N + b2 * Q

And since v1' = a1'*N + b1'*Q and v2' = a2'*N + b2'*Q, we can see that:

a1' = a1 - P/m1, b1' = b1
a2' = a2 + P/m2, b2' = b2
Now we can use the Conservation of Energy to solve for P. The equation for kinetic energy is:

E = (1/2) * m * v^2

Since energy is conserved, the total energy before the collision must equal the total energy
after the collision:

(1/2)*m1*v1^2 + (1/2)*m2*v2^2 = (1/2)*m1*v1'^2 + (1/2)*m2*v2'^2
Using the movement vector as the hypotenuse of a right triangle, we can substitute v^2 = a^2 +
b^2, giving:

m1*(a1^2 + b1^2) + m2*(a2^2 + b2^2) = m1*(a1'^2 + b1'^2) + m2*(a2'^2 + b2'^2)    (equation 7)

Note that because b1' = b1 and b2' = b2, the b1 and b2 terms in equation 7 drop out of the
equation. With an equation in terms of m1, m2, a1, a2, and P, we have an equation with variables
that are either given or can be calculated from what was given, except for P. So if we solve for
P, we will be able to plug in the known variables, derive P, and then use P to calculate the new
movement vectors. Equation 8 shows equation 7 after solving for P:

P = 2 * m1 * m2 * (a1 - a2) / (m1 + m2)    (equation 8)
Once P is known, plug it back into equations 1 and 2 to compute v1' and v2', and set them as the
circles' new movement vectors.
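A compact C++ sketch of this response, under the same rigid, frictionless assumptions, reusing
the Vector2 type from the earlier sketches (names are illustrative):

#include <cmath>

static float Dot(const Vector2& a, const Vector2& b)
{
    return a.x * b.x + a.y * b.y;
}

// Elastic response for two circles at the moment of contact. c1Pos and
// c2Pos are the centers at contact; v1/v2 are updated in place.
void Collision_Response(const Vector2& c1Pos, const Vector2& c2Pos,
                        Vector2& v1, Vector2& v2, float m1, float m2)
{
    // N: unit vector along the line of contact (center to center).
    Vector2 n = { c2Pos.x - c1Pos.x, c2Pos.y - c1Pos.y };
    float len = std::sqrt(Dot(n, n));
    n.x /= len;  n.y /= len;

    // a1, a2: components of the movement vectors along N.
    float a1 = Dot(v1, n);
    float a2 = Dot(v2, n);

    // Equation 8: the momentum exchanged along N.
    float p = 2.0f * m1 * m2 * (a1 - a2) / (m1 + m2);

    // v1' = v1 - (P/m1)*N,  v2' = v2 + (P/m2)*N.
    v1.x -= (p / m1) * n.x;   v1.y -= (p / m1) * n.y;
    v2.x += (p / m2) * n.x;   v2.y += (p / m2) * n.y;
}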
These techniques will allow you to use spheres with a higher degree of accuracy than is probably
necessary if your spheres are all bounding spheres. Precise collisions between spheres become
important when simulating things like pool balls or marbles, or potentially rocks in a landslide.
But for spheres bounding characters, for example, you might not care what angle two colliding
characters would bounce away at. However, parts of these methods are fast enough to be
useful if only to determine that a collision was avoided. But who knows; using a pumped-up
space marine as a cue ball just might be humorous enough to do…
Special thanks to Dave Baum for his help with collision response.
______________________________________________________
Tiled Terrain
By Mark Peasley
Gamasutra
October 24, 2001
URL: http://www.gamasutra.com/features/20011024/peasley_01.htm
In this column, I'll examine some of the issues you may want to consider when starting to
create a terrain tile set for a game. When done well, the terrain becomes a believable world in
which your characters come to life. The user will simply accept your work as part of the natural
backdrop. Often, the best compliment for a good environment is that it doesn't call attention to
itself.
As the terrain artist, you may not have a ton of texture memory space to deal with, so it's
always a good idea to figure out how to stretch your budget as far as possible. Obviously, the
larger the number of base texture tiles, the more random (and hence natural) the terrain will
look. When only a few terrain tiles are available, a repetitive pattern becomes very obvious. The
trick is maximizing the usage of each texture while minimizing the visual disconnect that occurs
when you see the same pattern over and over.
The Basics
Before you begin, you will need to evaluate the needs of your particular project. Let's assume
that you are trying to make one large plain of cooled lava stone. In theory, you can accomplish
this with just one texture map tile and one large polygon. You can tile the texture map multiple
times on the single polygon, which gives the appearance of a higher resolution on the terrain.
While this method works conceptually, it can quickly lead to some technical problems, such as
the inability to add additional texture variety into the field, and shading issues over a large
polygon.
Another method is establishing a grid of polygons similar to a chessboard mesh. You can map
each square with a different texture map, and there is a lot more flexibility with what you can
do on an individual square basis. This method addresses some of the variety and shading issues
as well as making the process of editing and tweaking much easier.
Now, suppose you want to add a lake of molten magma to the middle of the landscape. With
the single large polygon method, you'll quickly face some problems. Since you can't interrupt
the tiled terrain and insert a different texture map in the middle of the sequence, you have to
figure out an alternative method. One such alternative is to include the lake in one large texture
map. While this would allow you to create a completely custom terrain, you will quickly
encounter texture memory usage and texture resolution challenges.
The other alternative is to make a custom polygon for the magma only, and then perform a
Boolean operation on that polygon, combining it with the other terrain polygon. This too
presents some particular problems if you are using vertex shading to light your terrain. In
addition, making the seam between the two texture types invisible will prove to be difficult, if
not impossible.
For this tutorial, I'll focus on a more forgiving method of maximizing the effect of your tiled
terrain by using a standard grid and texture-mapping each polygon quad or set of quads as a
unique element. This method will allow you to control the layout of the textures much more
accurately, and it also provides vertex-shading advantages. In addition, when you start to add
vertical information to your mesh, you will find that the additional vertices give you more
control of the 3D aspects.
When making a texture tile set, you'll need to create a minimum set of tiles. The complexity of
the planned environment will determine the number of tiles, how well the tiles appear visually
without showing repetitive patterns, and how many different types of random tiles will be
required. The amount of available memory for the terrain will impact all these factors. As a
good starting point, create the minimum base set, then add to it only when necessary. Reuse is
king, so try to stretch your texture budget as far as possible.
The number of different terrain types in your environment will have another big impact on your
tile set. This impact can be compounded by which terrain types can touch or
transition with one another. Take, for example, a terrain that consists of water, sand, and
grass. If water can only touch sand and sand can only touch grass, then you only have two
transition sets to make. However, if water touches grass as well, then an additional transition
set is required. As you can tell, planning out the environmental requirements ahead of time
makes a lot of sense. You may find areas where you need to limit the number of transitions in
order to keep the texture budget in check.
For the sake of argument and to keep the number of variations low, I'll assume that the new
terrain will consist of two basic ground types: stone (cooled lava) and molten lava. This will
require you to make only one transition set. For now, don't concern yourself with the 3D
aspects, and assume that the terrain is a flat plain.
First, create the flood-filled texture for each basic ground type. This is the basic tile that is used
as the default flood fill of an area. If possible, since it stretches your texture budget, it's a good
idea to make the texture able to tile in any direction, even if it's rotated 90 or 180 degrees. This
will give you more visual mileage out of a single texture by breaking up the repetitive pattern of
the texture when you lay it down next to itself.
In Figure 1, you can see the stone texture in its original orientation. It's easy enough to use the
Offset filter and Rubber Stamp tool in Photoshop to make a texture seamless, but you're limited
to only one orientation. If you rotate it and try to place it in a grid, the seam will be plainly
visible. In order to make a tile seamless when it rotates, the texture needs to have identical
edge pixels on each of the four sides. This requires a bit more work than just using the Offset
filter, but it also extends the usability of the texture. With a bit of Photoshop magic, you can
create the seamless effect pretty easily.
The first step in making a tileable texture is making your flood-filled texture tileable. There are
quite a few methods and some programs devoted exclusively to this process. Photoshop is more
than adequate for the job, so I'll cover a method that doesn't require any special plug-ins.
After obtaining the proper source material, choose a square section of the image. Avoid source
materials that have a strong light direction embedded in the image. This is mainly because once
you rotate and place the texture next to the original orientation, there is a visible anomaly in
the lighting direction. If you take care, you can make strongly highlighted textures work, but
that takes a bit more Photoshop work.
In this tutorial, I'll actually use a source that has a fairly strong highlight to make it easier to
see the seams. Be aware of the image scale once it is mapped onto the terrain. It's very easy to
find games out there where the texture maps aren't to the proper scale for the characters or
the environment. Often, designers use the character texture maps to drive the texture
resolution for the rest of the environment. Your main goal is uniformity in your texture
resolution. The pixels per foot of your game should remain fairly consistent.
Once you have selected your perfect 256x256 texture, it's relatively easy to make it a seamless
texture when no rotation is involved. To do this, load up the texture in Photoshop, then go to
Filter>Other>Offset. Set the Horizontal and Vertical settings to 128 with the Undefined Area set
to Wrap Around. The seam is visible, and you can then remove it using the Rubber Stamp tool.
Try to keep from blurring the image too much, as it tends to make the end result look fuzzy.
It's also a good idea to change the offsets to different settings such as 64 vertical, 64 horizontal
and then check the seam again for visible anomalies.
Once you are done, check to see how it looks in a tiled environment. First, make a new
Photoshop file, with the size set to 1024x768. Now open up the tiled texture you have just
created and select all. Go to Edit>Define Pattern. This stores the texture as a repetitive pattern
in the clipboard. Now switch to the 1024x768 image and click on the Paint Bucket tool. Go to
the Paint Bucket Options panel and select Pattern from the Contents pull-down menu. Then,
simply click anywhere in the image to fill the area with the pattern.
This process gives you a good idea of how the image will tile and whether there are any areas
that tend to stand out in the repetitive pattern. Fix any obvious problems immediately rather
than waiting until later. You'll have to redo most of the steps after this one if the edges don't
quite work.
After you create the first flood-fill tile, you can quickly make additional matching tiles. You may
find that it's necessary to create three or four different flood-fill patterns to make the terrain
random enough.
To create additional fills, you will need more source images. In this case, the original texture
was a fractal pattern generated in Corel Draw's texture program, so by simply changing the
seed number, you'll be able to make additional source material images all with a very similar
look and scale.
Once you obtain the new source image, create a new Photoshop file that has the new terrain as
the first layer (I named mine New Stone) and the tiled texture you previously made as the top
layer (Original Stone). Now, with the top Original Stone layer active, make a selection box that
sits approximately 30 pixels in from each edge. Accuracy isn't critical here, since you will be
able to edit your work after the fact. With the selection still active, go to your Layers panel and
click the Add Layer Mask icon at the bottom left. This creates an active layer mask, with the
edge border showing the Original Stone layer over the New Stone beneath.
Now it's time to do some Photoshop work and edit the mask to blend the New Stone edge into
the Original Stone edge. Don't edit all the way up to the edge. If you do, your new texture
won't match the old texture at the seam. Once you complete this task to your satisfaction, save
off a copy and collapse it down to create your new tile. By inserting it into your 1024x768 tile
sample, you will be able to see how it looks and how effectively it breaks up your patterns.
After examining the results, you'll find that with only two or three flood-fill textures you will
still begin to recognize the pattern unless you've used several flood fills. Another way to extend
your texture budget is to use a common edge on all sides. This method allows you to rotate the
texture 90 degrees in any direction and it will still match. This technique also makes it much
more difficult for the viewer to detect the pattern, though it requires a bit more Photoshop
expertise to create.
First, pick a side of the texture map that will become your common edge. In the case of the
stone texture example, I'll pick the left edge. Create a selection box that is flush with the top,
bottom, and left side of the texture map, and about 30 or 40 pixels wide (see Figure 2). Now
copy that piece of bitmap into the clipboard and paste it down again (Control-C then Control-V).
This will place an exact duplicate of the texture section right over the original. Now make a
selection set of the layer you just placed down by pressing Control and clicking on the layer in
the Layer panel. Rotate the selection 90 degrees clockwise by right-clicking on the selection set,
then choosing Numeric Transform and inputting 90 in the Angle section under Rotate. Now click
and move the selection set so that it aligns with the top of the texture and name it Top. Add a
layer mask, and edit the inside edge of the bitmap to blend it into the underlying texture map.
If you paste again (Control-V), you will get a new layer with the original left-hand side segment
that was residing in the clipboard. Right-click on the segment, and select Free Transform.
Right-click once again, and select Flip Horizontal. Now press Enter to lock the transform into the
bitmap. Align with the right-hand side of the texture map, add a layer mask, and blend the
inside edge to the rest of the underlying texture. Be careful not to remove the corner pixels.
Rename the layer Right. Now select the Top layer and drag it into the Create New Layer icon on
the Layers panel. This will create a duplicate of the Top layer. Rename it Bottom, then select
and apply Free Transform. Choose Flip Horizontal, and you're done with the needed transforms.
Now align it to the bottom edge and add or subtract from the layer mask to blend it into
the underlying texture. Once you have done all three sides, you should have a rotatable texture
with a seamless edge. Figure 3 shows the blended edges without the underlying texture for
clarity.
It is a good idea to test your new texture by creating a duplicate of the file and collapsing it
down to one layer. Then run the Filter>Offset on it with 128 horizontal, 128 vertical and Wrap
Around in the settings. This will make the nonrotated seams visible if there are any anomalies.
Make sure either to undo or to run the Offset filter on it again to return the texture to the
rotatable seam edge.
Now, select the texture and create it as a fill pattern. Create a new 1024x768 file and fill with
the repeated pattern. Set the guidelines or grids on the 256-pixel boundaries and then paste a
single texture into one of the grids as a new layer. You can now rotate the new layer 90
degrees and align it with the grids or guidelines. If you have done the steps correctly, it should
blend in with the seamed edge without a visible line (see Figure 4).
In order to create a blended set, we will also make a molten lava flood-fill texture, utilizing the
same techniques I've just listed. The base lava texture was generated in Bryce 4, then run
through Photoshop for a bit of image enhancement (see Figure 5). Once you have a tileable,
rotatable lava tile, you are ready for the next step.
At the most basic level, there are only three additional textures needed to create a blended set.
The entire set contains five tiles: two flood fills and three transitions. For clarity's sake, I'll refer
to them as the one-fourth blend, the one-half blend and the three-fourths blend. With these
and a tiled environment, you can create any sort of varied coastline required. However, it's
good to note that making three or four variants of each of these textures will make the
transitions from one texture to another much less visible and more natural-looking.
First, make the one-half blend tile. To create this tile, place the stone texture on Layer 1 or the
Background layer in a new file you've created in Photoshop. Next, open and copy the lava
texture into the clipboard. Paste the lava into the new file as a second layer over the stone.
The next step is either to add guidelines or to set your grid to show where the 128-pixel point
is on the side of the texture. Once you do this, select the top half of the texture, which should
be a 256x128 selection box. Add a layer mask to the lava layer and fill the selection with black
on the mask to make that half transparent, hiding the lava under the selection. Now it's time to
go into the texture and add some randomness. In the case of the stone texture, I tried to follow
some of the natural contours of the rock. By alternating between black and white on the layer
mask, I was able to add or subtract stone in the composite image as needed (see Figure 6).
Avoid eliminating or altering the pixels at the very edge of the texture on either side. If you do,
you won't have a tiled texture anymore. Also, try not to get too dramatic in the uniqueness of
the transition. If, for instance, you decide that a big pool of lava would look good right in the
middle of the transition, you will find that the distinctness of this element becomes apparent
when the tile is repeated. If you are using multiple variations, then a unique tile every so often
works quite well. The base repeating tile should be somewhat generic so that it doesn't bring
attention to itself.
For the one-fourth blend and the three-fourths blend, take the exact same steps using the layer
mask, editing only one quarter of the texture. The quadrant you choose is arbitrary, since the
texture is rotated to allow for all four directions. Figures 7 and 8 show the final edit on the
textures.
Now that you've created the base set, you will find that no matter how well you created the
texture, the repetitive nature of the tiles is difficult to escape. If memory allows, you can create
variants for each of these base sets. You can also create unique "random" tiles. These are
special-case tiles or groups of tiles that occur very sporadically (to minimize their recognition)
but give the terrain a more natural look (see Figure 9). Try to create variants that can also be
reused to maximize your texture usage, such as a set of three or four variants that all work
together and allow you to mix and match them.
Once you have your entire set, you can always build another test screen that checks for any
problems in the tiled layout. Without the use of multiple transition types, the terrain isn't as
natural as it could be, but Figure 10 is a good example of a base set.
Figure 10. An example of the base set in a tiled environment.
On the Horizon
The methods I've just covered are just one technique for terrain tile generation. Once you've
mastered the basics, you may find that there are things you can do to enhance your textures
even more. If your game requires the use of a graphics card, then you have some powerful
animation possibilities at your disposal if you have access to the hardware texture calls. You will
need to talk over the technical requirements with your graphics programmer, but you can easily
achieve such effects as pulsing lava, flowing water, and moving steam with animated textures.
The base requirement is that the game use a graphics card, but with the latest-generation
games, this is quickly becoming an expected base system requirement.
http://www.gamasutra.com/features/20011024/peasley_01.htm
By Gustavo Oliveira
Gamasutra
October 5, 2001
URL: http://www.gamasutra.com/20011005/oliveira_01.htm
The first time I implemented a spring model, it fascinated me for hours. The effect itself is
amazingly realistic, and its implementation is fairly simple. Fortunately, I found a lot of articles,
references, and source code to help with my research. Nevertheless, as I went further down the
road, I noticed that in most cases these references were limited to the standard applications of
spring models -- string, cloth and jelly.
This article reviews the implementation of a spring model from its simplest form to more
sophisticated applications, taking the subject a step beyond the material available in most
references.
Spring Basics
Before modeling springs with the computer, you need to understand the basic principles of
springs from your classic physics book.
As you compress or extend a spring, it creates a force opposed to the direction of the force
that you are applying. This force is expressed by the formula:
F = -k*∆x
∆x = xc-xi
Where F is the resultant force, k is the spring coefficient, and ∆x is the distance from the
spring's inertial position (xc = current distance, and xi = distance at the inertial position).
Mathematically, you can think of a spring as two points separated by a distance x. This is its
inertial position. In other words, if you don't move these points, no force will be generated
because ∆x = 0. If you compress the distance between these points, ∆x will be negative,
generating a positive force (expansion). If you separate these points, ∆x will be positive,
generating a negative force (compression). The picture below illustrates these cases.
Pa -/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\- Pb (∆x = 0, F = 0)
Pa -/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\- Pb (∆x < 0, F > 0)
Pa -/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\- Pb (∆x > 0, F < 0)
Implementing the spring force is a direct application of the first two equations. In other words,
first compute the distance between the spring's extremities (Pa and Pb). This is done through
Pythagoras' theorem. Once you have the distance, you subtract the inertial distance from it to
get ∆x. Then, multiply this scalar value by the k coefficient and finally negate the result to
compute the force.
As simple as it sounds, the implementation in C code has its details. First, the distance ∆x and
the k coefficient are scalars, but the force itself is a vector. Therefore, it requires a direction,
which can be from Pa to Pb, or from Pb to Pa. As of right now, the code below always uses the
direction Fdir = Pb-Pa.
As the force's direction is normalized, you can reuse the distance you already computed for the
∆x for the normalization. You must be careful to avoid division by zero when normalizing the
vector. This is a nasty condition and it usually does not happen, but you have to do something
about it in case it does. You have two options in this particular case: assign a random force to
the points, or null out the force. Theoretically, the force should be big enough that as the points
get closer they never occupy the same space. Therefore, it seems to be a good option to
randomly assign a massive force in this case. However, this can cause undesirable artifacts,
because some points inside the spring model might fly apart with extreme velocity when
compared with others. Although nulling out seems wrong, it doesn't cause any major problems
and preserves the spring's stability. The code below shows the implementation; it's a little
redundant for the sake of clarity. I null out the force in case of potential division by zero.
//Listing 1. (The opening lines did not survive in this copy; the function
//head below is reconstructed from the call in Listing 6.)
void SpringComputeSingleForce (VECTOR* force, VECTOR* source,
                               VECTOR* dest, float k,
                               float delta_distance)
{
    VECTOR force_dir;
    float dx, dy, dz, distance, delta, intensity;
    //vector from source to dest
    dx = dest->x - source->x;
    dy = dest->y - source->y;
    dz = dest->z - source->z;
    distance = (float)sqrt(dx*dx + dy*dy + dz*dz); //requires <math.h>
    //null out the force in case of potential division by zero
    if (distance == 0.0f)
    {
        force->x = force->y = force->z = 0.0f;
        return;
    }
    force_dir.x = dx;
    force_dir.y = dy;
    force_dir.z = dz;
    //normalize
    force_dir.x /= distance;
    force_dir.y /= distance;
    force_dir.z /= distance;
    //force intensity: k * (current distance - inertial distance)
    delta = distance-delta_distance;
    intensity = k*delta;
    //store
    force->x = force_dir.x*intensity;
    force->y = force_dir.y*intensity;
    force->z = force_dir.z*intensity;
}
Connecting Springs
Computing the spring force in one direction for a single spring doesn't do much. The power of
springs comes when you connect them into a network. Then, the computed force of one spring
affects the entire network. In other words, if one point moves, it creates a force that
propagates through the connected springs. This results in a realistic movement of the connected
points, simulating what would happen in real life.
The three common applications of connected springs are string, cloth, and jelly. These models
are very similar. The only difference is their initial geometric arrangement and how their springs
are connected. For instance, to model a string you connect springs as a line (one dimension).
To model cloth you connect springs as a plane (two dimensions). Finally, to model jelly you
connect springs as a volume (three dimensions).
Before implementing these applications, take a closer look at how two springs can be
connected. Figure 1 shows two springs represented by two thick colored lines, with the
extremities represented by three points. The forces that can be created between these points
are shown with arrows.
From Figure 1, two connected springs can generate four forces. For example, if you move and
hold P0 to the left, the force P0->P1 will be created to pull P0 back closer to P1. Also, the force
P1->P0 will be created to move P1 closer to P0. As P1 is also connected to P2, the forces
P1->P2 and P2->P1 will also be created to bring these points closer together. If you had more
points, the forces would keep propagating through the springs because of their connections.
You don't have to necessarily follow the arrangement of Figure 1. For example Figure 2 shows
four springs connected together in a different manner. Again, the thick colored lines represent
the springs, the points its extremities, and the arrows the forces that can be created between
them.
Figure 2: Forces between Four Springs
In this arrangement there are four possible acting forces on P1, derived from P0, P2, P3 and P4.
Also, you don't necessarily have to connect a point to its neighbors as in the previous figures.
For instance, you can add another spring connecting P0 to P2, or even add more than one
spring connecting two points. In fact, the way you build your spring network is entirely up to
you. The arrangement you choose, however, will affect your final spring model's behavior.
Based on this idea, the figures below show ways to connect springs for a string, a cloth and a
jelly respectively. In each arrangement, each edge represents one spring. The blue dots are the
extremities of the springs.
Figures 3-5: String Arrangement, Cloth Arrangement, Jelly Arrangement
In each configuration, if you move one point it will realistically simulate the object's behavior.
However, this is not entirely true for the particular jelly arrangement presented. Indeed, the
arrangement of Figure 5 is unstable. If you model a jelly exactly like in Figure 5, moving one of
its points will morph the jelly into a different object. Soon the original cubical shape will be
gone. Why? Because the spring force formula depends only on the scalar ∆x; its direction is
irrelevant. Therefore, if you simply rotate two connected springs, they can move to another
configuration where no forces will be created. Figure 6 and Figure 7 below clarify the last
statement by showing two sets of springs that do not create any forces. In both cases ∆x =
0, but their geometric shapes are different.
Because the springs can rotate around their extremities without creating any forces, you must
be careful when modeling and connecting your springs. This is probably the most important part
of the implementation. You must design your spring model to be able to swing nicely without
morphing into something unrecognizable. Also, your design should use the least number of
springs for the sake of calculation time. Figure 8 shows a stable arrangement for a jelly
model. In this arrangement there are a lot more springs than in the previous one, and some
vertices connect up to six springs. This is still not the only way of connecting them to create a
stable model. For example, you can stack the cloth model and connect all the planes with more
springs. This configuration will be computationally more intense, and it will behave differently
than the arrangement of Figure 8. Again, the final arrangement is up to you, and it depends on
what you are trying to simulate.
Implementing a String
With the basic information already presented, let's look at how to implement the simplest spring
model -- a string. Despite its simplicity, once you understand how to implement a string,
other models become just an extension of the same basic idea.
To begin, you need a data structure that allows you to save the spring's position, compute its
forces, and enable connection to other springs. Therefore, it is easier to design your data around
the spring's extremities instead of each spring itself. In this implementation, each extremity of
a spring is called a spring node. Two spring nodes compose one spring. Just as a polygonal mesh
has a list of vertices, the spring model will have a list of spring nodes. The code below shows
the basic data structure for each spring node, and the helper data structure VECTOR.
//Listing 2. (The struct heads are reconstructed from the description below;
//only the last fields survive in this copy.)
typedef struct _vector { float x, y, z, w; } VECTOR;
typedef struct _spring_node SPRING_NODE;
struct _spring_node {
    VECTOR pos, vel, force; //current position, velocity, resultant force
    SPRING_NODE* neighbor_node[SPRING_MAX_NEIGHBOR_COUNT];
    float neighbor_x[SPRING_MAX_NEIGHBOR_COUNT]; //inertial distances
    int neighbor_count;
};
The spring node data structure above holds the current position, velocity, resultant force and
pointers to the neighbor nodes. It also saves the distance x from each neighbor node at the
inertial state -- necessary to compute the resultant force. Still, Listing 2 is a little nasty. Like
Listing 1, it was written that way simply for educational purposes. For instance, this data
structure is used at build time only; to save some memory, it would be better to use
linked lists for the neighbor pointers and distances instead of arrays.
In a nutshell, the spring model is simply a list (array) of spring nodes. To build a string (or
any spring model), all you have to do is set up the spring model's initial geometric
arrangement and connect its nodes properly. Then you are ready to animate it, by computing
the internal forces and applying them to the points.
Before you see the animation loop details, take a closer look at the spring model data
structure, followed by the string creation code. Listing 3 depicts the main data structures used
to build any spring model. As you can see, it is nothing more than a collection of nodes, plus
extra fields for force calculation and debugging.
Listing 4 is the actual setup code for a string. It has four main parts. First, it allocates memory
for the string data and assigns the main parameters to it. Second, it positions the nodes aligned
to the positive xz axis. Third, it connects the nodes and assigns the inertial distances for the
nodes and their neighbors. This part is where the spring model's behavior is defined, and it is
fairly simple and (hopefully!) easy to read, because it models a string and takes advantage of
the string's symmetry. The last part of Listing 4 sets up the anchor point. This is a point that
will not animate no matter how strong the forces on it are. The anchor points are used for
debugging and testing; to drag the model around, you need at least one. In some models, as
you will see later, the anchor points are also useful to keep the spring model stable.
//Listing 3. (The struct heads are reconstructed from their usage in
//Listings 7 and 8 -- only the last fields survive in this copy.)
typedef struct _spring_anchor { int node; VECTOR pos; } SPRING_ANCHOR;
typedef struct _spring {
    SPRING_NODE* node; int num_nodes;       //node array (assumed)
    SPRING_ANCHOR* anchor; int num_anchors; //anchor array (assumed)
    float k_coef;
    float energy_loss;
    float mass;
} SPRING;
/////////////////////////////////////////////////////////////
//allocate memory & setup parameters
/////////////////////////////////////////////////////////////
int mem_size;
//adjust pointers
spring->k_coef = k_coef;
spring->energy_loss = energy_loss;
spring->mass = mass;
//////////////////////////////////////////////////////////////////////
//setup nodes position aligned with the xz+ axis
//////////////////////////////////////////////////////////////////////
SPRING_NODE* node_curr;
int tmp0;
VECTOR pos_curr = {0.0f, 0.0f, 0.0f, 0.0f};
///////////////////////////////////////////////////////////////////////
//connect nodes and setup the inertial x distances
///////////////////////////////////////////////////////////////////////
//inner nodes
for (tmp0=1; tmp0<num_nodes-1; tmp0++)
{
node_curr = &spring->node[tmp0];
node_curr->neighbor_count += 2;
}
////////////////////////////////
// setup anchor node
////////////////////////////////
return (spring);
}
Once the geometry is ready and the nodes are connected, the next step is to implement the
animation loop. Listing 5 illustrates the generic loop. Again, this code is redundant and not
optimized, for the sake of clarity. The basic idea is to loop through all the nodes, compute all the
forces on each node, and store the resultant force. The resultant force then adds velocity to
the node, which in turn updates the node's position. Finally, the anchor points are "fixed" by not
letting the forces act on them.
To compute the final resultant force on each node, the neighbor nodes come into play. In
Listing 6, for each node you compute the forces that all neighbor nodes apply to it. Then, you
accumulate (add) all these forces and store the result in the data structure. Do you remember
that in Listing 1 only one direction was necessary to compute the force? This is because you
are looping through all the nodes, and through all their neighbors as well. Therefore, at the end
you will have computed both directions of the same force. In other words, when P0 is the current
node, the force P1->P0 is computed. Then, when the current node is P1, the force P0->P1 is
computed. As you are computing the same force (with opposite signs) twice, this can be
optimized later. As of right now, to keep the loop easier to read, the code does it the slow way.
source = &node_curr->pos;
SpringComputeSingleForce (&force_tmp,
source,
dest,
spring->k_coef,
distance);
node_curr->force = resultant;
}
}
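For reference, here is a minimal sketch of the full accumulation loop that Listing 6 describes,
pieced together from the fragments above. The loop structure and the num_nodes and tmp1
names are my assumptions, not the article's exact code:
int tmp0, tmp1;
SPRING_NODE* node_curr;
VECTOR* source;
VECTOR* dest;
VECTOR resultant, force_tmp;
for (tmp0 = 0; tmp0 < spring->num_nodes; tmp0++) //num_nodes is assumed
{
    node_curr = &spring->node[tmp0];
    source = &node_curr->pos;
    resultant.x = resultant.y = resultant.z = 0.0f;
    for (tmp1 = 0; tmp1 < node_curr->neighbor_count; tmp1++)
    {
        dest = &node_curr->neighbor_node[tmp1]->pos;
        SpringComputeSingleForce (&force_tmp, source, dest,
                                  spring->k_coef,
                                  node_curr->neighbor_x[tmp1]);
        //accumulate the force this neighbor applies to the current node
        resultant.x += force_tmp.x;
        resultant.y += force_tmp.y;
        resultant.z += force_tmp.z;
    }
    node_curr->force = resultant;
}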
To finally animate the spring model, you simply apply the resultant force of each node to itself,
making it move. From your physics book, force and acceleration are related as follows:
F = M*Acc
Acc = F/M
As the second equation shows, to compute the node's acceleration you divide the resultant force
by the spring's mass. If the force is not null, there will be an acceleration value that is used to
increase the velocity of the node. Finally, the velocity changes the node's position each
frame. Listing 7 shows the implementation of all that.
There are a few important observations about Listing 7. To begin, removing the spring's
mass value from the code will not make much of a difference. Although the extra divide
is mathematically correct, if you simply assign the force to the acceleration value you will
get fine results. This is equivalent to setting the mass of all springs equal to one. In the
code you see the line with the extra division commented out.
Another important observation is that the code adds gravity to the final resultant force --
implemented by the global variable "gSpringGravity." This is simply an additional global force
that affects all nodes making the string fall. You can take this out, or even add more of these
global forces for different results.
The last important observation is in regard to the calculation of the node's velocity. The
acceleration value adds velocity to the node, but you need to dampen this velocity somehow
each frame. If you don't, your spring model will swing forever. The "energy_loss" at the end of
Listing 7 is used to dampen the node's current velocity. It is interesting to play around with the
"energy_loss" variable. For example, if you set it to 1.0, you will get perpetual motion. If you
set this variable slightly above 1.0, the system will gain energy every frame; eventually your
spring model will become completely chaotic and unrecognizable. Realistic values lie between
0.0 (0 percent) and 1.0 (100 percent), exclusive. This simulates the real-life energy loss of
moving systems due to friction, heat and/or noise.
When I first saw my string swinging beautifully on my screen, I had not yet added the
"energy_loss" variable to the code. For a second I truly thought I had invented a way to create
perpetual motion. However, I had forgotten that my computer was plugged into the outlet.
Therefore, keeping my pixels moving forever would cost me infinite energy!
int tmp0;
SPRING_NODE* node_curr;
VECTOR acc;
node_curr = &spring->node[tmp0];
//(the acc.x pair below is restored by symmetry with the y and z lines)
//acc.x = (node_curr->force.x+gSpringGravity.x)/spring->mass;
acc.x = (node_curr->force.x+gSpringGravity.x);
//acc.y = (node_curr->force.y+gSpringGravity.y)/spring->mass;
acc.y = (node_curr->force.y+gSpringGravity.y);
//acc.z = (node_curr->force.z+gSpringGravity.z)/spring->mass;
acc.z = (node_curr->force.z+gSpringGravity.z);
node_curr->vel.x += acc.x;
node_curr->vel.y += acc.y;
node_curr->vel.z += acc.z;
node_curr->pos.x += node_curr->vel.x;
node_curr->pos.y += node_curr->vel.y;
node_curr->pos.z += node_curr->vel.z;
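The damping step that the text attributes to the end of Listing 7 is not shown in the fragment
above; by the description, it presumably looks something like this:
//dampen the velocity each frame (placement assumed from the text)
node_curr->vel.x *= spring->energy_loss;
node_curr->vel.y *= spring->energy_loss;
node_curr->vel.z *= spring->energy_loss;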
Once the string starts to swing, the last part of the code just checks to see if the anchor point
has moved from its initial position. If it has, the code forces it back to the original location. In
the case of the string, the anchor point was set to be one of the string's extremities. This works
as if you had nailed the string's extremity to a wall; one end will never move. If you hook up the
keyboard or mouse to the anchor point position, you can move the string around to see how it
behaves. Listing 8 illustrates this simple check.
if (node_curr->pos.x != spring->anchor[tmp0].pos.x)
{
node_curr->pos.x = spring->anchor[tmp0].pos.x;
}
if (node_curr->pos.y != spring->anchor[tmp0].pos.y)
{
node_curr->pos.y = spring->anchor[tmp0].pos.y;
}
if (node_curr->pos.z != spring->anchor[tmp0].pos.z)
{
node_curr->pos.z = spring->anchor[tmp0].pos.z;
}
}
}
Lastly, to draw the string, simply hook up a render function that draws lines using the current
positions of the string nodes. Listing 9 shows the render function that does that, and also draws
small crosses at each node for reference. Figures 9 through 12 show screen shots of
the string movement with some gravity applied.
SPRING_NODE* node_a;
SPRING_NODE* node_b;
int tmp0;
RGBA rgba;
return (SPRING_OK);
}
Figures 9-12: Animation of the string.
If you understood the basic idea of the string's algorithm, implementing cloth and jelly is
literally the same thing. You can simply use the same code you have written for the string to
simulate more complex objects. The only part you need to change is the creation function
(Listing 4). There you will have to arrange the positions of the nodes and the links to their
neighbors to fit the cloth and jelly shapes.
As you may have noticed, Listing 4 was fairly simple because a string is a simple object to set
up. The cloth is not too bad either, because it is also a very symmetric object. Yet, compared to
the string, the code will be a lot more complicated. For the cloth you are going to have to set up
the node positions in a square shape. Then, when connecting the nodes, you need to treat the
corners, the left, right, top and bottom edges, and the inner parts of the cloth separately.
And how about the jelly? It is possible, algorithmically, to set up the jelly like a stack of cloths.
Yet, the setup code is now even more complicated than the cloth's code. First, you need
to set up the initial node positions as a cubic shape. Then, when connecting the nodes, you need
to treat the corners, the front, back, left, right, top and bottom planes, and the inner parts
of the jelly separately. As you can see, in this case you may have to spend a lot more time on
the jelly's creation code than on the physics itself. To make matters worse, if you try to model
a jelly like the one in Figure 8 algorithmically, the code is going to get even more tangled up. In
this particular case, you are better off getting graph paper, writing out the vertex positions and
links, and setting up the node array one by one!
Fortunately, there is a better way to set up the cloth, the jelly, and more complex spring
models. The main idea is to modify the creation code into a more data-driven procedure.
Setting up the springs in a data-driven way suggests, at first, the need for a tool.
However, you will probably have to spend a considerable amount of time and resources to
implement such a tool, especially if you want to design complex 3D spring models.
Another way is to write a converter. It turns out that by simply converting geometry data from
a modeling tool (3D Max, Maya, Houdini, Softimage, etc.) to spring data, you can still get good
results.
The converter can save you a lot of time, and it has a few advantages. First, you are not limited
to symmetric geometry. In other words, the spring model is now limited only by your
imagination. Second, you get the texture information for your spring model for free (more about
this later). Third, as a videogame programmer, you probably already have geometry importer
code available; all you have to do is modify it to create springs. Fourth, you do not have to
worry about fast algorithms during the conversion process. Once you build your spring
configuration, simply save out the spring model and reload it later in the game.
The converter can be implemented by simply considering each vertex of the geometry's vertex
list a spring node. Then, each edge connecting two vertices becomes a spring. The basic
algorithm is shown next.
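The original listing is not reproduced in this copy, but under the description above the converter
amounts to something like the following sketch. SpringConnectNodes is a hypothetical helper
that would add the two nodes to each other's neighbor lists (skipping duplicate edges) and
record the inertial distance between them:
int v, f;
for (v = 0; v < mesh->num_vertices; v++) //every vertex becomes a node
{
    spring->node[v].pos = mesh->vertex[v];
    spring->node[v].neighbor_count = 0;
}
for (f = 0; f < mesh->num_faces; f++) //every triangle edge becomes a spring
{
    SpringConnectNodes (spring, mesh->face[f].v0, mesh->face[f].v1);
    SpringConnectNodes (spring, mesh->face[f].v1, mesh->face[f].v2);
    SpringConnectNodes (spring, mesh->face[f].v2, mesh->face[f].v0);
}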
Although writing a converter offers several advantages, it does have a big disadvantage: your
spring model is now tied to the geometry data, and this is not necessarily a good way to
set up your springs. Most 3D tools export geometry as a list of triangles. Therefore, your
springs will always have their inner cells connected as triangles. Figure 12 depicts the jelly
model arrangement converted from geometry data.
When you perturb this jelly arrangement and apply the spring forces, it will morph into
something unrecognizable. In other words, this arrangement is unstable. Now, if writing a tool
is too much work, but converted geometry data does not work either, what can you do? In a
nutshell, you can do the gross arrangement of the spring model through the converter, and
manually add or remove springs to fine-tune your model. If you compare Figure 12 with Figure
8 you will realize that they are similar. The only difference is a few inner springs connecting
the bottom and top parts of the jelly.
To fine-tune your springs you can write a simple tool that lets you attach or delete springs from
your current spring model. This tool is far less complicated than writing a spring modeler from
scratch. Figure 13 shows the tool used to convert the jelly in Figure 12 and edit its springs into
the arrangement of Figure 8.
The conversion tool in Figure 13 is a simple mesh viewer with a few extra buttons for spring
editing. The interface allows the user to loop through the spring nodes, add or delete springs,
add anchor nodes, test the model, and tune the physics parameters. Despite the interface's
simplicity, it allows the user to convert, tune and test the spring model in a data-driven way.
So far all you have seen in this article are wireframe examples, but texturing them is trivial.
When converting the geometry data to a spring model, the texture information comes at no
extra effort: simply move it from your original mesh data to the spring data structure. You also
need to move the face data structure; in other words, to draw each triangle you need to know
which nodes compose each face. Listing 10 illustrates the basic modifications to the main data
structure to include texture information.
//Listing 10. (The struct heads and the node indices are reconstructed;
//only the fields shown in the original fragment are the article's own.)
typedef struct _spring_face
{
    int v0, v1, v2; //node indices (assumed)
    int t0;
    int t1;
    int t2; //texture index
} SPRING_FACE;
typedef struct _spring_tvert
{
    float u;
    float v;
} SPRING_TVERT;
typedef struct _spring
{
    SPRING_NODE* node;
    SPRING_FACE* face;   //faces (assumed addition)
    SPRING_TVERT* tvert; //texture coordinates (assumed addition)
    SPRING_ANCHOR* anchor;
    float k_coef;
    float energy_loss;
    float mass;
} SPRING;
Figures 14 through 17 show a textured cloth model over four animation frames. The cloth
was built by the conversion tool. Note that the red edges are deleted springs, since the
original geometry does not create a good spring structure for a cloth. Four anchor points were
set in the shape of a square around the center of the cloth.
Figures 14-17: Textured Cloth
Dynamic lighting is more complicated, since your spring model is constantly morphing. In this
case you must compute the vertex normals on the fly each frame, and there is no way
around it. The standard way is to set up pointers to neighbor faces at build time. Then, in real
time, compute and average the neighbor face normals to get each vertex normal.
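As a rough sketch of that per-frame step (the neighbor_face bookkeeping and the two helper
functions are my assumptions, not the article's code):
VECTOR n = {0.0f, 0.0f, 0.0f, 0.0f};
VECTOR fn;
int i;
for (i = 0; i < node_curr->neighbor_face_count; i++)
{
    //face normal: normalized cross product of two triangle edges
    //(FaceComputeNormal is a hypothetical helper)
    FaceComputeNormal (&fn, node_curr->neighbor_face[i]);
    n.x += fn.x;
    n.y += fn.y;
    n.z += fn.z;
}
VectorNormalize (&n); //the averaged result is the vertex normal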
With a data-driven spring system, you can model and design pretty much anything you want to
swing. You can build balloons, trees, tires, water, hair, tents, and so on. Even the standard
applications can be enhanced -- a flag with a flexible pole, for example.
Another interesting effect is to mix strings with cloth- or jelly-like objects. You can create a cool
balloon by connecting a string to a spherical jelly. Another example: for an F1 game, you could
model the finish-line banner as a cloth connected by strings to the poles. As the cars cross
the line, apply the forces due to the air movement to the spring model.
Figure 18 shows a screen shot of this article's executable. There you will find a few spring
models: a string, a cloth, a jelly, and a balloon. Except for the string, they were all modeled
with the conversion tool.
Optimizations
Spring models rely heavily on square roots, since the distance between two points is computed
several times each frame. Luckily, newer computers can handle these computations very
easily for simple spring models. Also, the new consoles on the market have sets of assembly
instructions to perform square roots and normalizations in a single hardware instruction. For
instance, if you are a Playstation 2 developer, Listing 1 can easily be optimized using VU0
instructions.
At a higher level, the way this article's code computes the forces is redundant. If you look
carefully at the code, you are always computing the same force in two opposite directions.
Therefore, it makes sense to compute the force in one direction and simply negate it to get its
opposite. By doing it this way you can cut the inner calculations almost in half.
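A minimal sketch of that optimization, assuming you iterate over springs (pairs of nodes, here
pa and pb, with rest_len their inertial distance) rather than over nodes, which the article's code
does not do:
SpringComputeSingleForce (&force_tmp, &pa->pos, &pb->pos,
                          spring->k_coef, rest_len);
//apply the force once, with equal and opposite signs on each extremity
pa->force.x += force_tmp.x;  pb->force.x -= force_tmp.x;
pa->force.y += force_tmp.y;  pb->force.y -= force_tmp.y;
pa->force.z += force_tmp.z;  pb->force.z -= force_tmp.z;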
Sample Program
The sample program demonstrates the standard applications (string, cloth and jelly) and a
more complex object (balloon). Play around with these models and see the springs' behavior in
each object. Use the interface buttons, or the keys I, J, K, M, Y and U, to move the objects. The
mouse (press the left or right button), the arrow keys, and the A and Z keys move the camera.
The full source code for the sample program is not available; only the string implementation is
provided, with no render functions and no optimizations. The sample code is just there to guide
you in your own implementation.
Final Comments
The more you play with springs, the more ideas you will have for things you can model with
them. Objects modeled with springs can add more realism to your game; yet, there are not
many games out there exploring this neat effect.
Next time you would like to write an interesting effect for your game, see if springs can give
you new ideas. As the famous Duke Ellington standard says, "It Don't Mean a Thing If It
Ain't Got That Swing."
References
[1] http://freespace.virgin.net/hugo.elias/
[2] Serway, Raymond A. Physics for Scientists and Engineers, Fourth Edition, pp. 364-370.
[3] Watt, Alan, and Fabio Policarpo. The Computer Image. Addison-Wesley, 1997.
[4] Harris, John W., and Horst Stocker. Handbook of Mathematics and Computational Science.
Springer-Verlag, 1998.
[5] Hearn, Donald, and M. Pauline Baker. Computer Graphics, C Version, 2nd ed.
[6] DirectX 8 SDK Help. Microsoft Corporation.
[7] Playstation 2 VU User's Manual, 2nd Edition. Sony Computer Entertainment Inc.
________________________________________________________
Volumetric Rendering in Real-time
URL: http://www.gamasutra.com/20011003/boyd_01.htm
Most current implementations of fog in games use layered alpha images. This technique,
however, does not bear significant resemblance to how fog actually composites in real life,
since the amount of fog between the viewer and each point in the scene is not modeled in any
way.
In order to create fog effects in a game, it is first necessary to create an analytical model that
bears some resemblance to the mechanics of real fog. Fog is a cloud of water vapor consisting
of millions of tiny particles floating in space. Incoming light is scattered and emitted back into
the scene. This model is too complex to render in real time, and so a few assumptions and
restrictions must be made. The following is a similar model to what is used in depth fog.
The first and most important assumption, common to many real-time fog implementations, is
that the incoming light at each particle is constant. That is, a particle of fog located at one end
of a fog volume and a particle at the other end receive the same amount of incoming light.
The next, related assumption is that each particle of fog emits the same amount of light, in
all directions. This, of course, implies that the fog's density remains fixed. These two
assumptions mean that, given a spherical volume of fog, equal light is emitted in all directions.
Using these assumptions, a model of fog can be defined. If a ray is cast back from a pinhole
camera through the scene, the amount of fog light that contributes to the color of that ray is
the sum of all the light emitted along the ray's path. In other words, the amount of contributing
light is proportional to the amount of fog between the camera and the point in the scene. The
light of the incoming ray, however, is partially absorbed by the fog itself, reducing its intensity.
So, the proposed model of fog is (done for each color channel):
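A hedged reconstruction of that model (the standard distance-fog blend; the original formula is
not reproduced in this copy) would be, per color channel:
C_out = f*C_fog + (1 - f)*C_scene
where f (between 0 and 1) is proportional to the amount of fog along the ray, C_fog is the fog
color, and C_scene is the incoming scene color.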
If this is a per-pixel operation, then the incoming light is already computed by rendering the
scene as it would normally appear. An analytical way of thinking of this problem is: the amount
a pixel changes toward the fog color is proportional to the amount of fog between the camera
and the pixel. This, of course, is the same model that is used in distance fog. Thus, the problem
is reduced to determining the amount of fog between the camera and the pixel being rendered.
Standard depth fog uses the z (or w) value as the density of fog. This works well, but limits the
model to omnipresent fog. That is, the camera is always in fog, and there is (save for a
definable sphere around the camera) an even amount of fog at all points in the scene.
Of course, this does not work well (or at all) for effects such as ground fog, and this technique
cannot be used for interesting volumetric lighting.
An alternative way to create fog is to model a polygonal hull that represents the fog, and to
compute the amount of fog for each pixel rendered in the scene. At first glance, this seems
impossibly complex; computing a volume typically involves complex integration.
However, the shaft of fog along a ray can be closely approximated by subtracting the w depth
where the ray enters a fog volume from the w depth of the point where it leaves the volume,
and multiplying by some constant. (Mathematically, this is a simple application of a form of
Stokes' theorem, where all but two of the terms cancel since the flux is constant in the interior.)
Diagram 1: The amount of fog along a pixel as the difference between the point a ray enters the volume and the point it exits.
A Simple Case
The first case to consider is a way of rendering a convex volumetric fog that has no object in it,
including the camera. The algorithm can easily be expanded to handle objects (or parts of the
scene) inside the fog, the camera inside the fog, and concave volumes.
Computing this term on a per-pixel basis involves several passes. Clearly, for any view there
are two distances of concern: the point where the ray enters the fog, and the point where the
ray exits. Finding the point where a ray enters a fog volume is done by rendering the fog
volume and reading the w value. Finding the point on the other side of the fog volume is also
not difficult. Polygons not facing the camera are normally culled away, but since any surface
not facing the camera is the back side of the fog volume, reversing the culling order and
drawing the fog again renders the inside of the fog volume. With convex volumes, there will
never be a case where the ray passes in and out of a fog volume twice.
To get the total amount of fog in the scene, the buffer containing the front-side w values of the
fog volume is subtracted from the buffer containing the back-side w values of the fog. But the
first question is: how can per-pixel operations on w be performed? And then, how can this value
be used for anything? Using a vertex shader, the w is encoded into the alpha channel, thereby
loading the w depth of every pixel into the alpha channel of the render target. After the
subtraction, the remaining alpha value represents the amount of fog at that pixel.
Front side, back side, and the difference (with contrast and brightness increased)
- Render the back side of the fog volume into an off-screen buffer, encoding each pixel's w
depth as its alpha value.
- Render the front side of the fog volume with a similar encoding, subtracting this new
alpha from the alpha currently in the off-screen buffer.
- Use the alpha values in this buffer to blend on a fog mask.
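In DirectX 8 terms, those passes can be set up with a handful of render states. The sketch
below is my own illustration under assumptions (the g_pDevice pointer, the DrawFogVolume
call, and clockwise front-face winding), not the article's code; it uses the C COM macros from
d3d8.h:
#include <d3d8.h>
/* the scene's Z buffer is already set; keep it read-only for the fog passes */
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_ZWRITEENABLE, FALSE);
/* pass 1: back faces only; the vertex shader writes scaled w into alpha */
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_ALPHABLENDENABLE, FALSE);
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_CULLMODE, D3DCULL_CW);
DrawFogVolume();
/* pass 2: front faces, reverse-subtracted, leaving (back w - front w) */
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_ALPHABLENDENABLE, TRUE);
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_BLENDOP, D3DBLENDOP_REVSUBTRACT);
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_SRCBLEND, D3DBLEND_ONE);
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_DESTBLEND, D3DBLEND_ONE);
IDirect3DDevice8_SetRenderState(g_pDevice, D3DRS_CULLMODE, D3DCULL_CCW);
DrawFogVolume();
/* pass 3: blend the fog color over the scene using the resulting alpha mask */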
The reason why this simple approach is not always correct is obvious: when there is an object
inside the fog, the actual volume of fog between it and the camera is computed incorrectly. The
back side of the fog is no longer the polygonal hull that was modeled, but the front side of the
object. The distance of fog needs to be computed using the front side of the object as the back
end of the fog.
This is accomplished by rendering the scene (defined as the objects in the fog) using the same
w trick. If a pixel of an object lies in front of the fog's back end, it replaces the fog's back end
with its own w, thereby becoming the virtual back part of the fog.
Unfortunately, the above approach has one drawback. If an object is only partially obscured by
fog, then the part that is not in the fog will still be rendered into the back buffer, effectively
becoming the back side of the fog. The distance from those pixels to the camera would then be
counted as fog, even though there is none.
Although this could be corrected by using the stencil buffer, another approach is to redraw (or
frame copy) the scene in the front-side pass, thereby using the scene as the fog front as
well as the back. This causes objects partially obscured by fog to render correctly: those parts
not in fog produce a fog depth of 0. The new approach looks like this:
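(The step list itself is not reproduced in this copy; judging from the references to steps 4 and 5
below, it presumably runs along these lines:)
1. Render the scene to set the Z buffer.
2. Render the back side of the fog volume into an off-screen buffer, encoding w depth as alpha.
3. Render the scene's w depth into the same buffer, so objects inside the fog become its virtual
back side.
4. Render the scene's w depth into a second buffer, to act as the front side of the fog for pixels
not covered by fog.
5. Render the front side of the fog volume into that buffer, replacing the scene's values where
fog is present.
6. Subtract the front buffer from the back buffer and use the result to blend on the fog mask.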
There is now one more neat trick to perform: allowing the camera to enter the fog. Actually, if
the fog clipping plane and the geometry clipping plane are aligned, then the trivial case will
already work. At some point, parts of the fog volume will be culled against the near clipping
plane. Since the front buffer is by default cleared with 0s (indicating that those pixels are 0
depth from the camera), when the clipping of the front volume begins to occur, the pixels
being rendered on those polygons would have been 0 anyway.
There is one more problem that crops up. To accommodate an object moving through the fog,
two steps were added, one of which acted as the front side of the fog. But if the camera is
inside the fog volume, then a key assumption has been broken: not all of the fog volume is
actually rendered, since part of it is clipped away. This means that step 4 in the
above algorithm now becomes a major problem, as it becomes the effective front side of the
fog. The polygons of the fog volume can no longer replace those pixels set by the scene, since
the fog volume polygons have been (at least partially) culled away.
The solution to this is simple. Step 4 was added specifically to allow objects that were only
partially obscured by fog to render correctly, since any pixel rendered in step 4 would be
replaced by step 5 if it were in the fog. Obviously, if the camera is inside the fog, then all parts
of an object are at least partially obscured by fog. Thus, step 4 should be disabled completely.
The result is a complete and general implementation of the rendering of uniform-density,
convex fog hulls.
Clearly, this is a simple foundation for fog; there are numerous improvements and
enhancements that can be made. Perhaps highest on the list is a precision issue. Most hardware
allows only 8-bit alpha formats. Because so much depends on the w depth, 8 bits can be a
real constraint. Imagine a typical application of volumetric fog: a large sheet of fog along the
ground. No matter what function is used to take the depth and render it into fog, there remains
a virtual far and near clipping plane for the fog. Expanding these planes means either less
dense or less precise fog, while keeping them contracted means adjusting the fog clipping
planes for each fog volume rendered.
On new and upcoming hardware, however, there is a trick with the pixel shaders. Why not keep
some more bits of precision in one of the color channels, and use the pixel shader to perform a
carry operation? At first glance it appears that 16-bit math can easily be accomplished on parts
designed to operate at only 8 bits. However, there is one nasty limiting factor: on a per-triangle
basis, the color interpolators work at only 8 bits. Texture coordinates, on the other hand,
typically operate at much higher precision, usually at least 16 bits. Although texture coordinates
can be loaded into color registers, the lower bits of precision are lost. An alternative is to create
a 1D texture filled with a step function, with each texel representing a higher-precision value
encoded in the alpha and color channels. Unfortunately, the precision here is usually limited by
the size of a texture.
Once the issue of higher precision is addressed, it is possible to render concave volumes even
with limited 8-bit hardware. This must be accomplished either by rendering concave fog
volumes as a collection of convex parts, or by summing the multiple entry points of fog and
subtracting away the multiple exit points. Unfortunately, the high-precision trick will not work
for the latter approach, since there is no way to both read and write the render target in the
pixel shader. Although a system of swapping between multiple buffers carefully segmented to
avoid overlap might work, this latter approach will probably not be feasible until hardware
allows rendering into 16-bit formats (i.e. a 16-bit alpha format).
Finally, there are many artistic enhancements that can be made to this kind of volumetric
effect. To make volumetric light, for instance, the alpha blend modes can be changed to
additive rather than blend, thereby adding light to the scene. Decay constants can also be
modeled this way, to accomplish some surface variations of fog density.
Additionally, fog volumes can be fitted with textures on top of them that operate much like
bump maps do, varying the height of the fog at a point without changing the actual
geometry. To create an animated set of ripples in fog, for instance, one can take a ripple depth
texture, move it along the surface of the fog volume, and add it to the w depth. Other
texture tricks are possible as well; noise environment maps can be coupled to fog volumes to
allow primitive dust effects.
And of course, it can be quite fun to draw the fog mask without actually drawing the object -
creating an invisible object moving through the scene.
Supplement
The article, Volumetric Rendering in Real-time, covered the basics of volumetric depth
rendering, but at the time of the writing, no pixel-shader-compliant hardware was available.
This supplement describes a process designed to achieve two goals: to get more precision out
of an 8-bit part, and to allow the creation of concave fog volumes.
Handling Concavity
Computing the distance of fog for the convex case was relatively simple. Recall that the front
side of the fog volume was subtracted from the back side (where depth is measured in
units from the camera). Unfortunately, this does not work with concave fog volumes,
because at any given pixel a ray may have two back sides and two front sides.
The solution is intuitive and has sound mathematical backing: sum all of the front sides and
subtract them from the summed back sides. As shown in the diagram, this is the
mathematical equivalent of breaking the volume into convex chunks and summing them up.
(B1-A1) + (B2-A2) factors to (B2+B1)-(A2+A1)
Computing concavity is thus as simple as adding the multiple front sides and subtracting them
from the multiple back sides. Clearly, a meager 8 bits won't be enough for this. Every bit added
allows another summation and subtraction, and allows for more complex fog scenes.
There is an important assumption being made about the fog volume: it must be a continuous,
orientable hull. That is, it cannot have any holes in it. Every ray cast through the volume must
enter the hull the same number of times it exits.
The alpha blender is typically used to blend on alpha textures by configuring the source to
multiply against the source alpha, and the destination to multiply against the inverse source
alpha. However, it can also be used to add (or subtract) the source and destination
color channels. Unfortunately, there is no way to perform a carry operation here: if one channel
would exceed 255 for a color value, it simply saturates to 255.
In order to perform higher bit precision additions on the Alpha Blending Unit, the incoming data
has to be formatted in a way which is compatible with the way the alpha blender adds. To do
this, the color channels can hold different bits of the actual result, and most importantly, be
allowed some overlap in their bits.
The above encoding gives 12-bit precision in an 8-bit pipe. The red channel contains the upper
8 bits, and the blue channel contains the lower 4, plus 3 carry spots. The top bit should not
be used, for reasons discussed later. So the actual value encoded is Red*16+Blue.
The alpha blender will now add multiple values in this format correctly up to 8 times before
there is any possibility of a carry bit not propagating. This limits the fog hulls to ones without
concavity where, looking in any direction, a ray might pass in and out of the volume
more than 8 times.
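As an illustration of the split (my own sketch, not the article's code; w_scaled is assumed to be
the w depth already mapped into [0,1]):
unsigned int depth12 = (unsigned int)(w_scaled * 4095.0f); //12-bit depth value
unsigned char red    = (unsigned char)(depth12 >> 4);      //upper 8 bits
unsigned char blue   = (unsigned char)(depth12 & 0x0F);    //lower 4 bits
//blue's upper bits stay clear so additive blending can carry into them;
//the decoded value is red*16 + blue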
Encoding the bits to be added cannot be done with a pixel shader. There are two
primary limitations. First, the color interpolators are 8-bit as well. Since the depth is computed
at a per-vertex level, this won't let higher-bit values into the independent color channels.
Second, even if the color channels had higher precision, the pixel shader has no instruction to
capture the lower bits of a higher-bit value.
The alternative is to use a texture to hold the encoded depths. The advantage of this is twofold.
First, texture interpolators have much higher precision than color interpolators, and second, no
pixel shader is needed for the initial step of summing the front and back sides of the fog
volume. Unfortunately, most hardware limits the dimensions of textures; 4096 is a typical
limitation. This amounts to 12 bits of precision encoded in the texture. 12 bits, however, is
vastly superior to 8 bits and can make all the difference in making fog volumes practical.
Setting it all Up
Three important details remain: The actual summing of the fog sides, compensating for objects
inside the fog, and the final subtraction.
The summing is done in three steps. First, the scene needs to be rendered to set the Z buffer.
This prevents fog pixels from being drawn that are behind totally occluding objects.
In a real application, this Z could be shared from the pass which draws the geometry. The Z
buffer is then write-disabled, so that fog writes will not update it.
After this, the summing is exactly as expected. The app simply draws all the forward-facing
polygons into one buffer, adding up their results, and then draws all the backward-facing
polygons into another buffer. In order to sum the depths of the fog volume, the alpha blend
constants are set to one for the destination and one for the source, thereby adding the
incoming pixel to the one already in the buffer. There is one potential problem, however: this
does not take into account objects inside the fog that are acting as a surrogate fog cover. In
that case, the scene itself must be added in, since the far end of the fog would have been
rejected by the Z test.
At first, this looks like it has an easy solution. In the previous article, the buffers were set up so
that they were initialized to the scene's depth value. This way, fog depth values would replace
any depth value in the scene if they were in front of it (i.e. the Z test succeeds), but if no fog
was present the scene would act as the fog cover.
This cannot be done for general concavity, however. While technically correct in the convex
case, in the concave case there may be pixels at which the fog volumes are rendered multiple
times on the front side and multiple times on the back side. For these pixels, if part of an object
lay between fog layers, then the front buffer would be the sum of n front sides, and the back
buffer would be the sum of n-1 back sides. But since the fog cover was replaced by the
fog, there are now more entry points than exit points. The result is painfully obvious: parts of
the scene suddenly lose all fog when they should have some.
The above diagram illustrates that without taking into account the object's own depth value,
the depth value generated would be B1 - A1 - A2, since B2 was never drawn (it failed the Z
test against the scene). This value would be negative, and no fog would get blended. In this
case, C needs to be added into the equation.
The solution requires knowing in which scenarios the scene's w depth should be added and in
which it should be ignored. Fortunately, this is not difficult to find. The only situation where the
scene's w depth should be added to the total fog depth is at those pixels where the object is in
between the front side of a fog volume and its corresponding back side.
The above can be thought of as asking the question: did the ray ever leave the fog
volume? Since the fog hulls are required to be continuous, if the answer is no, then part of
the scene must have blocked the ray. This test can be performed by a standard inside/outside
test.
To perform the inside/outside test, each time a fog pixel is rendered, the alpha value is
incremented. If the alpha value at each point of the far fog buffer is subtracted from the
corresponding point of the near fog buffer, then values of 1 or more indicate the ray stopped
inside the volume, while values of 0 indicate that the ray left the fog volume.
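In plain C, the per-pixel decision would look something like this (an illustration only; in the
article the test runs in the pixel shader shown below, and the function name is hypothetical):
//counts were incremented once per rendered fog pixel on each side
int SceneDepthNeeded (int front_count, int back_count)
{
    //equal counts: the ray entered and left the hull the same number of
    //times, so the scene's w depth is not part of the fog thickness
    return (front_count - back_count) != 0;
}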
To set this test up, the alpha channels of the near and far w depth buffers must be cleared to 0.
Each time a fog pixel is rendered, the alpha is incremented by the hex value 0x10. This
value is used because the pixel shader must perform a 1-or-0 logical operation: a small
positive value must be mapped to 1.0 in the pixel shader, a step which requires multiple shifts.
Due to instruction count constraints, the initial value has to be at least 0x10 for the shifts to
saturate a non-zero value to one.
The rest is straightforward: all the front sides and all the back sides are summed up in their
independent buffers. The scene is also drawn into its own buffer. Then all three buffers are run
through the final pass, where the scene's w depth is added in only if the difference of the alpha
values is not 0.
This requires a lengthy pixel shader, and a great deal of care must be taken to avoid potential
precision pitfalls. The following pixel shader performs the required math, although it requires
every instruction slot of the pixel shader and nearly every register. Unfortunately, with no carry
bit, there is no way to achieve a full 8-bit value at the end of the computation, so it must settle
for 7 bits.
ps.1.1
def c1, 1.0f,0.0f,0.0f,0.0f
def c4, 0.0f,0.0f,1.0f,0.0f
// input:
// b = low bits (a) (4 bits)
// r = high bits (b) (8 bits)
// intermediate output:
// r1.b = (a1 - a2) (can't be greater than 7 bits set )
// r1.r = (b1 - b2)
sub r1.rgb,t1,t0
+sub_4x r1.a,t0,t1 // if this value is non-zero, then
mov_4x r0.a,r1.a // there were not as many backs as
mad r1.rgb,r0.a,t2,r1 // fronts, and we must add in the scene
dp3 t0.rgba,r1,c4 // move red component into alpha
This pixel shader produces an alpha value which represents the density of the fog, and loads the
fog color constant into the color channels. The alpha blending stage can now be used to blend
the fog onto the scene.
Finally, there is one situation which can cause serious problems: clipping. If part of the fog
volume is clipped away because the camera is partially in the fog, then part of the scene might
be inside the fog. Previously, it was assumed the camera was either entirely inside or entirely
outside the fog. This may not always be the case.
An alternative solution is to not allow polygons to get clipped. The vertex shader can detect
vertices which would get clipped away and snap them to the near clip plane. Such a vertex
shader clips w depths to the near clip plane, and z depths to zero.
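A minimal sketch of that snapping logic, expressed here as per-vertex C++ rather than shader
assembly (the near-plane constant nearW is an assumption, not a value from the article):

// Hedged sketch: the per-vertex clamping such a vertex shader would perform.
// 'nearW' stands in for the w depth of the near clip plane.
struct ClipSpaceVertex { float x, y, z, w; };

ClipSpaceVertex SnapToNearPlane(ClipSpaceVertex v, float nearW)
{
    if (v.w < nearW) {  // vertex would otherwise be clipped away
        v.w = nearW;    // snap the w depth onto the near clip plane
        v.z = 0.0f;     // clamp the z depth to zero
    }
    return v;
}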
Deep-Water Animation and Rendering
By Lasse Jensen
Gamasutra
September 26, 2001
URL: http://www.gamasutra.com/gdce/jensen/jensen_01.htm
In this paper we introduce a new realtime level-of-detail deep-water animation scheme, which
uses many different proven water models. In addition we show how to utilize today's latest
graphics hardware for realistic rendering of oceans. Keywords: FFT, Surface Dynamics, Navier-
Stokes, Caustics, Godrays, Water optics, Foam and Spray.
Introduction
This paper introduces a fairly complete animation and rendering model for deep-water. In short,
the animation model is based on mixing state-of-the-art water models from the computer
graphics literature to suit our need for it to remain realtime. This includes:

- Oceanographic-statistics-based surface wave model for ocean waves (2.1 FFT)
- Physically correct surface wave model, taking depth into account, for realistic shorelines etc. (2.3 Shallow water waves)
- Constrained physically correct wave model for object interaction (2.4 Surface waves)
- Full Navier-Stokes simulated bump-map for surface tension and similar turbulent effects (2.2 Navier-Stokes Equations)

Our realistic (realtime) rendering of water includes all of the following visual features:

- View dependent water colouring (3.3 Colour of water)
- Global reflection/refraction (3.1 Reflection/Refraction)
- Local reflection/refraction (3.1.1 Reflection, 3.1.2 Refraction)
- Caustics (3.5 Caustics) and Godrays (3.6 Godrays)
- Foam and spray (3.7 Foam, Spray and Bubbles)
Animation
The main philosophy behind our animation is that there is "no single model fitting all needs".
We haven't tried to make one super model, but instead investigated how to blend between
different types and levels of animation. We will first present all the different models used, and
then summarize how and what we used them for.
FFT
In this chapter, we will describe the algorithm we're using as the core of our sea animation. The
algorithm is explained in detail in [2]. This model isn't based on any physics model, but
instead uses statistical models based on observations of the real sea. The method has been
used commercially several times, for example for the sea animation in the movies Titanic and
Waterworld.
In this statistical model of the sea, wave height is a random variable of horizontal position and
time, h(X,t). It decomposes the wave heightfield into a set of sine waves with different
amplitudes and phases. While the model itself provides us with a tool to generate these
amplitudes and phases, we use the inverse Fast Fourier Transform (FFT) as a means to quickly
evaluate the sum.
FFT is a fast version of the discrete Fourier transform, i.e. a Fourier transform that samples the
input at regularly placed points. A description of both the regular FT and the FFT, together with
their interesting properties and working algorithms, can be found in [6], which is also available
online. Following [2], the sum being evaluated is:

h(X,t) = Σ_K h~(K,t) · e^(i·K·X)

Here X is the horizontal position of a point whose height we are evaluating. The wave vector K is
a vector pointing in the direction of travel of the given wave, with a magnitude k dependent on
the length of the wave (λ):

k = 2π/λ
And the value of h~(K,t) is a complex number representing both amplitude and phase of wave K
at time t. Because we are using the discrete Fourier transform, only a finite number of waves
and positions enter our equations. If s is the dimension of the heightfield we animate, and r is
the resolution of the grid, then we can write:

K = (2π·n/s, 2π·m/s), with integers n, m in −r/2 ≤ n, m < r/2,

and the discrete positions are X = (u·s/r, v·s/r) for integers u, v.
Now that we know how to convert a field of complex numbers representing wave amplitudes and
phases into a heightfield, we need a way to create the amplitudes and phases themselves.
Tessendorf [2] suggests using the Phillips spectrum for wind-driven waves. It is defined by the
following equation:

Ph(K) = a · e^(−1/(k·l)²) / k⁴ · |Kˆ·Wˆ|²

In this equation, l = v²/g is the largest possible wave arising from a continuous wind with speed
v, g is the gravitational constant, Wˆ is the direction of the wind and Kˆ is the direction of the
wave (i.e. normalized K). a is a numeric constant globally affecting the heights of the waves.
The last term in the equation (|Kˆ·Wˆ|²) eliminates waves moving perpendicular to the wind
direction.
In this form, the resulting animation contains waves that adhere to the wind direction, but move
both with and against it, resulting in a lot of meeting waves (and opportunities for splashes and
foam creation). If you prefer waves moving in one direction, you can modify this term to
eliminate waves that move opposite to the wind (i.e. where the dot product is negative).
Also, to improve the convergence properties of the spectrum, we can try to eliminate waves with
very small length (w << l) by multiplying the equation by the following term:

e^(−k²·w²)
There are now two steps we have to do in order to prepare data for the FFT: create the
amplitudes and phases at time zero, and then animate the field. The first part can be
accomplished using the equation:

h~0(K) = (1/√2) · (ζr + i·ζi) · √(Ph(K))

where ζr and ζi are two independent draws from a Gaussian random number generator with
mean 0 and standard deviation 1.
Now, given a time t, we can create a field of the frequency amplitudes (independently of any
previous time, which can be valuable):

h~(K,t) = h~0(K) · e^(i·ω(k)·t) + h~0*(−K) · e^(−i·ω(k)·t)

where ω is the angular frequency of wave K, representing the speed at which the wave travels
across the surface. You may wonder what the source of the right term in this equation is. It's
there because our resulting heights (the result of the inverse FFT) are real numbers only (i.e.
their imaginary part is equal to zero). It can be shown that for such a function, the following
must hold for the amplitudes:

h~(K,t) = h~*(−K,t)
As you may notice, there is one last piece missing in the jigsaw, and that's the value of ω for a
given wave. Since we are animating deep-water sea, there is a simple relation between ω and
the corresponding wave-vector K:

ω²(K) = g·k

Here g is again the gravitational constant and k is the magnitude of vector K. There are several
modifications to this equation; perhaps the most useful for our purpose is taking the depth d
into account:

ω²(K) = g·k·tanh(k·d)
Also, if you intend to precalculate the animation, you might try to express each frequency as a
multiple of the same basic angular frequency ω0 to ensure that the animation loops after a
certain time. The result of implementing this set of equations is a tile of highly realistic sea
surface. Given the properties of the FFT, it can be seamlessly tiled over and over again. This is
a very useful property, even though the tiling can become visible as a repeating pattern. We can
improve this by choosing a larger grid, but this obviously comes at a computational expense.
Tessendorf [2] mentions that for the Titanic animation, a grid size of 2048 was used. This is
unfortunately too big to be animated in realtime on consumer-class computers. In our
experiments we have mostly been using a grid size of 64, whose inverse FFT can be computed
quite fast. A size of 128, however, gives a (subjectively) much better visual result, and will
probably be the right size in case one is targeting today's high-end configurations (and the
water animation comprises a significant part of the whole view).
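To make the above concrete, here is a minimal C++ sketch of the spectrum setup and the
per-frame amplitude animation. The wind parameters and the constant a are illustrative, and
the inverse 2D FFT itself is assumed to come from a library such as the one in [6]:

// Hedged sketch of the amplitude setup and per-frame animation described in
// this chapter; parameter values and names are assumptions.
#include <cmath>
#include <complex>
#include <random>

const float g = 9.81f;  // gravitational constant

// Phillips spectrum Ph(K) for wave vector (kx, kz), normalized wind (wx, wz),
// wind speed v, and global amplitude constant a
float Phillips(float kx, float kz, float wx, float wz, float v, float a)
{
    float k2 = kx * kx + kz * kz;
    if (k2 < 1e-8f) return 0.0f;          // no DC wave
    float l   = v * v / g;                // largest wave from wind speed v
    float dot = (kx * wx + kz * wz) / std::sqrt(k2);  // K^ . W^
    return a * std::exp(-1.0f / (k2 * l * l)) / (k2 * k2) * dot * dot;
}

// Initial amplitude h~0(K) from two independent Gaussian draws
std::complex<float> H0(float kx, float kz, std::mt19937& rng)
{
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    float p = std::sqrt(Phillips(kx, kz, 1.0f, 0.0f, 30.0f, 0.0005f) / 2.0f);
    return std::complex<float>(gauss(rng) * p, gauss(rng) * p);
}

// Per-frame amplitude h~(K,t) = h~0(K) e^(iwt) + conj(h~0(-K)) e^(-iwt);
// the resulting field is fed to the inverse FFT to obtain the heightfield.
std::complex<float> Ht(std::complex<float> h0, std::complex<float> h0MinusK,
                       float k, float t)
{
    float omega = std::sqrt(g * k);       // deep-water dispersion w^2 = gk
    std::complex<float> e(std::cos(omega * t), std::sin(omega * t));
    return h0 * e + std::conj(h0MinusK) * std::conj(e);
}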
Choppy Waves
The described algorithm produces nice looking waves, but they all have rounded tops, which
suggests nice weather conditions. There is however a modification for making the wave tops
sharper and the wave bottoms more flat.
Instead of modifying the heightfield directly, we will horizontally displace the positions of the
grid points using the equation:

X = X + λD(X,t)

where λ is a constant controlling the amount of displacement, and D is the displacement vector
computed with the FFT:

D(X,t) = Σ_K −i · (K/k) · h~(K,t) · e^(i·K·X)

The value of λ must be carefully chosen – if it's too big the waves start to intersect themselves,
and that certainly breaks the realism. However, detecting this situation seems to be a good way
of spawning foam and spray – more on this in chapter 3.7. The difference between the normal
waves and the choppy waves modification can be seen in Figure 2-1 and Figure 2-2
respectively.
Navier-Stokes Equations
In the field of Computational Fluid Dynamics (CFD) the Navier-Stokes Equations (NSE) are
known to fully describe the motion of an incompressible viscous fluid. In the NSE there are
three types of forces acting:

- Body forces (Fg). These are forces that act on the entire water element. We assume this is gravity only, so Fg = ρG, where ρ is the density and G is the gravitational force (9.81 m/s²)
- Pressure forces (Fp). These forces act inwards and normal to the water surface.
- Viscous forces (Fv). These are forces due to friction in the water and act in all directions on all elements of the water.
The pressure forces are defined as the negative of the gradient of the pressure field of the
water elements, i.e.:

Fp = −∇p

Given the fact that water is a Newtonian fluid [19], i.e. a fluid where the stress is linearly
proportional to the strain, the net viscous force (Fv) per unit volume is defined as:

Fv = μ·∇²V

Now that we have all the forces acting in fluids, we will use Newton's second law (F = mA) to
describe the motion:

ρ·DV/Dt = Fg + Fp + Fv = ρ·G − ∇p + μ·∇²V

Now assuming uniform density we can write the equation as:

∂V/∂t = −(V·∇)V − (1/ρ)·∇p + ν·∇²V + G     (Equation 2-9)

This equation conserves the momentum. In addition we need the mass to be conserved:

∇·V = 0     (Equation 2-10)
These two equations together are referred to as the NSE. Unfortunately the NSE is a set of
highly non-linear Partial Differential Equations (PDEs) that is not easily solved. In the literature
there are many methods for discretizing the PDE, both in time (explicit/implicit) and space
(Finite Difference, Finite Volume, Finite Element). Going into detail on how to solve the NSE
would require a document in itself, so instead we will just briefly describe what we implemented
and how we used the result.
We started by implementing an explicit finite difference scheme on a uniform grid known as the
Marker-And-Cell (MAC) method, since this is widely used in earlier works ([7], [8], [9], and
[10]). In short, we divide the solution space into finite cells that hold the velocity and
pressure. We then solve Equation 2-9 by finite differences on this grid, and then enforce
Equation 2-10 by an iterative process called Successive Over-Relaxation (or one can form a
linear system and solve it with, for example, a Preconditioned Conjugate Gradient method).
While solving this is rather simple in closed form, adding boundary conditions, and in
particular free-surface conditions, is complicated and not well described in the given references.
Another problem inherent to finite differences is stability. Although what's known as the
Courant-Friedrichs-Lewy (CFL) condition for stability can be somewhat enforced by
calculating the local viscosity and adjusting the time step according to the velocity and the cell
size, it gave us unbelievably much pain! We therefore took the time to also implement Jos
Stam's stable solver [11]. Once again it turns out that we can use the FFT for solving the closed
form, which we will use for the surface details. Stam has recently also released source code for
this solver [18], so one should be able to get up and running with this effect quite fast! Once we
have a field solved with the NSE, we populate it with particles that are moved according to the
bilinearly interpolated velocity of the nearest grid elements. These particles will quickly form
streamlines (see Figure 2-3) in the field, showing all the turbulent vorticity we expect to see on
a tension surface. We then take the finite differences of these particles' velocities, and treat
them as tangents for normal calculation. All these normals are then fed into a bump-map that
we apply as real-time surface detail, as shown in Figure 2-4.
Figure 2-3. Our 2D NSE solver showing how the
particles form streamlines in the closed container.
The velocity and pressure field is also shown on the
right side of the view.
Shallow Water Waves

For water of finite depth, the NSE can be reduced to the shallow water equations, which can be
written (following, for example, [17]) as:

∂u/∂t = G − g·∂h/∂x     (Equation 2-11)
∂h/∂t = −∂(u·d)/∂x     (Equation 2-12)

where G is gravity (and other global forces), h is the height of the water surface, d is the depth,
and u is the horizontal velocity of a vertical column of water. We can also combine these two
equations. Start by differentiating Equation 2-11 with respect to x and Equation 2-12 with
respect to t:

∂²u/∂t∂x = −g·∂²h/∂x²     (Equation 2-13)
∂²h/∂t² = −d·∂²u/∂x∂t     (Equation 2-14)

Now substituting the partial cross-derivative of Equation 2-14 into Equation 2-13 we end up
with:

∂²h/∂t² = g·d·∂²h/∂x²     (Equation 2-15)
Now that we have turned the partial differential equation into a second-order ODE, we will solve
it using a first-order implicit method. First we use finite differences to discretise the first- and
second-order time derivatives of h (Equations 2-17 and 2-18). We are solving for h_i, so we
rearrange Equation 2-17 and substitute it into Equation 2-18. The last discretisation step is to
treat the depth as constant during an iteration, so one ends up with a linear system, and it is
solved exactly as in the 2D case, simply by splitting it up into two systems – one dependent on
x and one on y.
Surface Waves
The last level of animation detail we use is strictly 2D surface waves. If we take our heightfield
from earlier and constrain the water to a fixed depth, Equation 2-22 reduces to:

∂²h/∂t² = |V|²·(∂²h/∂x² + ∂²h/∂y²)     (Equation 2-23)

where |V| = √(g·d) is the velocity of the wave (across the surface). Let h(x,y,t) be the height of
the grid at position x and y at time t; then Equation 2-23 can be discretised using central
differences as [3]:

h(x,y,t+1) = (|V|²·Δt²/Δs²)·(h(x+1,y,t) + h(x−1,y,t) + h(x,y+1,t) + h(x,y−1,t) − 4·h(x,y,t)) + 2·h(x,y,t) − h(x,y,t−1)

As shown in great detail in [3], this can be animated with just a few arithmetic operations per
grid element.
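A minimal sketch of that propagation step in C++, assuming a square grid stored in two height
buffers (all names, and the buffer-swapping scheme, are illustrative):

// Hedged sketch of the central-difference surface-wave update above.
#include <vector>

void StepSurfaceWaves(std::vector<float>& prev,      // h at t-1, overwritten with h at t+1
                      const std::vector<float>& cur, // h at t
                      int n, float waveSpeed, float dt, float ds)
{
    float c = waveSpeed * waveSpeed * dt * dt / (ds * ds);
    for (int y = 1; y < n - 1; ++y)
        for (int x = 1; x < n - 1; ++x) {
            int i = y * n + x;
            // h(t+1) = c * (neighbour sum - 4h) + 2h(t) - h(t-1)
            prev[i] = c * (cur[i - 1] + cur[i + 1] + cur[i - n] + cur[i + n]
                           - 4.0f * cur[i])
                      + 2.0f * cur[i] - prev[i];
        }
    // the caller swaps the prev/cur buffers after each step
}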
For mixing the FFT and physics water, we simply take the geometry from the FFT algorithm and
superimpose on it the geometry resulting from the physics model (which is computed only
around floating objects). Although not physically correct, this provides us with the results we
desire.
One method for approximating buoyancy is described, for example, in [3]. According to
Archimedes' principle, the force of buoyancy is equivalent to the weight of the water displaced
by the floating object. To approximate the displaced volume, we represent the object by a
series of patches described by the coordinates of their centre, their area (a) and their normal
(N). Then for a given patch (if its centre lies in the water), the volume of displaced water can be
written as:

V = a·(Pwater − Ppatch_centre)·N
where Pwater is the point on the water surface and Ppatch_centre is the position of the centre of
the patch. Now, for simplification, we can assume that this force always has the direction of the
water surface's normal at the given sampling point. Thus the force applied to the centre of our
patch is:

F = ρ·V·Nwater

where ρ is the water density. We apply this force to the given point using the standard
equations for rigid body physics, as described for example in [13]. There are also two other
forces that we should try to simulate. First, floating objects don't slide freely on the water,
because of drag. We approximate it for each patch using the equation:

Fdrag = β·a·V

where β is a constant and V is the velocity of the patch centre relative to the water. Also, when
an object with a proper shape moves on the water, it rises out of the water, depending on its
shape, orientation and velocity (this effect is best seen on boats):

F = −φ·a·(N·V)·N

where φ is another constant. Note that we use bilinear interpolation of values defined in the
grid to obtain all quantities connected to the water surface at arbitrary points on the surface.
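A minimal C++ sketch of the per-patch buoyancy force following the formulas above (structure
and constant names are assumptions, not taken from the original implementation):

// Hedged sketch of the patch-based buoyancy described above.
struct Vec3 { float x, y, z; };

inline Vec3  Sub(Vec3 a, Vec3 b)    { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float Dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3  Scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

const float WATER_DENSITY = 1000.0f;  // rho, kg/m^3 (assumed value)

// One patch of the floating object: centre, normal N, area a
Vec3 BuoyancyForce(Vec3 patchCentre, Vec3 patchNormal, float area,
                   Vec3 waterPoint, Vec3 waterNormal)
{
    // Displaced volume: V = a * (Pwater - Ppatch_centre) . N
    float v = area * Dot(Sub(waterPoint, patchCentre), patchNormal);
    if (v <= 0.0f) return {0.0f, 0.0f, 0.0f};  // patch centre above the water
    // Force along the water surface normal, per the article: F = rho * V * Nwater
    return Scale(waterNormal, WATER_DENSITY * v);
}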
While this covers the way water affects floating objects, there should also be feedback going in
the opposite direction. The proper solution would be to take the object into account directly in
the physical equations used for animating the water surface, but since the values entering these
equations don't represent the complete mixed water anyway, we decided to use another
approach. First, for all grid elements touched by any object, we temporarily increase the
damping factor in the wave model used for object interaction (this creates a wave "shadow",
i.e. a place in which waves don't spread). Secondly, we compute the change in depth of the
floating object between the last and current frame, and feed this difference back to the water
surface as a direct displacement of the surface. With the correct scale of this effect, we get both
waves from objects that fall into the water and waves formed behind moving objects (such as
boats).
Rendering
Reflection/Refraction
Most of the visual effects of water are due to reflections and refractions (a more detailed
description can be found, for example, in [2] and [16]). When a ray hits the water surface, part
of it reflects back to the atmosphere (potentially hitting some object and causing reflective
caustics, or hitting the water at another place, or the camera), and part of it transmits inside the
water volume, scattering (which causes godrays), hitting objects inside the water (causing
caustics) or going back into the atmosphere. Thus completely correct lighting would require
sophisticated global illumination equations and wouldn't even be close to realtime. We simplify
this by only taking first-order rays into account.
Reflection

The equation for reflection is well known. For an eye vector E (i.e. the ray from the given point
to the eye) and the surface normal N, the reflected ray is:

R = 2·(E·N)·N − E

This ray is then used for a lookup in a cube-map containing the environment (for the ocean,
typically only the sky).
While the cube-map is ideal for reflecting the environment in the distance, it's not very suitable
for local reflections (for example a boat floating on the water). For this we use a modification of
the basic algorithm used for reflections on flat surfaces (described for example in [14]). We set
up the view matrix so that it shows the scene as it would be reflected from a flat plane placed
at height zero, and render the whole scene into a texture. Now if we simply used projective
textures, we could render the water surface roughly reflecting the scene above it. To improve
the effect, we assume that our whole scene is placed on a plane positioned slightly above the
water surface. We intersect the reflected ray with this plane, and then compute the intersection
of the ray between this point and the reflected camera. The resulting point is then fed into the
projective texture computations.
Note that when rendering to the texture, we set the camera's FOV (field of view) slightly higher
than for the normal camera, because the water surface can reflect more of the scene than a flat
plane would.
Refraction

We will use Snell's Law to calculate the refracted ray that we need both for the refracted
texture lookup and for the caustics calculations. Snell's Law is simply:

na·sin(θi) = nb·sin(θr)     (Equation 3-1)

where θi is the angle of incidence (i.e. the angle between the view vector and the surface
normal), θr is the refracted angle (i.e. the angle between the refracted ray and the negated
normal), and na and nb are the indices of refraction of the two materials in question. Setting
the indices of refraction for air and water to 1 and 1.333 respectively, we can write Equation
3-1 as:

sin(θi) = 1.333·sin(θr)     (Equation 3-2)

While this works perfectly in 2D, using this equation directly in 3D would be too cumbersome.
When using vectors, it can be shown that the refracted ray is described by a closed formula in E
and N, in which the + sign is used when N·E > 0; for the derivation of this formula, see [15].
With this vector, we are now ready to render the refraction visible on the water surface. For the
global underwater environment we again use a cube-map. For local refractions we use an
algorithm very similar to that used for reflections, with only two differences – the scene is
rendered into the texture normally, and the plane we're using for perturbing the texture
coordinates is placed below the water surface.
Fresnel Term

How much of the incoming light is reflected rather than refracted is described by the Fresnel
term (Equation 3-3). Here a is the angle between the incoming light and the surface normal,
and na and nb are the coefficients from Snell's law (Equation 3-2). Since we use a fixed index
of refraction of 1.333, the expression depends only on the angle a, so it's possible to
precalculate it and store it in a one-dimensional texture [4]. Another possibility is to
approximate Equation 3-3 with a simpler function, so we can calculate it directly on the CPU, or
on the GPU using vertex/pixel shaders. In the implementation of [5] it is approximated simply
by a linear function, which we didn't find adequate. Instead, by experimentation we found that
the reciprocal of different powers, i.e. terms of the form 1/(1 + cos a)^p, gives a very good
approximation.
In Figure 3-1 we can see the error plot of a few different powers, and in Figure 3-2 we see our
chosen power compared against Equation 3-3.
Color of Water
In chapter 3.1.2 we have described how to render refractions on the water surface. It should
however be noted that for deep water, only local refractions should be rendered since one
cannot see the sea bottom or any other deeply placed objects (and even the local refractions
should be rendered with some kind of fogging). The water itself however has colour that
depends on the incident ray direction, the viewing direction and the properties of the water
matter itself. To remedy for this effect we take the equations presented in [16], that describes
light scattering and absorption in water, and modify them as described shortly. If we don’t take
any waves into account (i.e. we treat the water surface as a flat plane) and ignore effects like
Godrays, we obtain closed formulas for the watercolour depending only on the viewing angle.
This colour is then precalculated for all directions and stored in a cube-map, which is used in
exactly the same way as the cube-map for the refracted environment was.
Thanks to that we get darker blue water when looking into depth and brighter greenish colour
when looking at the waves, as shown in Figure 3-3.
Caustics

Unfortunately, although the FFT water surface tiles, the resulting caustics pattern does not,
because we use only one tile of the surface in the computations. Since calculating the caustics
takes considerable time, we can't afford to calculate it for the entire ocean, so we need a way to
make it "tileable". We solve this by blitting parts of the resulting caustics texture nine times,
one for each direction, from a large caustics texture. Each part is added to the middle "cut
out", which we use as the final caustics texture. This process is illustrated in Figure 3-7, with
the result shown in Figure 3-8. A nice side effect of this process is that we can use the multi-
texturing capabilities of today's hardware to do anti-aliasing at the same time. We simply set
up four passes of the same texture and perturb the coordinates of each pass slightly to
simulate the effect of 2x2 super-sampling. This is in our opinion needed, since the caustics
patterns have a lot of detail that quickly aliases if the specular surface isn't dense enough to
represent the pattern properly. On the other hand, we could of course use the other passes to
reduce the number of blits.
Since the caustics patterns change rapidly with depth, as seen in Figure 3-9, we use the
camera's bounding box and the previous depth to decide on an average depth to use.
For applying this texture to objects underwater, we need a way to calculate the texture
coordinates into the caustics texture. Given the sun's ray direction and the position of a
triangle, we compute its texture UV coordinates by projecting the texture from the height of
the water in the direction of the ray (note that because this works as a parallel projection, we
don't even have to use projective textures here). In addition we compute the dot product
between the surface's normal and the inverted ray direction to obtain the intensity of the
applied texture (we then use this as alpha). The same algorithm can be used to create
reflective caustics on objects above water.
Godrays
Earlier, we described how the light causes caustics by the water surface focusing and
defocusing light rays. However, as the rays pass through the water matter, they scatter off
small particles floating in the water (plankton, dirt), making them visible and causing streaks of
light known as godrays. Rendering this phenomenon correctly would require volumetric
rendering. However, if we don't insist on absolute correctness, preferring the visual look of the
result, we can use a quite simple algorithm to create relatively convincing pictures. We already
have the caustics texture, which represents the shape and positions of the individual ray
streaks (even though only as a slice at a given depth). If we define this slice to represent the
light intensity for the whole volume, we can render it using techniques for volumetric rendering.
Given the position of our camera, we create several (in our experiments 32) slices of the
volume, as seen in Figure 3-10. We then render them into the completed scene with additive
alpha-blending (and with z-buffer writes disabled).
Because this method shows visible artefacts revealing the low sampling rate, we use a non-
uniform distribution of the samples. We use a high density in front of the camera – these
samples are responsible for the smooth look of the result and for bright spots where they
should be. The lower-density samples further away from the camera ensure that the rays
extend into the distance.
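One simple way to bias the slice placement toward the camera is sketched below in C++; the
quadratic falloff is an assumption, chosen only to illustrate the non-uniform distribution:

// Hedged sketch: distance of slice i from the camera, with more slices
// clustered close to the viewer.
float SliceDistance(int i, int numSlices, float maxDistance)
{
    float t = (i + 1.0f) / numSlices;   // uniform parameter in (0, 1]
    return maxDistance * t * t;         // squaring clusters slices near 0
}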
Since increasing the number of rendering passes considerably slows down the whole process,
we can use the multitexturing capabilities of graphics hardware to increase the number of
samples, as suggested in [12]. So even if we render just one slice, we apply four textures to it
at once, as if they represented subsequent samples of the volume. In this way we obtain 128
samples on the GeForce3, which gives us smooth enough pictures in most cases (as seen in
Figure 3-11).
Note that we can "skew" the volume resulting from repeating our caustics texture in any way,
to simulate rays coming from a given direction (according to the position of the sun). An
additional improvement (which we didn't implement) would be to use a shadow buffer to take
shadows cast by objects in the water into account.
Foam, Spray and Bubbles

It is important to note that even though the alpha factor of the foam texture is limited to the
range [0,1], this is not true for the foam amount (which can be more than one, but should still
be limited). Also, when we detect a foam-producing point, we shouldn't set its foam amount
immediately to the maximum – the vertex is likely to spawn foam for the next few frames as
well, and increasing the foam amount slowly gives a better visual result. The limitations of this
technique are quite obvious – the rendered foam looks quite similar at different places (since
it's just a repeated texture, not a uniquely generated pattern), and it doesn't move on the
water surface according to its slope (though one might get the impression that this is
happening when using the choppy waves algorithm).
Particle System
When water collides with obstacles, we generate a spray of water using a particle system with
simple Newtonian dynamics, see [20]. Each particle is given an initial velocity taken directly
from the water surface's velocity at the spawning position, with added turbulence. It is then
updated according to gravity, wind and other global forces thereafter. Rendering of the particles
is done with a mixture of alpha-transparency and additive-alpha sprites. See Figure 3-13 for a
screenshot of this effect. The particle system is also used for drawing bubbles from objects
dropped into the water. For this effect we simply move the bubbles on a sinusoidal path around
the buoyancy vector up to the surface, where they are killed.
Implementation Details
We implemented the algorithms described in this paper on a PC platform with Windows. Both
the FFT-based and physics-based animations were realized for grids with 64x64 elements. Two
FFTs were required for the animation: one complete complex-to-complex FFT for the surface
slope (which is later used either for the choppy waves or for surface normals) and one
complex-to-real FFT for the surface height. Our first implementation used routines from [6],
but we later replaced them with faster routines from the Intel Math Kernel Library. Rendering is
implemented in DirectX 8.1, using nVidia's GeForce3 hardware. While the basic computations
(heights, normals, foam etc.) are done only once for a single water tile (which can be repeated
all over the place), many other computations depend on the viewer position (we use a local
viewer everywhere) and thus have to be done separately for each tile. This offers a perfect
opportunity for the use of vertex shaders, offloading the burden of those computations from
the CPU. Some of these effects (especially per-pixel bump-mapping) require the use of pixel
shaders as well, but in general most of the algorithms described here should be possible on
DirectX7-class hardware.
We have presented a new scheme for deep-water animation and rendering. Its main
contribution on the animation side is the blending of proven methods for realistic object/ocean
interaction. On the rendering side we have presented a new method for foam rendering, and
shown clever use of the features of the new 3D graphics cards to reach new levels of (realtime)
realism. There are many extensions to the current implementation that we want to try out in
the future. First of all, we are not too impressed by our Phong-shaded water shimmering. We
believe this is mainly because of too low contrast in the final image. Contrast enhancement can
probably be realised by using High Dynamic Range Images (HDRI), as described in [22]. We
also want to try prefiltering of the environment map [21] to approach the BRDF shading of
water. When it comes to animation, there's so much cool stuff out there to follow up! Foremost,
we are trying to get the ocean sine model from [23] to work with our system…breaking waves
next?
Acknowledgements
We would like to thank Richard Lee for implementing the choppy waves modification and
"forcing" us to add light shimmering. A super-hero star to Tore Blystad and Christian Morgan
Enger for their excellent demo artwork, and a final thanks to Mads Staff Jensen for the slide
illustrations!
References
[1] Alan Watt and Mark Watt. “Advanced animation
and rendering techniques”. ISBN 0-201-54412-1
[2] Jerry Tessendorf. “Simulating Ocean Water”.
SIGGRAPH 2001 Course notes.
http://home1.gte.net/tssndrf/index.html.
[3] Miguel Gomez. “Interactive Simulation of Water
Surfaces”. Game Programming Gems. ISBN 1-58450-
049-2.
[4] Anis Ahmad. “Improving Environment-Mapped
Reflection Using Glossy Prefiltering and the Fresnel
term”. Game Programming Gems. ISBN 1-58450-
049-2.
[5] Alex Vlachos and Jason L.Mitchell. “Refraction
Mapping for Liquids in Containers.” Game
Programming Gems. ISBN 1-58450-049-2.
[6] Press, Teukolsky, Vetterling, Flannery. “Numerical
Recipes in C, The Art of Scientific Computing”.
Second edition. Cambridge University Press. ISBN
0-521-43108-5.
[7] Jim X. Chen, Niels da Vitoria Lobo, Charles E.
Hughes and J.Michael Moshell. “Real-Time Fluid
Simulation in a Dynamic Virtual Environment”.
IEEE Computer Graphics and Application. May-June
1997, pp.52-61.
[8] Nick Foster and Dimitri Metaxas. “Realistic
Animation of Liquids”. Graphical Models and Image
Processing, 58(5), 1996, pp.471-483.
[9] Nick Foster and Dimitri Metaxas. “Controlling Fluid
Animation”. Proceeding of the Computer Graphics
International (CGI’97).
[10] Nick Foster and Dimitri Metaxas. “Modeling the
Motion of a Hot, Turbulent Gas”. Computer
Graphics Proceeding, Annual Conference Series,
1997, pp. 181-188.
[11] Jos Stam. “Stable Fluids”. SIGGRAPH’99
Proceedings.
[12] C.Rezk-Salama, K.Engel, M.Bauer, G.Greiner,
T.Ertl. “Interactive Volume Rendering on Standard
PC Graphics Hardware Using Multi-Textures And
Multi-Stage Rasterization”
[13] David Baraff, Andrew Witkin. “Physically Based
Modeling” SIGGRAPH 98 course notes.
[14] Mark J.Kilgard. “Improving Shadows and
Reflections via the Stencil Buffer”, nVidia white
paper.
[15] Foley, van Dam, Feiner and Hughes. “Computer
Graphics. Principles and Practice.” ISBN 0-201-
84840-6.
[16] Tomoyuki Nishita, Eihachiro Nakamae. “Method of
Displaying Optical Effects within Water using
Accumulation Buffer”
[17] Michael Kass and Gavin Miller. “Rapid, Stable Fluid
Dynamics for Computer Graphics”. Computer
Graphics, Volume 24, Number 4, August 1990.
[18] Jos Stam. “A Simple Fluid Solver based on the
FFT”. Journal of Graphics Tools.
http://reality.sgi.com/jstam_sea/Research/pub.html
[19] Hugh D. Young. “University Physics. Eighth
edition”. ISBN 0-201-52690-5.
[20] Lasse Staff Jensen. ”Game Physics. Part I:
Unconstrained Rigid Body Motion”.
[21] Wolfgang Heidrich. “Environment Maps And Their
Application”.
[22] Jonathan Cohen, Chris Tchou, Tim Hawkins and
Paul Debevec. “Real-time High Dynamic Range
Texture Mapping.”. Eurographics Rendering
Workshop 2001.
[23] Alain Fournier and William T. Reeves. “A simple
model of Ocean waves”. SIGGRAPH 1986
Proceedings.
Terrain Reasoning for 3D Action Games
By William van der Sterren
URL: http://www.gamasutra.com/20010912/sterren_01.htm
This paper discusses AI terrain reasoning for 3D action games. Terrain reasoning is the
capability of the AI to take terrain into account in his planning, decisions, actions and
communication. Terrain reasoning is a necessity if both AI and terrain play an important role in
the game, and that is often the case in 3D action games.
Level designers spend weeks designing a challenging game world with well-thought out battle
arenas, multiple access routes, risky approaches to highly valued power-ups, defensive
strongholds, and neutral zones.
However, the traditional action game AI was unaware of all the thought, effort and design
that went into the level. Instead, he ran around almost blind, via shortest paths, sometimes
guided by a few static hints from the level designer. "Don't ask me, I only work here!"...
This paper presents terrain reasoning based on the waypoint graph typically present in 3D
action games. It discusses terrain reasoning concepts, and how the waypoint graph can be
reasoned about. The paper discusses and demonstrates the relation between tactics and
terrain. A case-study shows how to develop off-line and in-game reasoning to pick good sniping
spots, and how to use game-play feedback to create adaptive AI with tactical understanding of
the terrain.
A short threat prediction example demonstrates one of the other ways to reason about the terrain.
The paper also addresses the major issues in implementing such a terrain reasoning system.
Today's action games offer plenty of situations where a better AI understanding of the terrain
would be valuable to the gamer and level designer. For example:

- Recognizing key terrain features and communicating about them. The player issues a "provide suppressive fire" command while pointing to a door in a distant building. Typically, it is not as obvious to the AI as it is to the player that a door is pointed to rather than the wall next to it. But the player expects to hear the AI confirm his order: "roger, will provide fire on that door".
- Distinguishing between good and bad locations. All locations are not created equal. Some locations are great for ambushing other actors, whereas other locations are not. An AI actor who picks the right location for an action simply is more convincing.
- Interpreting location-based performance. After being lured into the same narrow tunnel to receive a rocket for the third time in a row, it would be great if the AI would recognize the tunnel as a bad place to be. The AI should adapt its tactics accordingly, instead of trying to dodge the rocket without the space to do so.
- Automating part of the level annotation. The level designer's favorite job probably isn't manually editing dozens of waypoints to tell the AI "camp here", "avoid this location", "here's some cover". An AI capable of automatically annotating the level itself will save the level designer time. And AI-provided feedback on its interpretation of the terrain might be welcome as well. Automated AI interpretation is useful not only for the professional level designer, but even more for the amateur level designer who is less familiar with the needs of the AI.
- Including the terrain in tactical considerations. Few things are as exciting as engaging a squad of AI actors who coordinate their actions, and alternate suppression fire with grenade lobbing. Part of that excitement disappears if the same squad applies those same tactics at a less suitable location: the AI should understand that attacking from, for example, an elevator is different than attacking from a warehouse full of crates.
Terrain reasoning may be an important ingredient in addressing these issues, but it cannot do
so without enhancements in other parts of the AI, such as the planner, state machines, combat
rules, etc. These other parts of the AI have been receiving plenty of attention in the literature.
The amount of terrain in an action game makes it infeasible to efficiently handle the terrain in
raw geometry format, or by means of rule bases or neural nets: both the size and the
geometric detail present in a level prevent this.
For those reasons, terrain reasoning should be addressed with custom solutions, tuned for the
specific AI needs and level characteristics of the game.
A popular and custom means to describe terrain is waypoints (and similar concepts such as
cells or grids – see [Reece], [Snook]). Waypoints represent the subset of the terrain accessible
to the player and AI. The connections between these waypoints denote viable movement. And
the graph created by the waypoints and their inter-connections expresses the valid paths.
Often, these waypoints are annotated with the presence of nearby power-ups. Sometimes, the
waypoints are part of more abstract concepts like areas and portals. Thus, waypoints are a
handy and versatile terrain representation. Nevertheless, the AI typically neglects to reason
about them (except for the occasional path finding).
The remainder of this paper discusses how to reason about terrain based on the waypoint
graph. While the ideas result from developing tactical squad and individual AI for a tactical
action game, the concepts apply to a wide range of action games.
Waypoint Graphs
Waypoint graphs convey a lot more information than just the shortest path from A to B. For
example, the waypoint graph in Figure 1 below suggests terrain that is composed of three
rooms, connected by a few hallways.
Waypoints expressing the shape of a level (left), and
waypoints in-game (right)
Together, the waypoints communicate more than just the shape of the accessible terrain. If
enhanced with line-of-sight and line-of-fire information, waypoint graphs can be used to
express many of the ingredients that make up tactics.
In 'capture-the-flag' (CTF) team games, the team able to identify and defend the bottleneck
areas near the home base will stand a good chance. In a tactical shooter, the squad that
manages to reach the objective via a concealed approach will have a good chance of surprising
the enemy.
In all these examples, terrain plays a big role. Actually, the relation is often strong enough to
enable the ingredients of tactics to be expressed in terms of waypoints and their relations.
Look at the game terrain, pick two locations, and ask the question: Why is one a better (or
worse) location than the other for a specific tactic?
The bottleneck area in CTF is typically on the access route to the base. Access routes can be
identified from the number of paths to the base that pass through that area. The bottleneck
area also offers little cover from observation and incoming rockets. Expressed in waypoints,
that means that waypoints in or near the bottleneck area all have a line-of-sight and line-of-fire
from a number of waypoints in the base area.
A concealed approach consists of a path to the destination that cannot be observed for long
periods from locations near the destination. In other words, solely a small number of path
waypoints can be observed from waypoints near the destination, and it never involves a long
string of consecutive waypoints. Note that the quality of a tactical expression in terms of
waypoints depends on the density of waypoints: the denser the waypoint graph covering the
terrain, the more accurate the expression (see also section 6.1).
The expressions also make assumptions about weapon effects and player movement abilities.
Waypoint Calculations
The direct relation between tactical concepts and the waypoint graph enables us to turn these
concepts into computable properties. For meaningful results, those computations should
resemble the human tactical interpretation.
A waypoint has four different kinds of characteristics that all should be taken into account:

- The local environment. Waypoints describe, for example, how dark the location is, which types of movement are required (crouching, swimming, using a ladder), and whether a door or button entity is present.
- The membership(s) of higher-level terrain concepts. Waypoints may be grouped to represent terrain concepts such as a room, a lake, or the blue team's base.
- The relations with other waypoints. Waypoints have relations with other waypoints: is there a valid line-of-sight, how long does it take to get from one waypoint to the other.
- The focus (in the relations with other waypoints). A waypoint whose relations, such as line-of-sight or easy access, are clustered primarily in a single narrow direction has focus in that direction.

Of the four characteristics, focus is the most complex to grasp. The following case studies
will illustrate why it is needed.
Terrain reasoning based on waypoint graphs, its value, and the concepts introduced here are
best explained via an example.
This section discusses a concrete case, 'picking good sniping spots', in detail and illustrates how
to translate tactics such as "sniping" to waypoint graph properties, how to create an algorithm
that turns these properties into a waypoint sniping suitability score, how to improve the pre-
computed sniping scores with sniping performance feedback collected in-game (thus learning),
and how to adapt the algorithm to include game play experience in its tactical considerations
(thus adapting to game tactics).
The player points to a building some one hundred meters in front of him, and
summons the nearby AI actor: "get into position to snipe that building". The AI
establishes the direction to the building.
The AI determines all waypoints that are within 2 seconds distance of him. He
quickly ranks these spots for "sniping" suitability towards the target building. For
the most promising spots, he checks if a clear line of fire is available.
Once a candidate spot has been found, the AI checks if it is good enough for sniping
purposes. The AI then reports back to the player whether he is able to execute the
instruction. If so, he moves into position.
The Terrain
A rather simple and small level serves to illustrate the terrain reasoning (although the reasoning
can deal with any terrain size). The level consists of:

- Two buildings on opposite ends of the terrain, each having two windows with a view of the bridge, and a rotating door
- A creek separating the two sides
- A bridge, providing a fast way to cross the creek, and offering some cover (half-height walls)
- Two stairs, enabling players to leave the creek. Both stairs provide some cover (half-height walls)
- Two full-height walls, providing additional cover.
In this example, assume that the AI sniper is in the northwest corner of the level, and is
instructed to snipe towards the building in the northeast.
Several qualities make for a good sniper spot in a tactical shooter game, and each of them can
be translated into waypoint graph properties.
A directional property (reflecting a relation between waypoints), "in the direct surroundings of
the spot, the sniper can find cover from threats at overlooked locations," can be translated to
"for all waypoints overlooked from the sniper spot waypoint: given a presumed threat at the
overlooked waypoint, an easily reached waypoint that offers cover from such a threat is
available near the sniper waypoint".
Even a complex property like focus can be easily translated: "the spot has focus, that is, it
offers a great view in one single direction, in combination with one or more protected flanks"
becomes "the ratio of waypoints visible in one direction vs. waypoints visible in other
directions".
A number of terrain properties over the waypoint graph are used to express the sniping spot
qualities. A directional property, such as "b: overlooking many distant locations", is computed
as follows:

b(w, d) = k × Σ( line-of-sight-line-of-fire(w, w') × fardistance(w, w') ) / ( # waypoints w'' with line-of-sight-line-of-fire(w, w'') > 0 )

focus(w, d) = ( interactions from w in direction d ) / ( interactions from w in all other directions )
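A minimal C++ sketch of evaluating such a directional property from precomputed per-pair
tables (the table layout and all names are illustrative):

// Hedged sketch of the "overlooking many distant locations" rating above.
#include <vector>

struct WaypointTables {
    int n;                       // number of waypoints
    std::vector<float> losLof;   // n*n entries: line-of-sight-line-of-fire, 0..1
    std::vector<float> farDist;  // n*n entries: fardistance, larger when distant
};

float OverlookRating(const WaypointTables& t, int w, float k)
{
    float sum  = 0.0f;
    int   seen = 0;
    for (int w2 = 0; w2 < t.n; ++w2) {
        float v = t.losLof[w * t.n + w2];
        if (v > 0.0f) {                       // w2 is overlooked from w
            sum += v * t.farDist[w * t.n + w2];
            ++seen;
        }
    }
    return seen > 0 ? k * sum / seen : 0.0f;  // normalize by overlooked count
}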
Focus here is the ratio of forward interactions versus non-forward interactions. The role of focus
is best illustrated using the example terrain. Positions that have a strong eastbound focus are:

- the positions at the western border, because they only have relations with waypoints to their east, and typically with a lot of them;
- the rooms in the southwestern building, and the small alley south of that building, which have a very strong focus, because their flanks are protected;
- the positions in the northwest, west of the full-height wall, the positions on the northern stairs from the creek, and the positions on the western side of the creek, which also have a good eastbound focus, because their back (to the west) is covered.
Left: East-oriented focus (darker color is higher focus) for all locations,
illustrated as a top view, along with terrain feature overlay.
Right: AI-computed sniping spot preferences (darker is better) in the eastward
direction, using static level data only.
However, it is also a disadvantage to use only these few inputs, because locations not properly
represented by a nearby waypoint are ignored, and actual game play is ignored as well.
The first problem is easily addressed by adding more waypoints at and near key locations: with
more waypoints, the AI is less likely to miss a position offering cover.
The second problem shows in the rating of sniper spots on the western creek bank: these
positions receive a rather high rating because they overlook spots in the creek from which it is
hard to avoid the sniper. However, in most game scenarios, most activity will not occur in that
creek. Consequently, the sniper spots on the creek bank are overrated.
Luckily, the lack of game play input is not only easily addressed, but also creates a great
opportunity.
Capturing and analyzing the required game play data for this case is done by recording, at
every waypoint, the time spent by the AI or player while static at that position with a sniping
weapon active, the amount of damage caused by the sniping weapon from that position, and
the amount of damage taken when sniping at that position.
sniping experience at ( w, d ) =
player sniping time in direction ( w, d ) + AI sniping time in direction ( w, d )
+ player directional damage issued ( w, d ) + AI directional damage issued ( w, d )
- player directional damage taken ( w, d ) - AI directional damage taken ( w, d )
and adjust the pre-computed waypoint sniping rankings with the experience:
sniping quality ( w, d ) =
precomputed sniping quality ( w, d ) + sniping experience at ( w, d )
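A minimal sketch of that adjustment in C++ (field names are illustrative, and any weighting or
normalization of the experience term is omitted):

// Hedged sketch of the experience-adjusted rating above, tracked per
// waypoint and per direction.
struct SniperStats {
    float snipeTime;     // seconds spent static with a sniping weapon active
    float damageIssued;  // damage caused by sniping from here
    float damageTaken;   // damage received while sniping here
};

float AdjustedSnipingQuality(float precomputedQuality,
                             const SniperStats& player, const SniperStats& ai)
{
    float experience = player.snipeTime + ai.snipeTime
                     + player.damageIssued + ai.damageIssued
                     - player.damageTaken - ai.damageTaken;
    return precomputedQuality + experience;
}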
Collecting the game play data takes negligible time, especially since most games already
determine the nearest waypoint for every actor at every game tick (for navigation purposes).
Processing also takes little time, and typically can be done when presenting an intermission
screen.
Because the AI can adapt its sniping spot rankings 'on the fly', it will display varying and
adaptive sniping behavior. For this example, some 40 game rounds were run, with one or two
snipers defending the western building, and three or four attackers storming that building from
the east side of the level. The player randomly acted as a defending sniper or as one of the
attackers. The figures below illustrate the "sniping experience data" collected in-game, and the
updated waypoint sniping ratings (in the east direction):
Left: Waypoint sniping experience data (non-directional, darker is better)
Right: Eastward sniping ratings adjusted with the experience (darker is
better)
Note that the sniping experience data in the images above (left) largely corresponds with (and
thus confirms) the sniper ratings computed from the waypoint graph. However, the experience
data also emphasizes several highly successful sniper spots over other potential sniper spots.
Adjusting the waypoint sniping ratings results in new ratings. These ratings show a decreased
importance for the southwestern creek bank, and increased importance for a few northwestern
spots partially covered by the full height wall there.
Game play does not occur evenly across the map. Some locations are more frequently visited
than others. And sniping spots overlooking many of these "traffic" locations thus are more
important. Again, it is easy and cheap to record actor and player movement per waypoint.
The formula for the sniping rating needs to be extended with yet another factor: the ability of
the sniper to overlook much of the traffic (preferably at some distance from its spot, because
then the sniper has an advantage).
Re-running the algorithm, now taking into account both the sniping experience and the traffic
data, results in the output shown above (right).
The new results again show reduced sniping ratings of the spots on the southwestern bank near
the creek. They also show increased ratings for the whole western edge of the map (because
these positions overlook many traffic spots).
Processing Steps
The AI's need for the waypoint sniping ratings, as discussed in the scenario above, has been
addressed with:

- a tactics-based algorithm to rate sniping spots using the waypoint graph;
- an on-the-fly procedure to adapt the sniping ratings to game-play feedback;
- (optionally) a re-run of the algorithm to put more tactical understanding of the game play into the ratings.

The in-game memory consumption can be as small as two tables containing, for each waypoint
and each relevant direction, the sniping rating and the player and AI activities respectively. The
in-game CPU consumption consists of tracking player and AI activity per waypoint, and a brief
computation to adjust the sniping ratings (typically done when presenting an intermission
screen).
To preprocess a level, the memory and CPU requirements are more demanding. The tables for
shortest paths, line-of-sight, and line-of-fire easily consume a few MB, and the algorithm takes
a few tens of seconds to execute.
There are several alternatives for creating good waypoint sniper rankings, as is shown in Figure
8 above. The approach presented enables the game developer to ship his game with an
"experienced" AI that has gained some tactical understanding of the game's levels. Optionally,
the gamer can be allowed to run the algorithm on his own game play data, so the AI adapts to
his style of playing.
Again, these properties can be mapped onto waypoint-related properties. For ambush positions,
it is also simple to capture 'ambush'-related game play data, and use that to adjust ambush
ratings on the fly, or with a re-run of the algorithm.
A very similar problem is identifying good positions for sentry guns.
A more generic waypoint 'combat suitability' rating is also valuable. Such a rating will help the
AI distinguish between threats that occupy strong or weak positions. If the game AI implements
flocking behavior, the waypoint 'combat suitability' rating can be used to stimulate tactical
movement.
In a flocking algorithm, squad members determine their position based on attracting forces (for
example, stay near the team members) and repulsing forces (stay away from walls, do not get
too close to team members). Often, tactical considerations are ignored. By using the generic
'combat' rating as an additional force (attracting or repulsing the squad member), concepts
such as nearby cover, good overview, and flank protection can be handled as part of the
flocking. As a result, the squad will be more likely to move along walls than to move in the
open.
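A minimal C++ sketch of folding such a rating into a flocking steering sum as one extra force
(the names and the distance falloff are assumptions):

// Hedged sketch: attract a squad member toward nearby high-rated waypoints,
// as one more force next to the usual cohesion/separation/wall avoidance.
struct Vec2 { float x, y; };

Vec2 TacticalForce(const Vec2& memberPos, const Vec2* waypointPos,
                   const float* combatRating, int numWaypoints, float gain)
{
    Vec2 f = {0.0f, 0.0f};
    for (int i = 0; i < numWaypoints; ++i) {
        float dx = waypointPos[i].x - memberPos.x;
        float dy = waypointPos[i].y - memberPos.y;
        float d2 = dx * dx + dy * dy + 1.0f;     // +1 avoids division by zero
        float w  = gain * combatRating[i] / d2;  // influence falls off with distance
        f.x += w * dx;
        f.y += w * dy;
    }
    return f;
}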
In combination with areas and portals, the combat rating of areas and portals can be
determined. To do so, for each area, the combat ratings for its waypoints are determined using
solely the interactions with waypoints outside the area. This enables the AI to detect
strongholds.
Terrain reasoning has many other applications, for example dealing with dynamic situations.
The following threat prediction case illustrates the use of waypoint graph based reasoning for
dynamic situations.
To be a useful team member or a challenging opponent, the AI should understand where threats
are likely to appear and where threats are likely to move to. Though it is often infeasible to fully
model the presumed motives and behavior of threats, it is possible to do a good job using the
terrain representation.
The following two-step, waypoint graph based approach works quite well. First, predict the
threat's position.
Typically, the threat's position is extrapolated, under the assumption that the threat continues
in the same direction and at the same speed. The resulting position is tested against both the
feasible paths and the line-of-sight. For example, the threat will not move through walls, and
in the waypoint graph, the absence of a path indicates that. Similarly, if the AI does not see the
threat at a predicted location that is in full view, the extrapolation is known to be incorrect. In
these cases, the presumed threat position can be corrected to reflect a (more) likely location,
for example the last valid extrapolated location.
Second, the AI needs to decide where to aim. If the AI cannot fire through obstacles, it
solely has to consider the visible locations in the immediate surroundings of the presumed
threat position. One simple way to pick a likely location for the threat to reappear (and thus a
location to aim for) is to construct the shortest path from the presumed threat position to the
AI actor. The first location on that path that is within view of the AI is a good location to aim
for. The shorter the time the threat needs to travel to that location, the more likely it is that the
threat will show up there.
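A minimal C++ sketch of that aim-point selection, assuming a precomputed next-hop path
table and a per-waypoint visibility test (all names are illustrative):

// Hedged sketch: walk the threat's shortest path toward the AI and return
// the first waypoint the AI can currently see.
#include <vector>

// nextOnPath[a * n + b] = next waypoint on the shortest path from a to b
int PickAimWaypoint(const std::vector<int>& nextOnPath,
                    const std::vector<bool>& visibleFromAI,
                    int n, int threatWp, int aiWp)
{
    int w = threatWp;
    for (int steps = 0; steps < n && w != aiWp; ++steps) {
        if (visibleFromAI[w])
            return w;                   // first path waypoint the AI can see
        w = nextOnPath[w * n + aiWp];   // advance along the path toward the AI
    }
    return aiWp;                        // fallback: no visible point found
}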
If the threat's likely reappearance location is quite close to the threat's presumed location, the
AI might consider employing suppression fire or launching a rocket towards that location.
Even non-trivial reasoning, such as determining the value of spending a hand grenade to attack
or flush out a threat, can be done efficiently provided the data structures are designed to
support such a query (Sterren).
Terrain reasoning means reasoning about tens of thousands of polygons, and millions of inter-
waypoint relations. Implementing terrain reasoning thus is a matter of carefully analyzing the
AI requirements and the resource constraints, making trade-offs, and executing some
experiments. A sparser waypoint graph has clear advantages:

- fewer waypoints mean less work for the level designer (to place them);
- fewer waypoints mean faster pathfinding and smaller shortest-path lookup tables.

However, fewer waypoints also mean a coarser representation of the game terrain: each
waypoint (on average) describes a larger part of the terrain, and the line-of-sight and travel
time relations become less accurate.
The waypoint density required to sufficiently describe the terrain depends on the game design
and AI needs. For games featuring lethal weapons ("single shot kills"), slow movement, many
obstacles, and a lack of respawns, the terrain representation should describe as many covered
and concealed spots as possible. This typically translates to a high waypoint density. In games
featuring respawns, fast movement, and weaker weapons, a low waypoint density suffices, since
terrain details are less important (but the tactics may still be terrain driven, because of power-up
and objective positions). Note that an algorithm such as the one used for identifying good
sniping spots assumes a more or less uniform distribution of waypoints across the terrain.
Now consider the following simple AI query to locate nearby cover from a number of threats:
Given three threats that have line of fire to the AI actor, is there a spot within 3 seconds
distance that provides cover?
To determine the result, the AI needs to construct numerous paths and perform a good number
of ray traces (taking into account different actor postures, such as sitting and prone).
Today's CPUs and game engines handle on the order of 3,000 A* path searches per second and
50,000 trace lines per second, depending on level size and geometry complexity. Given an AI
budget of 10 percent of the CPU, and 10 AI ticks per second, the AI is allowed some 15 path
searches and 250 trace lines per tick, for all its actors (typically 4 to 30 of them), and as its
sole activity.
Storing part of the computations and looking up the (intermediate) results typically is three
orders of magnitude faster for path finding, and two orders of magnitude faster for trace lines.
Using lookup tables thus frees up CPU time for more advanced AI or faster graphics.
However, lookup tables may consume a good amount of memory. Especially for waypoint-pair
relations, such as paths and line-of-sight, the straightforward matrix implementation has O(N x
N) space complexity. For N ≥ 1000, these lookup tables really start consuming megabytes.
For path lookup, a hierarchical or multi-level path structure in combination with a cache for the
most recently referred to paths (and travel times) offers a smaller footprint while still being
very efficient.
The average waypoint sees only a small subset of all other waypoints. In that case, the line-of-
sight matrix can be flattened: per waypoint, record only the waypoints actually seen by that
waypoint, and the waypoints able to see it.
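As a rough C++ sketch (the container layout is an assumption, not something the article prescribes):

#include <algorithm>
#include <cstdint>
#include <vector>

// Flattened line-of-sight table: instead of an N x N matrix, each waypoint
// stores a sorted list of the waypoints it actually sees. A mirror table
// can be kept for the waypoints able to see it, if visibility is asymmetric.
struct LineOfSightTable
{
    std::vector<std::vector<std::uint16_t>> sees;  // sees[w] = waypoints visible from w

    bool CanSee(std::uint16_t from, std::uint16_t to) const
    {
        const std::vector<std::uint16_t>& v = sees[from];
        return std::binary_search(v.begin(), v.end(), to);  // lists are kept sorted
    }
};

If the average waypoint sees, say, five percent of 1,000 waypoints, this stores about 50 two-byte entries per waypoint (roughly 100 KB in total) rather than a million matrix cells.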
Another concept that helps selecting the right content for look up tables is the so-called "time
horizon". Most of the AI decisions, especially under the resource demanding "combat"
conditions, have a limited scope. The game world is so dynamic that planning beyond a horizon
of one to three seconds (depending on the game) has little value.
The AI queries therefore limit themselves to the surroundings accessible within the time horizon
(upon arriving there, the AI will decide again). A per-waypoint lookup table that contains the
info for all waypoints within that time horizon will be very efficient without being large.
Apart from checking for changes in the environment, the AI can also try to patch the pre-
computed terrain description. The lines-of-sight affected by a door can be pre-computed, and
the changes can be applied on the fly when the door changes state. Another option is to have a
background thread slowly update the information.
To create an AI that adapts quickly and tactically to the player's tricks, attach more significance
to feedback from player activity than from AI activity: the player will be more creative, and
have a better tactical understanding of the game and terrain. The player's activity should be
valued more than the combined AI activities.
This policy also prevents the AI from homing in on its own flaws (for example, the AI repeatedly
and unsuccessfully assaulting an area because the area is known for the many frags scored
there).
Periodically, level all values to be more responsive to changes in tactics. After re-computing
sniping spots using game play data, the player might notice changes in the AI's sniping and adapt
his tactics. To remain responsive to such changes in the player's tactics, the aggregated game
play activity values should be leveled. Using a function such as log or sqrt, this can be done
while preserving the ordering.
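As a small illustration (a sketch, not the article's code): because sqrt and log are monotonic, locations keep their relative ranking while extreme historical values are compressed.

#include <cmath>
#include <vector>

// Level the aggregated game play activity values, preserving their ordering.
void LevelActivityValues(std::vector<float>& values)
{
    for (float& v : values)
        v = std::sqrt(v);  // or std::log1p(v)
}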
To efficiently develop and tune a terrain reasoning algorithm, the algorithm is best created
incrementally, using feedback from visualizing its results. Luckily, terrain and terrain properties
are easy to visualize and interpret. Preferably, the results are visualized as an overlay in
the rendered 3D world. For tuning, an export facility to comma-separated files and a
spreadsheet package are recommended.
Summary
Terrain reasoning is the AI's ability to include terrain and related concepts, such as cover, travel
time, and areas, in its plans, decisions, actions, and communication. Notably in action games,
where AI and terrain play a big role, effective terrain reasoning results in a richer game AI.
Terrain reasoning involves a terrain representation, and algorithms to create, manipulate, and
query that representation. Waypoint graphs, until now primarily used for navigation, have been
demonstrated to be a powerful means for terrain reasoning.
Many action game tactics use the terrain. These tactics can then be related to properties of the
waypoint graph. With formulas to express these waypoint graph properties, tactical concepts
such as sniping spots, ambush locations, and strongholds can be computed and identified by
the AI.
The reasoning about tactical concepts is easily extended to include player and AI game behavior
local to the waypoint. This game activity data enables the AI to adapt its tactics to actual game
play. Adaptation occurs because of reinforcement learning, and because of improved
understanding of the role that various locations play in the game.
The waypoint based terrain reasoning can be used in-game, for example to interpret threat
positions and select the appropriate tactics. This kind of terrain reasoning can also be used to
pre-process terrain in order to off-load and assist the run-time AI.
Acknowledgements
Thanks to Jan Paul van Waveren (id Software), Doug Reece (SAIC), and the numerous talented
bot developers in the Quake, Half-Life, and Unreal bot communities for valuable discussions,
and to the Action Quake II community for their help in turning that "mod" into a custom-designed
tactics testbed for AI experiments.
References
Campbell, C.E., McCulley, G., "Terrain Reasoning Challenges in the CCTT Dynamic Environment",
in Proceedings of the 5th Annual Conference on AI, Simulation, and Planning in High Autonomy
Systems, IEEE, 1994.
Laird, J.E., "It Knows What You're Going to Do: Adding Anticipation to a Quakebot", in AAAI 2000
Spring Symposium Series: Artificial Intelligence and Interactive Entertainment, March 2000,
AAAI Technical Report SS-00-02.
Reece, D., Krauss, M., Dumanoir, P., "Tactical Movement Planning for Individual Combatants",
in Proceedings of the 9th Conference on Computer Generated Forces and Behavioral
Representation, 2000.
Reich, A.J., "An Efficient Representation of Spatial Data for Terrain Reasoning by Computer
Generated Forces", in Proceedings of ELECSIM95, SCS, 1995.
Snook, G., "Simplified 3D Movement and Pathfinding Using Navigation Meshes", in Game
Programming Gems, Mark DeLoura (ed.), Charles River Media, 2000.
http://www.gamasutra.com/20010912/sterren_01.htm
We were looking for an optimal way to build our levels from inside 3D Studio Max. Lofting and
other techniques we had used earlier to make level meshes were rigid to changes. We needed a
method where changes to the level mesh could be incorporated without too much effort. We
wanted a system within Max that had the ease associated with CSG (Constructive Solid
Geometry). When I saw the way Max could boolean objects, even non-intersecting ones, I
thought I could use this feature to build and derive meshes like the ones a CSG builder would
probably output.
This article does not discuss the actual script functions that I used, but explains the logic and
the way I have used Max Script to build a small system within Max for constructing game levels
for our game engine. One could always adapt and create scripts that meet one's own needs.
It is assumed that the reader has a basic knowledge of working with 3D Studio Max and Max
Script. The article covers terms such as boolean, axis, face normals, etc, which the reader is
presumed to have knowledge of. A more detailed description of these terms can be found in the
3D Studio Max help files.
Using Max Script to Build a Game Level Building System Inside Max
The first and foremost task is to understand what your game engine exactly requires; whether
the level mesh can be just a collection of polygons mapped with different textures, or if they
have to comply with certain criteria necessary for it to be used by your game engine. It can
vary depending on how exactly your game engine wants the level data to be organized. Most
game engines prefer the world data to be built or organized in a particular way.
In our case, the game engine needed the level mesh to be BSP (Binary Space Partitioning)
friendly. The mesh exported from Max had to fulfill two basic criteria: first, every edge in the
mesh should be two-manifold, and second, no face should intersect another. If the mesh fulfills
these two conditions, there will be no cracks or gaps in it.
In addition to our game engine, we also have our own lighting editor, which we use to light up
these game levels that we create. We also place other special effects such as fire, smoke, etc
within the lighting editor. The lighting editor also needs the mesh to be BSP friendly in order to
calculate lights faster. Moreover, it requires the mesh to have light maps assigned and mapped
to them, which it fills when the level is being lit.
Defining Primitives
As mentioned earlier, we wanted our system to have the ease of a CSG level builder. We
adapted Boolean functions inside Max to create an environment like CSG. The level building
starts with defined primitives. Later using Boolean functions, we derived a level mesh from
these primitives that is BSP friendly. The Boolean operations are covered in the second part
'Building the World'.
One could probably put in more time and use a BSP tree to build the level mesh from these
primitives. A person building a game level with this system in Max starts with primitives, as in a
CSG level builder. The actual world is built or updated every time the user presses the Build
World button.
First, we have to define the primitives that the user can use to build the world. A look at other
CSG editors like Genesis and Unreal Ed gives a fair idea of the various primitives they use.
Since the script does Boolean operations on the primitives, the primitives should support
Boolean operations without any errors. Almost all the Max standard objects, such as the Box,
Sphere, Cylinder, Pyramid, Cone, Tube, and Torus, worked well with Boolean operations. Some
of the Extended Objects are useful too and can be booleaned properly.
Since the user wouldn't know which primitives can be used and which can't, a customized GUI
can be made with all the usable primitives as buttons. While building the system, Max
functions, features, and even standard buttons have been adapted wherever possible. However,
creating all the primitives through custom scripts is preferred, since it gives more control over
the creation of primitives and the assignment of various properties to them.
In addition to these primitives, one could make one's own, whatever seems useful. Four custom
primitives were made for this system: a ramp, a ramp with railings, stairs, and stairs with
railings. This is covered later in the article under Creating Custom Primitives.
Building the World
The user building a level mesh using this system in Max starts building primitives. The
primitives are assigned appropriate materials and textures. The primitives can even have
different materials assigned to different faces.
The primitives can be either hollow or solid like in a CSG builder. Through script, a user
property called bool is set for each of the primitives. This bool property saves either one or two
against it. One stands for hollow primitive and two stands for solid primitive. By default, the
primitives created are set to hollow.
After creation, these primitives have to be added into the scene by clicking the Add CreateID
button. This script assigns a CreateID user property to the object. The CreateID property stores
a value against it, which determines the order of creation of these primitives. This order is
needed while doing Boolean operations. The user can now press the Build World button to build
or update the level mesh.
When the user presses the Build World button, the script first searches through the scene to
find and collect objects that have a CreateID property. Then it runs through the collection of
objects or primitives as we refer to them, and sorts them in ascending order according to the
value stored against the CreateID property. This is important since Boolean functions could end
up with different results depending on the order in which the objects are booleaned.
At first the script depended on Max, since internally Max seemed to maintain some kind of
order that matched the order of creation of these primitives. However, when a primitive was
hidden or frozen and later unhidden or unfrozen, the order seemed to change. Hence, the
Add CreateID button was introduced to record the order of creation of the primitives. This later
made the system much more flexible, since the user could change the order of the primitives
at any time using a button called Force CreateID. The Force CreateID script lets the user assign
any CreateID to a primitive, thereby enabling the user to move the primitive up or down the
order of creation.
Getting back to the Build World script, once the primitives are in order, the script starts by
creating an object named Parent. Henceforth, this object will be referred to as the parent. The
parent is basically a box constructed through script. The total bounding area of all the primitives
is ascertained, and the parent size is set to be slightly bigger than this. Then the parent is
positioned such that it covers all the primitives. This is done because in most CSG environments
one starts with solid space and starts carving out hollow spaces inside it. The parent object,
being a standard solid box in Max, is much like solid space in a CSG environment.
Once the parent is created, each primitive is taken in the order in which they now exist in the
collection of primitives, and booleaned with the parent. While booleaning these primitives with
the parent, the script checks whether the primitive being booleaned is a hollow or solid
primitive. This is read from the 'bool' property of these primitives; all primitives that do not
have this property set are treated as hollow by default. The Boolean operation is set to
subtraction if the primitive is hollow, and to union if it is solid.
Moreover, while booleaning the primitives, the pick object is set to be an instance of the actual
primitive being booleaned. This way, surprisingly, even after multiple booleans, any mapping
change made to a primitive used in the boolean operations affects the mapping in the booleaned
object (the parent, in this case). This is very useful, since if the user later changes the mapping
on any of the primitives, it is instantly updated in the parent object.
The script also does a weld-on-threshold on the vertices of the parent after every Boolean. This
is done to ensure that there are no open edges in the parent object. If the parent develops
open edges after any Boolean operation, further Boolean operations end up with undesired
results. It is also better not to have any open edges if you want the mesh to be BSP friendly.
The code for this appears in Listing 1.
Once all the booleans are done, the parent is almost complete. Now, all the edges of the parent
object are set to invisible. Then all the primitives are set to Box mode. This is a small trick to
ensure that only the parent object is visible in a shaded viewport and only the primitives are
displayed in the viewports set to wireframe.
With this, the building of the world is complete. We now have the primitives and the parent,
which is the built world.
When the user presses the Build Export Mesh button, the script first makes a copy of the
parent object, which shall henceforth be referred to as parent2. Then it checks for the
primitives in the scene that are not hidden and collects them into a list, which we shall call the
'primitive list', after which it hides everything in the scene except parent2.
Parent2, being a booleaned object, will have a multi material assigned to it. The script breaks
the faces in parent2 according to the sub materials being used. Then these detached meshes
are reassigned with the appropriate materials. The code for this appears in Listing 2.
After this, each of these objects is further broken into smaller planar objects. The script uses
the explode function in Max to do this; hence the explode threshold determines the angle for
breaking up into smaller objects. This is done because planar mapping is applied to these
objects later, for use by lightmaps. The code for this appears in Listing 3.
When the mesh is broken into planar surfaces, some of these planar surfaces (mostly the floor)
turn out to be huge objects that stretch across many primitives. We call these objects rogue
objects. These rogue objects require huge lightmaps, and chunks of this lightmap would go to
waste due to many unmapped areas. Therefore, these objects are checked and broken down,
such that the broken objects do not stretch across more than one primitive.
To do this, all the exploded objects are taken. Their bounding boxes are checked against the
bounding boxes of the primitives that exist in the primitive list. If the object's bounding box
does not lie within the bounding box of any single primitive, it is a rogue object.
Once all the rogue objects have been identified, each rogue object is taken and the bounding
boxes of its individual faces are checked against the primitives from the primitive list. Sets of
faces that lie within a single primitive are detached into separate objects. At the end of this
process, each detached object falls into a single primitive.
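In outline, the rogue test is a bounding-box containment check; a small C++ sketch (AABB is a stand-in type):

#include <vector>

struct AABB { float min[3], max[3]; };

// The object's box must fit entirely inside the primitive's box.
bool Contains(const AABB& outer, const AABB& inner)
{
    for (int i = 0; i < 3; ++i)
        if (inner.min[i] < outer.min[i] || inner.max[i] > outer.max[i])
            return false;
    return true;
}

// Rogue if the object fits inside no single primitive from the primitive list.
bool IsRogue(const AABB& object, const std::vector<AABB>& primitives)
{
    for (const AABB& prim : primitives)
        if (Contains(prim, object))
            return false;
    return true;
}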
Once the objects are broken down, each of them is given mapping co-ordinates on the second
channel. The second channel is used because the first channel has already been used to map
the diffuse texture.
At this point, I would like to thank Simon Feltman for the wonderful plugin Multimap.dlx, which
he has written. Without this plugin, I wouldn't have been able to access the 2nd channel
mapping co-ordinates in Max.
In order to apply mapping co-ordinates to the objects, the face normal of each object is first
determined. The script then checks which world axis is closest to the face normal (this will be
referred to as the 'closest axis'). The object is then given mapping co-ordinates perpendicular
to this closest axis. Following this, the mapping co-ordinates are fit to the object extents, such
that they extend from 0 to 1. The code for this appears in Listing 4 and Listing 5.
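Listings 4 and 5 are not reproduced here; the axis selection and projection boil down to something like this C++ sketch (Vec3 is a stand-in type):

#include <cmath>

struct Vec3 { float x, y, z; };

// The closest world axis is the one with the largest absolute normal
// component (0 = X, 1 = Y, 2 = Z).
int ClosestAxis(const Vec3& n)
{
    float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
    if (ax >= ay && ax >= az) return 0;
    return (ay >= az) ? 1 : 2;
}

// Project a point onto the plane perpendicular to that axis to get raw
// mapping co-ordinates; fitting to the object extents then rescales to 0..1.
void PlanarUV(const Vec3& p, int axis, float& u, float& v)
{
    switch (axis)
    {
    case 0:  u = p.y; v = p.z; break;  // normal closest to X
    case 1:  u = p.x; v = p.z; break;  // normal closest to Y
    default: u = p.x; v = p.y; break;  // normal closest to Z
    }
}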
Once the object is mapped with mapping co-ordinates on the 2nd channel, the right-sized
bitmap has to be created and assigned to the object as a lightmap. To do this, the area
covered by the object along the closest axis is determined. This area is scaled by a factor to
get the size in pixels; we'll term this the 'object area in pixels'. The user can set this factor
while building the mesh, which gives control over the resolution of the lightmaps that are
created. A large factor will yield small lightmaps and a small factor will result in large lightmaps.
The bitmaps that are created have to be square, and their height and width in pixels have to be
a power of 2. The script then checks for the optimal bitmap size starting from the smallest in
order to accommodate the 'object area in pixels' that was previously calculated. The code can
be found in Listing 6.
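Since Listing 6 is not reproduced here, a minimal sketch of the search (the 256-pixel cap is an assumption):

// Find the smallest square power-of-two bitmap whose area accommodates the
// 'object area in pixels', starting from the smallest size.
int OptimalLightmapSize(float objectAreaInPixels, int maxSize = 256)
{
    for (int size = 2; size <= maxSize; size *= 2)
        if (float(size) * float(size) >= objectAreaInPixels)
            return size;
    return maxSize;  // clamp oversized objects to the largest allowed bitmap
}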
Now the right bitmap size has been ascertained and the object has been mapped on the 2nd
channel. However, the object mapping extends from 0 to 1, i.e. the entire bitmap area,
whereas the actual object area in pixels, occupies only a region in that bitmap. Therefore, the
mapping co-ordinates are scaled to fit the 'object area in pixels' in proportion to the bitmap size
that has been ascertained.
After scaling the mapping co-ordinates, a new material is assigned to the object, which is
basically a copy of the original material that the object had. The newly created blank bitmap is
assigned as the lightmap by assigning it to the selfillum channel of the object's material. The
set of objects at the end of this process will have proper textures and lightmaps mapped and
assigned to them. Since every broken object gets a lightmap, it results in a huge number of
lightmaps being used by the entire level mesh. Hence, these individual tiny lightmaps have to
be clubbed into bigger ones.
To club these lightmaps, a proper criterion has to be determined and used. The script uses the
original primitives that the objects resulted from in order to club their lightmaps. By doing this,
the number of lightmaps should be equal to the number of primitives that were used to build
the level mesh. The user can also club them further by assigning a common ID to sets of
primitives: a user property called ClubID can be assigned to the primitives before the export
mesh is built, with sets of primitives given a common ClubID value. When the script clubs the
lightmaps, it also checks for primitives with the same ClubID and clubs the lightmaps of those
objects accordingly.
To do all this, the objects first have to be linked to the primitives that they resulted from. Each
primitive is taken, and all the export objects are checked against it: the bounding extents of
each export object are tested to see whether they fall entirely within the bounding extents of
the primitive. Though this produces good results, objects resulting from a primitive that in turn
falls totally inside a bigger primitive could end up linked to the bigger primitive.
To avoid this, all the planes in the primitive are collected into an array called arrplanes. Then
the first face of each export object that falls into the primitive's bounding box is checked
against the planes in the arrplanes array. The object has resulted from that primitive if all
three vertices of this face lie on any of the planes in the arrplanes array. Such objects are
assigned the CreateID user property of the primitive. This works because any object that was
created by breaking the parent or parent2 must have faces that lie on one of the planes of the
primitive it resulted from. The code for this appears in Listing 7.
Once the links are completed, the script begins clubbing the lightmaps. A script function for
clubbing these lightmaps has been made. When called, the function automatically determines
the correct clubbed bitmap size for the lightmaps to be clubbed, locates and remaps the
mapping co-ordinates to vacant areas in the clubbed map, and reassigns the selfillum map
of the object with the newly clubbed lightmap. It also provides a one-pixel border around these
mapped areas to accommodate bleeding from adjacent pixels, which occurs when bilinear
filtering is used in the engine.
Once the clubbing is complete, the mesh is ready for export. All the objects are then grouped
into one group called Parent_world_for_export, so that it is easy to select the group and export
it into the game engine. In case the user needs to modify or extend the game level, all they
have to do is delete the group, unhide all the primitives and the parent object that are currently
hidden, and continue working on the level mesh.
In addition, some other properties are set on these export objects, such as ClubID and ambient
colour, which are all derived from the primitives that they fall in.
Creating Custom Primitives
The tower_plugin definition that can be found in the Max Script help has been modified to build
the custom primitives. The 'Stairs with railing' shown in Figure 3 is one such custom primitive.
Figure 3: A custom primitive 'Stairs with railing'
These custom primitives are very useful, and can give you good results when used with other
Max modifiers.
Advantages
With this system in Max, one can build a level mesh with the ease associated with CSG
modeling. Compared to building low-poly level meshes in Max using lofts or by actually building
faces, this method is far easier and far more flexible to changes. Because the user makes the
world using CSG primitives or brushes, he can easily change the primitives at any point in time.
The resulting level mesh is built by script, which cuts out significant manual work.
All properties are stored against the primitives. Applying materials and mapping texture co-
ordinates are all done on the primitives. Hence, when the level mesh is built, no property or
work is lost. Moreover, since we are working inside Max, we can use Max modifiers like Bend,
Taper, and Skew on the primitives. Max modifiers are easy to use, and some of them may not
exist in other CSG builders.
Disadvantages
Here are some of the disadvantages or areas of improvement in the present system:
- The script currently depends on Max booleans while building the world from primitives.
Therefore, it does not have complete control over the creation of the actual faces and
vertices. Because of this, while later building the export mesh, the script has to check and
link the export objects back to the primitives.
- As the level mesh increases in size, building the export mesh takes more time, especially
when clubbing the tiny lightmaps into larger ones. Although this is faster than creating and
mapping lightmaps manually, it could still be sped up.
- Though mapping changes to the primitives are reflected immediately, new materials
assigned to these primitives are not seen on the world mesh until you build the world.
- Sometimes, some faces in the built world get wrong mapping co-ordinates while
booleaning.
- Max Script has no access to some elements inside Max. Two things that would have
helped the existing script while building the world are access to the vertex weld threshold
spinner and the explode threshold spinner. Presently, the script takes the threshold that
was last set by the user inside Max; it cannot set the threshold to a specific value itself.
A script or plugin to generate the world mesh through BSP instead of Boolean operations would
probably have more control over the actual faces and vertices being built, and should take care
of many of these problems. With a little more control given to Max Script over the elements
inside Max, the user could get scripts that are more accurate.
When we started developing our own engine, we soon realized that we would need our own
level editor. We were already using Max to create and export our meshes into our engine using
custom plugin exporters. Since making our own editor would involve a lot of time and
manpower, we turned to adapting Max to meet our level building needs. At this point, I must
say that it has proved to be worth it. Eventually, we had our set of custom plugins and Max
scripts to build game levels in Max, which met our engine specifications.
I may be a little biased towards Max, since I have been a Max user since the 3D Studio 2 days.
However, I sincerely feel that Max is a very powerful tool, not only for building high-poly scenes
and characters, but also for building low-resolution level meshes.
To start with, if you are looking to make Max your game level builder, you don't have to start
from scratch. You can select and use all the existing features in Max. For instance, the basic
Max UI itself is ready to use, and it is totally customizable. You can write custom plugins and
scripts with which a user can build game levels tailored to your game engine.
Features like the object-user-properties that Max already has are very useful and can be used
extensively. The Unwrap UVW modifier in Max is another widely used tool. When mapping
low-polygon meshes, it gives you total control over the mapping co-ordinates. Max actually lets
you work on meshes at the basic vertex or texture-vertex level, thereby giving you total control
over the mesh that you are creating.
There are many more features in Max that can be used or adapted to meet your needs, such as
material properties for surfaces in your level mesh, or splines and bezier patches, which can be
used if your game engine supports them. For instance, we used bezier splines to create and
export camera paths and character paths into our engine. I couldn't possibly list all the useful
features in Max, because the program is too vast and its utility varies depending on one's needs.
I have not had the opportunity to work with Max 4 or GMax yet, but from all the reviews I have
been following closely, I can see lots of goodies in store that can be put to good use. First, the
viewports claim to be WYSIWYG. One of the reasons for not lighting the mesh within Max itself
was that the Max viewports could display only one texture at a time. With support for multiple
textures, they could probably display textures as well as lightmaps at the same time. Since
Max 4 uses DirectX 8, you should be able to blend or modulate textures. You could even push
other texture layers, like noise and stains, into the viewport display inside Max. This would get
the display within Max close, if not identical, to what you would see in your own game engine
using DirectX 8.
At present, we have our own fire and smoke particle systems. We can bring it all into Max,
since Max viewports now support true transparencies. This huge enhancement in the Max
viewport display itself is a big reason to consider using Max or GMax as your level editor.
Another feature that draws attention is custom object attributes. You can now set your own
attributes for objects, thereby making them unique and capable of fitting the specifications that
you want for your game engine. Although you can use object-user-properties to set and
retrieve properties on objects, custom object attributes permanently attach these attributes to
every object in your scene.
Another important feature is the ability to set individual properties for every face in your level
mesh. For instance, the existing script could then link and store, against each face, information
about the primitive that it belongs to, without actually breaking the mesh apart.
Defining your own brushes is yet another big feature in Max4. A CSG level builder, for example,
can make use of this feature to define and work on its own custom brushes. With DirectX8, it
even supports per pixel and vertex lighting inside the Max viewports.
GMax is a subset of Max 4 with a feature list suited to building game levels. Therefore, it should
be a much smaller program focused on building game levels. Licensed programmers will be
able to define their own file types in addition to custom functions and plugins. Moreover, since
GMax will be freely downloadable, a game player can download your game module and run it
in GMax to edit existing game levels, or to create new levels for your game.
There are hundreds of new features and enhancements in Max 4 and GMax. The features I have
discussed are the ones significant for game level editing, and I feel they can be put to immense
use if you plan on using Max 4 or GMax. There is much in store in the new versions of Max and
GMax, and it is up to us to adapt and use these features to the fullest.
http://www.gamasutra.com/features/20010824/watsa_01.htm
In Part One of this series I explained how to use a function based on Perlin noise and fractal
Brownian motion (fBm) to dynamically generate any point on a planetary body from a set of
coordinates. This article will focus on how to render those planetary bodies at real-time speeds
using my own spherical version of the ROAMing (Real-time Optimally Adapting Mesh) terrain
algorithm. For those of you who read Part One and downloaded the demo I provided, make sure
you download the latest demo. I've made some significant performance improvements since
then, and they are explained in this article.
There have been a number of articles published in the past few years on adaptive mesh
algorithms, which are also called dynamic or continuous LOD (Level-Of-Detail) algorithms.
Some good examples are "Real-Time Dynamic Level of Detail Terrain Rendering with ROAM"
and "Continuous LOD Terrain Meshing Using Adaptive Quadtrees". However, all the articles I've
read so far have drawn flat landscapes based mainly on a pre-built 2D height map. Using a little
creative thinking, I managed to come up with a spherical version that dynamically generates
height values as needed.
As a side note, when I wrote the initial version of this spherical ROAM algorithm (more than a
year ago), I couldn't find anything on the Internet regarding spherical DLOD algorithms. Since
then, a number of projects have popped up using various techniques. I still haven't seen
anyone do it quite the same way I did, but I'll provide a list of reference links at the end of the
article and leave it up to you to compare the pros and cons of various methods.
If you haven't read anything about ROAM, I recommend that you read the online paper:
"ROAMing Terrain: Real-time Optimally Adapting Meshes". For those who want to skip the gory
details, I'll go over the concept more briefly here. The ROAM algorithm attempts to generate an
optimal triangle mesh for quickly rendering large-scale terrain maps. It starts with a single
square, which consists of two right triangles, covering the entire map. If more detail is needed,
a vertex is added to the center of the square, splitting the two right triangles into four. Based
on a view-dependent priority calculation, triangles are split recursively in the same fashion until
the desired level of detail is reached. ROAM implementations typically provide two main
functions that are called every frame: one called Update() to update the mesh, and one called
Draw() or Render() to draw it.
To determine when more detail is needed in the mesh, each triangle is assigned a split priority.
A priority threshold is set, and any triangle that exceeds the threshold is split. A triangle's
priority is calculated by determining the amount of visible error it contains. Visible error is
essentially the actual amount of error in the triangle divided by the distance from the camera to
that triangle. For any triangle that can be culled from the rendering process (i.e. outside the
view frustum or facing away from the camera), the visible error should be 0 because the
triangle isn't visible.
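In code, the priority boils down to a simple ratio; a hedged sketch (the demo's actual calculation also factors in the camera heading and the horizon, as discussed later):

// Visible error: the triangle's geometric error scaled down with distance.
// Culled triangles contribute nothing and will never be split.
float SplitPriority(float geometricError, float distanceToCamera, bool culled)
{
    if (culled)
        return 0.0f;
    return geometricError / distanceToCamera;
}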
When splitting triangles in a 3D mesh, care must be taken to ensure that no cracks or seams
appear in the mesh. The ROAM algorithm handles this by following one simple rule: except at
the edges of the map, only a square can be split. In this case a square is defined as two right
triangles of the same size sharing their longest edge. When you need to split a triangle whose
neighbor along the longest edge is not the same size, you must split its neighbor first. If the
neighbor isn't part of a square, then its neighbor must be split before that. This check continues
recursively until a square is reached or the edge of the map is reached, and all triangles along
the path are split as the recursion unwinds. The figure below illustrates how this works:
Attempting to split triangle T requires four new vertices to be added to the mesh. Those
vertices and the new edges created are highlighted in red and numbered in order. The first
vertex makes the bottom-right corner of the map into a square. The second vertex makes a
diagonal square in the bottom center. The third vertex makes triangle T part of a square. Last
but not least, the fourth vertex splits triangle T.
Because the mesh starts with two triangles and each is split recursively into two smaller
triangles, a binary tree is often used to store the ROAM mesh in memory. In this case, each
node in the tree contains a triangle at a certain level of detail, and each frame all the leaf nodes
in the tree are rendered. Along with containing pointers to its vertices, the triangle object
usually contains pointers to its neighbors and members dealing with its split priority.
To keep the mesh optimal as the camera moves around, we need to be able to remove triangles
that are no longer needed. The rule for merging triangles back into their parent triangles is
that, except at the edges of the map, only a diamond can be merged. A diamond is defined as
four right triangles with their 90-degree angles sharing the same vertex, which equates to a
square that has just been split. To merge the diamond, just remove the center vertex and turn
it back into a square. Unlike splitting a triangle through recursion, you must wait until a triangle
becomes part of a diamond before you can merge it.
In addition, it's not very efficient to figure out if a particular triangle is part of a diamond and
then figure out if that diamond needs to be merged. The best way to handle this seems to be to
maintain a list of current diamonds at all times. Every split creates 1 diamond and destroys 0-2
diamonds, and every merge does the opposite. The diamond structure should contain pointers
to its triangles and members dealing with its merge priority. Every frame merge priorities get
checked, and all diamonds that fall below the split threshold get merged.
One of the best things about ROAM is the amount of control you have over the performance vs.
visual quality trade-off. You can control it by fine-tuning the priority calculations, changing the
priority threshold, setting limits on the frame rate, setting limits on the triangle count, or
setting limits on the number of split/merge operations to perform per frame.
Probably the largest problem with ROAM is that mesh changes from one frame to the next
might cause a visible "pop" in the terrain. This artifact can be reduced by a process called
"geomorphing", where you gradually move vertices from the old position to the new position
over a number of frames, smoothing the pop and making the transition less noticeable. You can
also slowly change the vertex normal, color, and texture coordinates if necessary.
Because the traditional ROAM algorithm only allows you to split a square, you have to start with
a solid 3D shape that consists only of squares. To prevent cracks from forming in the mesh, the
edges of these squares must fit together seamlessly in all directions. Each square in this shape
will be treated like a top-level square, and each triangle like the root of a separate binary tree.
Neighbor pointers have to be set up between these top-level triangles properly so that splitting
at the edge of one top-level square can split the corresponding triangle on the adjacent edge.
The obvious 3D shape that consists of all squares with edges wrapping seamlessly in all
directions is a cube. Because a subdivided cube doesn't give a well-proportioned sphere, I spent
a lot of time trying to come up with a more appropriate shape, but I couldn't find anything that
fit the requirements. Later on, I realized that I was wasting my time because the split priorities
would quickly take care of the sphere's proportions. If there is too much error in one triangle
because its longest edge is out of proportion, it will be split.
Once you set up the 12 root triangles of the cube with the proper neighbor pointers, everything
falls into place and the traditional ROAM algorithm works beautifully. In fact, it's even easier to
implement than the flat ROAM algorithm because you don't have to test for conditions at the
edges of the map. The only thing you need to treat differently is how you implement your
height values. Instead of simply adjusting a vertex's y coordinate, you need to treat your height
value as an offset from the sphere's radius. Mathematically, treat each new vertex as a
direction vector from the center of the sphere. Normalize the vector and multiply it by the
sphere's radius plus the height offset.
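A short C++ sketch of that placement rule (Vec3 is a stand-in type):

#include <cmath>

struct Vec3 { float x, y, z; };

// Treat the new vertex as a direction from the sphere's center: normalize
// it and scale by the radius plus the fractal height offset.
Vec3 PlaceOnPlanet(const Vec3& v, float radius, float heightOffset)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    float s = (radius + heightOffset) / len;  // assumes v is not the zero vector
    Vec3 out = { v.x * s, v.y * s, v.z * s };
    return out;
}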
One other simplification I made to the ROAM algorithm was to use a linked list of triangles
instead of a binary tree. The tree structure takes up more memory, slows the routine down,
and is more complicated to implement. A linked list is sufficient as long as each triangle keeps a
pointer to its edge neighbors and the triangle it was split from (its parent). The list is initialized
with the 12 top-level triangles, which have their edge pointers set up properly and their parent
pointers set to NULL. When you split a square, add two new triangles to the list with their
parent pointers set to the two existing triangles, then add a vertex and rearrange the vertices
and edge pointers of the four triangles. Do the reverse when merging a diamond.
To simplify the split and merge operations, it is a good idea to maintain the same order of the
vertices and neighbor pointers in the triangle object. I chose to order my vertices (0, 1, 2)
counter-clockwise with vertex 1 always being opposite the longest edge. My edge neighbor
pointers are ordered based on the edge shared, 0-1, 1-2, and 2-0 respectively. Since vertex 1
is opposite the longest edge, this means that the last edge pointer (i.e. array index 2) always
shares the longest edge.
Note from Figure 3 that it's not that hard to determine whether a triangle is part of a square or
diamond using edge pointers. If triangle1->edge[2]->edge[2] == triangle1, then triangle1 is
part of a square. Otherwise, you need to split triangle1->edge[2] to make triangle1 part of a
square. If triangle1->edge[0]->edge[0]->edge[0]->edge[0] == triangle1 (you can also replace
those 0's with 1's), then the triangle is part of a diamond.
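Expressed as C++, those checks and the forced-split walk they enable look roughly like this (a sketch; Triangle stands in for the CROAMTriangle class described below):

// edge[2] is always the neighbor across the longest edge, per the ordering above.
struct Triangle
{
    Triangle* edge[3];
    // ... vertices, parent pointer, priority members ...
};

void SplitSquare(Triangle* t);  // hypothetical: adds the center vertex and rewires pointers

bool IsPartOfSquare(const Triangle* t)
{
    return t->edge[2]->edge[2] == t;  // longest-edge neighbors point back at each other
}

bool IsPartOfDiamond(const Triangle* t)
{
    return t->edge[0]->edge[0]->edge[0]->edge[0] == t;  // or use edge[1] throughout
}

// A triangle may only be split once it is part of a square, so recursively
// split longest-edge neighbors until that holds.
void ForceSplit(Triangle* t)
{
    if (!IsPartOfSquare(t))
        ForceSplit(t->edge[2]);
    SplitSquare(t);
}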
Now that you have the basic idea behind my spherical ROAM algorithm, you need to know how
to optimize it for a dynamic planet generation algorithm like the one I explained in Part One.
The first key to making it fast enough for real-time rendering is to realize that the fractal
function you're calling is very slow. Even if you change it to use some other type of algorithm, it
is relatively safe to assume that it won't be a cheap function to call. Your only hope to get it to
run fast enough is to minimize the number of calls to that function.
Keep in mind that you have to update the priority of every triangle in the mesh each frame to
determine whether it needs to be split or not. That priority depends on the height of the new
vertex, which is determined by calling the fractal function. To keep from having to call the
fractal function for every triangle every frame, you need to cache the new height value in your
triangle object. So as each triangle is created from a split, call the fractal function to get the
height value for the next potential split.
Once I had that implemented, I realized I was still calling the fractal routine twice as often as
necessary. Because each triangle shares its longest edge with another triangle in a square, the
second triangle created for that square was making the same call to the fractal routine. It's
easy enough to determine whether a triangle is completing a square at the time of its creation,
and if so, borrow the calculated height value from its neighbor. Note that there is also no need
to call the fractal routine when you merge a diamond, because the vertex you are removing has
the information you need. Just take the vertex's height value and use it to compute the offset
to put into the merged triangle objects.
Next I focused on the priority calculations. The first thing to keep in mind is that if the merge
priority is not calculated in the same way the split priority is calculated for a triangle's parent,
you quickly get into a situation where each frame you will split several triangles only to have
them merged together again right away. You can also get into a situation where unneeded
triangles are kept around long after they're needed. Because I'm not using a binary tree, I'm
not keeping track of the original state of the parent triangles. This was easy enough to solve by
adding a few members to my diamond object to keep track of the necessary information.
There are a number of other changes I can think of off the top of my head that could
theoretically speed it up even more. The simplest way is to come up with some logic to avoid
doing a priority check on every triangle every frame. At 50,000 triangles, this can be quite a
burden on the CPU. I recently read an interesting article that described an improved split
priority concept. In this article, the priority calculations were pre-calculated to determine the
distance at which a triangle would be split or a diamond merged, and updates were driven
mainly off of distance checks. The author then grouped polygons together into a tree using
bounding spheres to greatly reduce the number of distance checks that needed to be made
each frame. I'm not clear on the details surrounding their implementation, but they reported a
dramatic improvement in their Update() function.
Next I concentrated on the Draw() function, and switched from direct calls to glVertex, glNormal,
and glTexCoord to a vertex array. I created a dynamically sizing array class, and made it global
so that all ROAM objects use the same vertex array. This optimization gave a noticeable boost
in frame rate, which illustrated to me that my optimizations to the Update() function had paid off
and the bottleneck had moved over to the rendering pipeline. If the rest of the code had been
poorly optimized, I would've seen a very small improvement optimizing the rendering pipeline.
Then I looked at the Nvidia documentation to find the fastest way to push non-static triangles
to the video card. I found that because I had already implemented a vertex array, it took very
little effort to implement the GL_NV_vertex_array_range and GL_NV_fence extensions. These
extensions allow you to copy vertices directly to AGP or video memory, and when you render
with the standard glDrawElements function, it renders them extremely quickly. I believe the effect
is the same as using DirectX's vertex buffers stored in video memory.
Given all these optimizations, I get about 20 FPS rendering 50,000 triangles using a 750 MHz
Duron with a GeForce2 MX video card. When I turn the mesh updates off (i.e. skip the Update()
call) on the same machine, it speeds up to 45 FPS with 50,000 triangles, so there's still room
for improvement in the update routine. The only further optimization I can think of for the
Draw() function would be to convert my triangle lists to a set of strips or fans, but that would
require even more changes to the Update() function to maintain the list of strips and fans.
Given the frame rates I'm currently getting, I can't imagine it would improve performance much
unless I came up with a simple way to maintain them.
I want to go over the classes I wrote for this ROAM algorithm so you can get a good idea of how
they work. To keep it concise, I will only list and explain key classes, members, and methods
here. This is just to explain some of the finer points of what I did to make it easier for you to go
through the demo source and modify it or come up with your own implementation based on it.
CVertex
class CVertex
{
public:
static CVertexArray Array;
CVector m_vPosition; // The vertex position
CVector m_vNormal; // The vertex normal
float m_fTexCoord[4]; // 2 sets of texture coordinates
};
The CVertex class doesn't have any ROAM-specific members or methods; it's just a regular
vertex with space for two sets of texture coordinates for multi-texturing. Note the static
CVertexArray object, which is currently used as a global OpenGL vertex array shared by all ROAM
instances. It needs to be global because I'm only allocating room for 65,536 vertices in video
memory (Nvidia-specific), and I need to make sure one ROAM object doesn't step on another. If
necessary I could make a second global array, perhaps to share non-ROAM vertices, and it
would allocate its own block of video memory and work in a similar fashion.
CVertexArray
class CVertexArray
{
public:
    void Init();                                 // Initializes extensions, allocates video memory
    void Cleanup();                              // Frees video memory
    unsigned short GetElement();                 // Pops a free index off the stack (signature assumed)
    void ReleaseElement(unsigned short nIndex);  // Pushes the index back onto the free stack
};
The CVertexArray class implements the OpenGL vertex array I use for my ROAM algorithm. To
properly encapsulate an OpenGL vertex array, it must handle the GL_NV_vertex_array_range and
GL_NV_fence extensions, and it must contain wrappers to call functions like glVertexPointer(),
glNormalPointer(), and glTexCoordPointer(). As you can see here, it keeps a stack of free array
indexes into the vertex array. When the ROAM split function needs to add a new vertex to its
mesh, it calls GetElement() to get a new index into the array. When the ROAM merge function
needs to remove a vertex from its mesh, it calls ReleaseElement() to put that index back on the
free stack.
CROAMTriangle
This class requires a little more explanation. As I mentioned above, my ROAM implementation
uses a linked list of triangles instead of a binary tree. The list is initialized with the 12 top-level
triangles, each with its parent pointer set to NULL. When a square is split I create two new
triangles and set their parent pointers to the two existing triangles in the square, then I
rearrange the vertices and edges of all four triangles. This means that the true parent triangle
doesn't really exist in the list anymore. But as long as I have a pointer linking a triangle with
the triangle it was split from, rebuilding the parent is easy when it needs to be merged.
Every time I create a new triangle during a split operation, I cache the midpoint and length of
its longest edge (the edge to be split). I also cache the offset from that midpoint to the vertex's
correct location (i.e. the error). If the new triangle is completing a square, then the values are
copied from its neighbor. Otherwise, the values are calculated. All three values are used in the
priority calculation, which also takes into account the camera position and heading and the
distance to the horizon.
Initially the GetPriority() method returned 0 if the distance to the vertex was greater than the
horizon or it was out of the current angle of view, but this caused problems. Essentially, if you
turned the camera away from a planet while close to it, all polygons would be merged and the
sphere would be turned into a cube. When you turned the camera back to face the planet, the
cube would not always be split because the vertices checked were so far away from the camera
that they were either beyond the horizon or outside the angle of view. So now a priority is
always calculated if the camera is closer to the vertex than the length of the edge to be split.
The Draw() method simply fills in an array of indices for a future call to glDrawElements(). I've toyed
with adding a check to skip drawing triangles that aren't in the current view, but I've got that
commented out right now so I can turn off updates from a specific viewpoint, then fly around
the planet in wire-frame mode to see how well the mesh is optimized. You may get a small
improvement in performance by un-commenting the lines dealing with the TriangleDrawMask flag.
CROAMDiamond
This class is a good bit simpler than CROAMTriangle, but it is just as important. A diamond is
made up of four triangles. Because two of them are always parents of the other two, the parent
and child pointers are kept separately to make the merge routine simpler. Note how the priority
members stored in CROAMTriangle are duplicated in this class. These values are what would've
been cached in the original parent triangle that gets destroyed during a split. Caching them in
CROAMDiamond allows us to calculate a merge priority that matches the parent's split priority
exactly.
Note that the diamonds are kept in a linked list, each diamond contains pointers to its 4
triangles, and each triangle contains a pointer to the diamond it's currently in. This is fairly
redundant, but it is necessary for good performance. When splitting a triangle, you need to be
able to find and destroy the diamond it's in (if any) very quickly. I used to scan the list of
diamonds, thinking that the list would stay rather small, but this killed my performance when a
lot of splits needed to be done each frame.
CROAMSphere
We finally get to the main ROAM class. The key member variables should be about what you
expected at this point: a triangle list, a diamond list, and an algorithm object for retrieving the
height and texture coordinates of new vertices. The Init() method takes an initialized algorithm
object, and builds the 12 top-level triangles of the cube. The CollisionCheck() method calls the
fractal function to determine whether the camera's position is inside the planet or not. This is a
faster and more accurate way to do collision detection than to check against the current mesh.
The Split() method looks messy right now and could probably be cleaned up a bit, but it is fairly
easy to explain with some pseudo-code:
pVertex = CVertex::Array.GetElement()
Initialize pVertex's position, normal, and texture coordinates
Create a new triangle and add it to the triangle list
Split pTriangle into the new triangle
Create a new triangle and add it to the triangle list
Split pOpposite into the new triangle
The Merge() method is almost as messy right now, but it is also fairly easy to explain with some
pseudo-code:
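A rough reconstruction, mirroring the Split() pseudo-code above:

Rearrange the vertices and edge pointers of the four triangles back into a square
Remove the two child triangles from the triangle list
Call CVertex::Array.ReleaseElement() to return the center vertex's index to the free stack
Destroy the merged diamond and create any diamonds the merge completes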
The Update() method takes a camera position and an error threshold from the game engine. It
loops through all diamonds and triangles, checking their priority and merging or splitting when
they cross the error threshold. Recently, I added a wait state for triangles and diamonds so that
if they're not close to the threshold, they won't be checked every frame. This provided a decent
performance improvement, but I'm sure there's a better way to do this.
Originally, the Draw() method did nothing but render the polygons in the mesh. Then I switched
to an OpenGL vertex array, building an array of indices to pass to glDrawElements(). The first
time I tried this, I ran into problems trying to draw more than 20,000 triangles. I had read that
you couldn't have more than 64K vertices in your array, but I wasn't anywhere close to that. I
figured out that most video cards also won't let you pass more than 64K indices to
glDrawElements(). So I changed my Draw() function to build the array of indices separately, and
wrote a loop to send them to glDrawElements() in blocks of 60,000.
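That batching loop is roughly the following sketch (60,000 is a multiple of three that stays safely below the 64K cap):

#include <GL/gl.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Send the index array to glDrawElements() in blocks of at most 60,000.
void DrawIndexBlocks(const std::vector<unsigned short>& indices)
{
    const std::size_t kBlock = 60000;
    for (std::size_t i = 0; i < indices.size(); i += kBlock)
    {
        std::size_t count = std::min(kBlock, indices.size() - i);
        glDrawElements(GL_TRIANGLES, (GLsizei)count, GL_UNSIGNED_SHORT, &indices[i]);
    }
}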
Final Notes
Now that you've seen how to generate and render a full-size planet at real-time speed with
dynamic level of detail, I encourage you to play around with the source code and see what you
can come up with. It should run pretty well on most systems with OpenGL-accelerated drivers,
and it should fly on NVidia GeForce cards. Because it only generates and uses memory for
vertices the camera is close to, it should be fairly easy to expand this demo to generate and
render an entire solar system, or an entire galaxy of star systems, without a significant impact
on memory or the CPU. The next article in the series will focus on doing that and will explain
how to fix some of the problems you will run into working with such a large game world.
http://www.gamasutra.com/features/20010810/oneil_01.htm
Three-Axis Animation:
The Hardships of Animating Three-Dimensional
Characters in Real Time Games
By JM Ringuet
Gamasutra
July 27, 2001
URL: http://www.gamasutra.com/features/20010727/ringuet_01.htm
Introduction
3D game character animation is a relatively new craft; even a few years back, three dimensions
in the game world were not possible. Four years ago we would not have dreamed of animating
facial expressions in real time, and now the latest line of video cards and consoles brings even
more complexity to what we can do. I think it is the right time to start reflecting on 3D game
animation as a craft: defining the methods we use, establishing some rules, finding tricks, and
understanding what it takes to be a game animator. It is time to go from tinkering to crafting.
This article deals with real-time in-game character animation. I explain and discuss the very
basic rules of what I call three-axis animation; however, it should also be useful to many people
who are involved with animation in general. I present the essence of what I have learned during
the past few years. Every animator has a different approach to the same problems, but I feel my
experience could be helpful to some of you, especially if you are totally new to in-game animation.
You probably remember the little sixteen by sixteen blocks of pixels creating sprites that were
supposed to look and act like characters with personality in a 2D game environment. Moving
pixels around in 2D animation involves creating a blurry illusion of something, while animating
in 3D is close to reproducing an actual motion with all its cold mathematical strangeness:
acceleration, deceleration, forces, dynamics, weights and curves. Going 3D is a huge and
frightening step forward. For a 2D animator, it's a strange new world. I started, like everybody
else, using the 2D method of defining a few extreme key frames from a side view and hoping it
would work. Unfortunately, it did not work, mainly because 2D is an illusion in an impossible flat
world and is accepted by the player (your average Joe player) as an illusion. In contrast, 3D
implies a world that obeys the same laws as ours, a world that is truly believable. Sure, Mario
can hop and jump without even moving his legs in a flat universe, but in a world with
dimensions and space, everybody expects him to act and react according to physical laws. The
difference between 2D and 3D is not an added dimension, but a heightened expectation from
the player. That's the big deal!
Now, as animators, we can ignore this fact, and still try to fix some little animations, or we can
rethink our ways of animating from the ground up. That means leaving behind everything we
learned from 2D, and understanding exactly what we are doing.
Gravity is the key
Let me define the technical basics of what 3D animation exactly is. As I said earlier, in a
believable universe we have to consider laws of nature. The main law that directly affects any
movement in a believable universe is gravity. Every motion in a 3D space either is created by
gravity or is a reaction to gravity. For example, you are probably sitting while reading this
article. Try to stand up and visualize what keeps you standing. Mostly, it is the muscles in your
legs (especially the thighs), and possibly the muscles in your arms if you are using the table to
lift you up, that produce a force strong enough to counterbalance the effect of gravity. The
creation of a force is the basis for motion in a 3D space.
However dry this topic may sound, bear with me, because understanding the basics often
makes the difference between good and bad animations. If you keep in mind that your
character has to create a force to counterbalance the effects of gravity every time you want to
move it, you will avoid many of the common mistakes.
What does it really mean? A quick example can illustrate my point. If you want to have a
standing character going from point A to point B, you have two ways of solving the problem:
- You can select the mesh of the character, or his center of gravity, and move it from A to
  B. Here, you are creating an artificial force that allows you to move the body.
- You can create the motions (the forces) that will move this center of gravity from A to B.
  This is done by replicating the way a body moves in the real world.
In the first solution, you are creating the illusion of movement. The second solution means you
are understanding and reproducing the way physical laws create a movement. Some people will
disagree with this because it makes inverse kinematics invalid. The truth is, inverse kinematics
is a bit of a monster. Making a character walk by moving its feet is wrong; implying that the legs
are moving because the feet are moving is the total opposite of a natural walk. It is simulating a
motion instead of reproducing a motion, trying to create an effect without knowing what forces
produce it. You should be interested only in knowing what makes things move, because if you
understand what you do, you can control it.
Gravity, and how it is used to create movement, is the first rule of three-axis animation. The
two other rules are balance, tilt and twist. Let me explain to you how those rules are tied
together and can be used to create a complex animation.
Out of balance
I already said that every motion is related to gravity and, in a way, created by it. Actually, it is
known as being in balance or out of balance. Almost every move starts by putting the body out
of balance and ends by regaining balance before falling down. To explain this, let me try to
define two very important concepts: the center of gravity, and the idea of balance.
The center of gravity is a point situated between the hips (higher or lower depending on the
body mass). This center of gravity is usually the first bone or root point of a character skeleton.
To move the body, the limbs have to move the position of the center of gravity.
A body is in balance when you can trace a vertical line from the pit of the neck (upper torso) to
the ankle of the supporting foot (if the character is supporting his weight on one leg), or to a
line that joins the two ankles (if the character is supporting his weight equally with his two
legs). If this line is not perfectly vertical, the body is out of balance. You will notice that the
center of gravity can be positioned anywhere in relation to that line. The only important points
are the pit of the neck and the ankles. The amount of unbalance is relative to the angle of the
line. The body is more likely to fall if the angle between the body and the ground is increased.
A body is in balance when you can trace a vertical line from the pit of the neck to the ankle of the supporting foot, or to a line that joins the two ankles.
The human walk is a great example to illustrate these two concepts and show how they are
related. When starting from a standing position (assuming our subject is right-handed), the first
movement of the walk is the forward rotation of the right leg around the hipbone, lifting the
right foot up in front of the body. At the same time, the left leg rotates around the knee (the
thigh is almost locked in place, the calf muscles produce all the force), and the left foot follows
by rotating around the ankle. The body pushes forward as a result of the left leg rotation, and
since the right foot is not in contact with the ground at that moment, the center of gravity is
displaced. The whole body moves forward. This forward movement suddenly stops when the
center of gravity passes in front of the supporting left leg (the balance line is not vertical
anymore). The body becomes totally out of balance and falls forward. This is when the right leg
makes contact with the ground, and becomes the supporting leg and stops the fall. The body
continues its forward motion. The whole movement repeats, however, this time the right leg is
producing the forward thrust and the left leg is lifting to catch the fall. A walk is a succession of
pushes and near falls, putting the body alternatively out of balance and in balance. We are
using the effects of gravity to move our body mass on a linear path, with our legs
simultaneously producing the thrust force and support. A walk is a succession of controlled falls.
As you can see, trying to animate a walk without knowing that you have to move the center of
gravity by putting the body out of balance is very difficult. If you do not know why each part of
the body moves, you cannot understand how they move.
The good news is that there is an easy rule to remember: every movement of the body is based
on a thrust (from the calves, thighs, arms, etc.), moving the center of gravity out of balance,
and then a catch by a leg or an arm, putting the body back in balance.
The amount of thrust and the time between the thrust and the catch determine the amplitude
of the movement. Of course the more nimble the character is, the more extreme his
movements will be. Moving a body means playing with gravity and playing with balance. If you
remember that rule before starting any animation, you will have more control on what you can
create.
Now, there is obviously more complexity in a motion than just using some muscles thrusting a
body forward in space and trying to avoid a fall. Why is movement so complex? For the very
simple reason that the body of every vertebrate is rigid. That can sound strange if you consider
all the stretching and moving muscles that comprise most of our body, but the underlying
structure is a rigid skeleton made of solid bone. Even an insect has a solid exoskeleton that
makes its body rigid. Gravity, the most basic law of nature, has created the need for a rigid
structure like the human skeleton.
Every movement we make is a rotation of several rigid bones around articulations. As far as
game animation is concerned, there is no flexibility whatsoever in the skeleton. This is why you
can animate a stick figure and still create a valid animation. Muscles are built over the skeleton,
and their only function is to create the rotation of the rigid parts. Muscles create the force.
Evolving big muscles has always been difficult for any living being, because it is a complex and
very expensive piece of body hardware. Muscle also consumes a lot of energy to function
properly. This is why our bodies have just enough muscle to allow us to move around. A simple
increase in weight slows us tremendously. Animals have a rigid skeleton, and because they do
not have super powerful muscles, they had to come up with strategies, called tilt and twist, to
allow them to move as fast as possible.
To illustrate this, let me go back to the human walk example. We have seen that the first
movement of the walk is the forward rotation of the right leg around the hipbone, lifting the
right foot up in front of the body. Now if you stand up and try doing this by only using your
thigh muscles to move the leg forward, you will quickly realize one thing, your foot is sliding on
the ground. There is nothing wrong with that, except that sliding a foot actually requires more
effort than lifting it up. You will also realize you are almost out of balance, and it is difficult to
stand straight.
So how can you actually lift your leg? You can do it only by tilting your hipbone to the left. Try
again to move your leg by tilting the hips and you will understand that this is what you do
naturally. The tilting of the hipbone has two effects. First, it makes your right leg higher than
the left one, thus allowing rotation forward without having your foot scraping the ground.
Second, it moves your center of gravity over your left leg (the vertical line drawn from the pit of
the neck to the ground is moving from between your feet to your left foot). With the center of
gravity over your left leg, you can lift your right leg and stay in balance. With one easy tilt, we
create a solution to two problems. Of course, the balance is only on a lateral plane; we want to
move forward. Therefore, the necessary unbalance I discussed earlier has to be created forward
and not sideways.
Tilting your hipbone to the left makes your right leg higher than the left one, thus allowing rotation forward without having your foot scraping the ground.
Unfortunately, the tilting of the hips produces a different unbalance between the upper and
lower parts of the body. Now that one leg is higher than the other, and the body weight is
moving to the left, we have to compensate that motion with the top of our body. This is why we
have to tilt our shoulders in the opposite direction of the hips, so that the right shoulder is
higher than the left. Because this unbalance is minor, the tilt of our shoulders is at a smaller
angle than the tilt of our hips. To stay level with the ground the head is also tilted slightly in the
opposite direction (same direction as the hips), as a side effect of the shoulders tilting.
As you can see, this very simple first step in the walk suddenly puts in motion not only the legs,
but also the hips, shoulders, spine, and the head. What may appear very simple in a 3D
program (just rotate one object around one pivot point), actually requires rotations on many
other parts of the skeleton. By understanding that, you start to understand the underlying
principle of three-axis animation: every part of the body has to move in three dimensions to
create a realistic movement.
Tilting is a very effective and cost-efficient way (in terms of energy) to quickly rotate the parts of
a skeleton, but living in a three-dimensional universe allows us to improve this efficiency with
twisting. We have seen how the process of lifting our right leg for this crucial first step involves
a rotation forward (Z-axis rotation) and the tilting of the hips (X-axis rotation). What about the
Y-axis rotation?
When we move our right leg forward, we can easily improve the reach of the first step by
simply rotating the hips on the Y-axis, twisting the lower part of the body to move the right hip
forward. This is a very simple and small motion, but being at the top of the rotating leg, it gives
a significant increase in reach. As in tilting, the same unbalance occurs with the top of the body.
We have to twist the shoulders in the opposite direction of the hips to keep the body in balance.
The head will also have to twist a little in the same direction as the hips.
Tilting and twisting make up the last rule you have to understand to create any kind of
animation. It means that every part of the body involved in creating a motion has to rotate on
all three axes (X, Y, Z) at the same time. Because of gravity and the need to stay in balance, we
vertebrates have evolved this way of moving by rotating, tilting, and twisting our rigid skeleton in the most
efficient manner. Evolution plays a huge role in our movement. Centuries of trial and error have
shaped muscles and bones to allow them freedom of movement and efficient management of
energy. Learn this and understand it. Animations should not be random, because there is an
underlying order and explanation for every part of a movement.
Creating
One of the best features in the rules of three-axis animation is their universality. Every
vertebrate moves according to those rules, even the ones without legs (like snakes).
For example, as a dog runs, it moves its center of gravity forward by pushing on its back legs
(where its most powerful muscles are), slightly tilting the hips and shoulders but twisting them
to cover more distance. This exaggerated twisting explains why the front paws of a dog hit the
ground one after another instead of at the same time.
A monkey jumping from branch to branch observes the same rules. Obviously, it uses only its
arms to move, but its arm muscles are not powerful enough to move fast, so it has to use a
combination of extreme tilting and twisting in the shoulders to cover more distance with
minimum effort.
How do I know that? I didn't spend hours observing dogs or chimps in their natural habitat. I
simply tried to understand why they move the way they do and what the most efficient way of
moving would be. The three-axis animation principles are guidelines for analyzing any kind of motion.
Knowing that you can replicate about any movement by understanding it, gives you the
freedom to invent new ones. The true challenge for an animator is obviously not to do a run
cycle, but to come up with that crazy out of this world movement that your producer needs for
his new fighting game. Games are not always realistic (they are often downright cartoony), and you
often need to create something bigger than life (like in any good kung fu movie), so you will
have to invent a lot of moves. Inventing is challenging, but if you use the basic rules I talked
about, you can come up with something as fantastic as you can imagine and at the same time
keep it totally believable.
Rules are better than tricks when you animate. Rules can be applied to any kind of animation.
Rules give you the basis to elaborate, create and give style to your work. Try to keep them in
mind, and your life as a game animator will be a lot easier.
Now we have all the tools necessary to create the right animation for our character, but we
have one last obstacle: remembering it's a game. Animating for a game requires restrictions.
From the number of polygons the character is made of, the ridiculously limited number of bones
you can animate, to the numbers of key frames you are allowed for each movement.
Restrictions are the ugly reality of game animation. If I tell you to animate a guy hitting the
perfect hook on the jaw of his opponent, you will probably have no difficulty figuring it out. Now
if I ask you to animate it in twelve frames (almost a third of a second), you start to realize what the
real challenge is. There is no easy way to deal with restrictions in games, however, a few
guidelines are useful. If you keep in mind the basic rules of three-axis animation, you can easily
tweak, adapt, and simplify your animations without changing them.
The most important thing to keep in mind is how your animations will be seen, and not how
they look. Seeing an animation in a 3D program is very different from seeing it in a game. This
may seem obvious, but it is too often forgotten. If you are working on a fighting game for
example, the camera will be very close and probably always on the side of your character. If
you are working on a first person shooter, chances are the only thing the player will really see,
are the death animations of his enemies. If you are working on a strategy title, the overhead
camera will flatten and change every animation. This means you have to look at all of your
animations from the perspective of the player. Try to place the camera at the same angle as the
game camera. Do not play your animations in slow motion or frame by frame. Think of it as a
whole rather than getting lost in useless details and minute tweaks that nobody but you can
see. Game animation is an illusion and a craft, not a science.
The first thing about an animation is making it right. The second thing to remember is to make
it interesting. This is why you have to know what part of your animation will be seen in the
game. Put the subtle details in place, the key frames and the nicely crafted weight effects
where you are sure they will be seen. Remember, if the player can't see them, they do not
exist, and all of your work has been a complete waste of time.
Try to make your animations simple and expressive. Making it right does not mean you have to
make it complex. One thing the basic rules of movement tell us is that every motion has to be
cost efficient. Try to do the same for each animation. Do not create key frames all over the
place, because you will have to correct every little glitch. Keeping it clean and simple will make
it easier to manage.
Finally, basic rules give you basic animations. Remember, that the rules of three-axis
animations are only a foundation to create, experiment, and discover. A memorable animation
is one that adds to the character. You have to understand the how in animation, to be able to
create the why. For instance, this guy is walking by thrusting his legs, and tilting and twisting
his hips and shoulders, but why does he want to walk? What do I want to communicate with my
animation? Just try to communicate one simple feeling. The guy is a guard who is alone in a
room, and is probably bored and tired. Try to express the guy's feeling of boredom through your
animation. The player will remember he saw a person who shuffled around and looked
believably bored, and not just a mindless robot. This will give the player a little more
information about the game world, and more enjoyment.
Rules are useful, and necessary, but they can become a distraction when the only thing you
want is the right animation. Do not get obsessed with what is necessary to make it perfect. You
should strive for the good animation: an animation imbued with emotion that gives life to a
mathematical object made of polygons.
Conclusion
The rules of three-axis animation are a simple way of understanding how a body moves, fighting
and using gravity to stay in balance, tilting and twisting its limbs to achieve the greatest possible
range with ease, with each moving part of the skeleton rotating on the three axes at the same
time to help create the motion. These rules are a base that allows greater control and freedom
to create. Try to learn them, understand them, and then forget them.
If you want more information, you can read the following books:
http://www.gamasutra.com/features/20010727/ringuet_01.htm
It is common for games to offer a 'replay' feature. This feature allows the player to record a
sequence of game play and then watch it over again, perhaps from a different viewpoint, or in
slow motion. The player may be able to save the recording to disk or memory card, or even
transmit it to other players.
When faced with the task of implementing this feature, two different approaches become
quickly apparent. The first solution is to store absolute information about all of the objects in
the game world (including the player) on a frame by frame basis, or at fixed frequency. This
would include data such as position, orientation, and so on. The replayed sequence is then
constructed by streaming this information back into the game engine, and interpolating where
necessary.
A second and much more elegant approach is to make use of the inherent predictability of
computer software: The same sequence of operations performed on the same set of data will
always produce the same result. It is reasonable to conclude that a sequence of game play may
be precisely reproduced by recording only the initial state of the game, along with the player's
inputs. The initial state can then be restored, and the recorded inputs reapplied, to produce the
same sequence of play. This solution is instantly more appealing because the amount of data
that needs to be stored is much smaller. It is also simpler to implement and maintain, as all
necessary coding can take place at the 'player input' level, and remain independent of the
underlying game engine.
If you've ever tried to implement a replay feature using this method, then you will know that
life is not quite so simple. Despite the fact that the same program is running with the same
inputs, things just don't happen exactly the same way the second time around. This is because game
software does not run in isolation. It makes extensive use of externally generated run-time data
which is somewhat less predictable, and undermines the natural reproducibility of the game
engine itself. Just one small difference in this external data will cause the replayed sequence to
diverge from the original, potentially resulting in completely different game events. This is often
an insurmountable problem. Game software is highly complex, and the task of isolating and
dealing with externally generated data can be overwhelming.
However, if external data is isolated at the start of development, this can be done effectively and
with minimal effort. Moreover, the ability to precisely reproduce a sequence of game play in this
way offers many other benefits to development beyond implementing a replay feature. If you
have reproducibility, then you have a way of reproducing even the most obscure bug discovered
by your testers. More significantly, you also have the foundation of a low-bandwidth networking
solution.
This article is about building such a game engine: an engine which can record a player's inputs,
reapply them to the same initial state, and precisely reproduce minutes or even hours of
game play just as reliably as a video recording. Reproducibility affects most of the components
of a game engine, and is characterized by behavioral criteria rather than particular algorithms.
For this reason, the focus of this article is to present ideas and design techniques which can be
integrated into any game engine, rather than specific code. Most of these ideas have evolved
over time through discussions with other experienced game programmers, and proved
beneficial to many projects in different ways, so I will discuss how reproducibility can be used to
implement other game features, such as networking. I will also highlight the main
implementation problems that both I and others have encountered, along with practical
solutions.
The Benefits
Maybe you are not convinced yet. Maybe you don't think that you need or even really want
reproducible behavior in your software. So before continuing I shall review the benefits that
reproducibility has to offer.
1. Action Replays. If you spent more time watching the action replays in Gran Turismo than
actually playing the game, then you will need no more convincing. This game feature
represents exactly what we are trying to achieve.
2. Debugging. Being able to reproduce bugs quickly and reliably offers big time savings to
programmers. Almost every game programmer has at some time spent hours or even
days just trying to reproduce an obscure bug. Being able to automatically reproduce these
kinds of bugs saves time, your sanity, and your company's money.
An input is a datum passed to the game engine at run time from an external source, which is
used in some way to modify internal data. An obvious example is a player's input. Conversely,
outputs are data which are generated by the engine and passed to an external target, but which
are not used internally by the game. An example output might be vertex data passed to the
GPU for rendering. All remaining persistent internal data we shall term "game state".
Examples of game state data are the speed of a car, or the player's position in the world.
Having defined these terms, we can set about identifying them in a game engine. Figure 1.1
represents a simple input/output/state arrangement for a typical game engine.
If this strategy is successful, then during play the game state will undergo numerous state
changes, progressing from one state to another through the passage of time. When we view a
replay of this sequence, the game will start in the same initial state and progress through
exactly the same transitions (with the same player inputs reapplied), passing through exactly
the same sequence of states, and arriving at exactly the same final state. Let's be clear about
what this means: at the end of the replay, every single item of data in the game state will finish
in exactly the same state as it did when the game sequence was originally played. We cannot
afford for even one thing to be different at any point in the replay, or else it will diverge from
the original. But so long as we isolate all of our inputs, then this state transition will occur
correctly, and predictably, every time that the sequence is replayed.
It should be noted that outputs are of no concern to reproducibility. They affect neither the
game state nor the inputs. Once they have been identified we need not consider them any
further. However, there are some traps to look out for. For instance, rendering data may
actually be used by other parts of the game, such as the collision detection system. If this is the
case, then this rendering data must be considered to be part of the game state.
All that remains, then, is to deal with each of the game inputs in turn. The first input listed in
figure 1.1 is that of the player, and we already have a good idea of what we are going to
do with that. The next item in the list that requires our attention is 'time'.
Time as an Input
Most games run in real time. The animation produced by the game engine is scaled to readings
made from the hardware system timer. Typically, each rendered frame takes a different amount
of time to render, and so to maintain smooth, frame-rate independent animation the game
engine is parameterised on these timer readings.
A typical way of implementing frame rate independence is to update the game state once for
each rendered frame. At the start of each update the system timer is read, and the time that
has elapsed since the last update is calculated. This "last frame time" is used to calculate the
next game state, and then the next frame is rendered, and so on. For example, if a car is
moving with some velocity 'V' in the game world, and its position at the last rendered frame is
'P1', then its position for the next frame, 'P2', may be calculated using the simple integration:
P2 = P1 + V * last_frame_time
You may not have thought of time as being an input into your game engine before, but looking
at the above equation you can see that the result of each game state transition is determined
by the last measured frame time. The sequence of game states that the engine passes through
is directly dependent on time readings from the system timer. Even though the animation is
independent of frame rate, this dependency will undermine the reproducibility of the game
engine because we cannot rely on getting the same frame times during the replay.
For example, imagine that our engine renders two frames, and that the frame-time for each
update is 30 milliseconds. This will result in two state changes, each representing a time update
of 30ms. Now imagine that we wish to replay this sequence, as seen from a different viewpoint
in the world. From this new viewpoint, we can see fewer world objects, and consequently our
engine renders at a higher frame rate. Perhaps a single frame takes only 20ms to render.
Instead of generating 2 frames of 30ms, our frame rate independent engine now generates 3
frames of 20ms. The result of this is that our car's position is now slightly different. Even
though it has been moving for exactly the same amount of time, inaccuracies in the integration
of its velocity mean that its position is marginally different: if the velocity changes between
updates (under acceleration, say), integrating it over three 20ms steps does not give exactly
the same result as integrating it over two 30ms steps. In the replay, the car is not passing
through the same set of states as it did in the original sequence. Aside from this, we also have a
apply them at exactly the same time in the replay as we did when they were originally applied.
Dealing with system time readings is thus critical to reproducibility. Even when running a replay
on the same computer on which it was generated, there is no guarantee that the same timer
readings will be generated. Even making a small change to the view point will result in different
readings, and therefore a different sequence of game states. This problem is exacerbated when
running a PC based game, because the OS may be executing other external processes, or the
player may, perhaps, decide to upgrade his or her graphics card to achieve a higher frame rate!
Unfortunately, timer readings cannot be eliminated from the game, or even made to be
predictable, if we are to retain frame rate independent animation. However, the game state
may easily be de-coupled from the system time. Perhaps the simplest way to do this is to
quantise the state execution time, by updating the game state at a fixed frequency. Game
updates can then be controlled with a small time control loop. This loop reads the system time,
and decides how many fixed-time state updates to execute. When the required number of
updates have been applied, the frame is rendered. This of course means that several state
updates may occur to produce a single rendered frame of animation.
The C-like pseudo code for such a system is shown below. In this case, the game state is
updated at a frequency of 100Hz (10 millisecond increments).
//see how much time we need to execute for the next rendered frame:
//read system timer & find elapsed time
execution_time += LastFrameTime();

while(execution_time > update_time)
{
    execution_time -= update_time;
    //one fixed-length state update; the input sampling and state update
    //calls here stand in for the engine's own routines
    ReadPlayerInputs();
    UpdateGameState();
    num_updates++;
}

RenderFrame();
Notice that any unexecuted time remaining in execution_time is carried over to the next frame
update. This is important, because when the game is achieving a high frame rate, no time is
'lost' and frame rate independence is maintained. This schema also gives us a perfect way to
record and reapply player inputs. The inputs are read once per state update, and are recorded
against the total number of state updates that have been executed at that point. This means
that they can be reapplied to exactly the same game state to which they were originally
applied.
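A recorded input under this scheme might look something like the following sketch (the structure and field names are illustrative):

struct RecordedInput
{
    unsigned int update_number; //value of num_updates when the input was read
    unsigned int pad_state;     //the raw controller reading for that update
};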
Reproducing a sequence of game play generated in this way thus becomes a simple task.
Starting from the same initial state we can execute the replay using the same process of
updating the game state at a fixed frequency. Stored inputs can be reapplied by checking the
number of updates that have been executed, to see if they match the recorded number of
updates of the next input in the list.
Even if the frame rate is different during the replay, the game state will pass through exactly
the same sequence of states, and the player inputs will be applied at exactly the same time
(and to the same game state) as they were originally. The code needed to execute a replay is
shown below.
//see how much time we need to execute for the next rendered frame, as before
execution_time += LastFrameTime();

while(execution_time > update_time)
{
    execution_time -= update_time;
    //reapply any recorded inputs stamped with this update number
    //(ApplyRecordedInputs stands in for the engine's own routine)
    ApplyRecordedInputs(num_updates);
    UpdateGameState();
    num_updates++;
}

RenderFrame();
Thus, the replayed sequence will always generate identical game behavior, even if the
viewpoint is changed, or the replay is run on a different hardware configuration. It is important
to note that all components of the game which modify the game state should be parameterised
by the 'game time' only. Under no circumstances should a programmer be tempted to read the
system time directly: to do so and use it to change the game state will result in non-
reproducible behavior.
In order to reproduce a sequence of game play it is necessary to ensure that the replayed
sequence starts in the same initial state as the original. It is almost certain that we will want to
record sequences mid-game, and so it is necessary to be able to save and reload the game state.
In practice this is not difficult to achieve, and it is probable that we will require a load and save
game feature anyway. However, it is important to ensure that the loaded game state is identical
to the original. If it is not, the subsequent behavior may not identically match that of the
original. We have now progressed a good way towards a generic design for a reproducible game
engine. However there are still some remaining points to be dealt with.
Other External Data Sources
The input list in the beginning of this feature includes the somewhat ambiguous item 'External
Data', which essentially means "any other source of externally generated run-time data", and
may include hardware or operating system sources, or software libraries used by the game.
These are specific to each individual game (for example, some games make use of software
libraries licensed from other developers) and so need to be considered on a case-by-case
basis. However, the important thing is that they do not damage the reproducibility of the
engine. Any external data source used by the game needs to be reproducible. It is also
important that it can be restored by the game when the game state is loaded.
Generally there are actually few such sources, but one which is used in most games is the C/C++
random number generator provided by the standard library. This random number generator is
in fact reproducible. It will always generate the same sequence given the same seed. However
there is a problem with loading and saving the game state.
When a game is reloaded it is clearly necessary to restore the random number generator so
that it will generate the same sequence from that point as it did after the game was saved. A
tempting solution would be to re-seed the random number generator whenever a save occurs,
and save the seed as part of the saved game state. However, this is not a good approach,
because it means that the act of saving the game will effectively change the game state, and
we wish to be able to save and reload the game freely.
The easiest way to circumnavigate this problem is to avoid using the standard library random
number function altogether, and instead use our own. Fortunately there are many pseudo-
random number algorithms in existence, which can be coded and included in the game engine;
it is then only necessary to save the generator's data as part of the game state.
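As a minimal sketch (assuming only that the generator's state must be ordinary game-state data), a simple xorshift generator will do:

//illustrative sketch: the generator's entire state is one integer, so it
//can be written out and restored with the rest of the game state; the
//same seed always yields the same sequence on any platform
struct GameRandom
{
    unsigned int state; //must be seeded non-zero; saved with the game state

    unsigned int Next()
    {
        state ^= state << 13; //xorshift32
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
};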
In some cases, the use of floating point math can also cause problems with reproducibility. The
problem arises where we wish to reproduce a sequence on a different platform from the one on
which it was generated; or at least, a platform with a different hardware floating point unit. On
a fixed hardware platform, such as console, this presents no problems. However if we are
developing a PC based game then we must be aware that a PC is not a fixed-platform.
Different PC systems utilize different CPUs, and different CPUs have different FPUs. It may seem
surprising, but different FPUs can produce marginally different results for some calculations,
even though all units are IEEE compliant. This can be attributed to different internal
representations within the FPU. In addition, different software builds can exhibit different
behavior on the same hardware due to changes in the storage of floating point values. This
means, for example, that a sequence replayed through a 'debug' build of a game may exhibit
different behavior when played through a 'release' build.
In any case, this leaves us with a problem if we wish to reproduce a game sequence on a
different PC from the one on which it was recorded. In this respect we must consider the
results of PC based floating point calculations to be an external data source. The obvious
solution to this is to use integer (fixed point) data to represent our game state. This may well
be desirable for other reasons, for example if we are planning to port code to another platform
which does not have a hardware FPU. Of course, output data (such as rendering data) is
independent from the game state, and so free from this restriction.
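As an illustrative sketch (a minimal 16.16 fixed-point type, not a complete library):

typedef int fixed;  //16.16 fixed point: 65536 represents 1.0

inline fixed ToFixed(int n) { return n << 16; }

inline fixed FixedMul(fixed a, fixed b)
{
    //widen to 64 bits so the intermediate product cannot overflow
    return (fixed)(((long long)a * b) >> 16);
}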
Using a reproducible game engine offers some big benefits to networked games. Firstly, since
the same sequence of inputs always produces the same result, participants can re-synchronise
simply and reliably by transmitting and processing player inputs: If two game engines have
started from the same state, and processed the same inputs, they will reach identical states.
This reduces required bandwidth considerably, as no absolute game state information is
involved. Secondly, a network model based on the transmission of player inputs is much
simpler to implement. The data structure of all required messages is already clearly defined by
the player input structures, and the game engine already has functions for processing received
messages. This approach is sublimely simple, and in principle scales to any level of software
complexity.
For further reading on this subject, Peter Lincroft presents an interesting account of a network
game based on a reproducible game engine architecture in his article "The Internet Sucks: Or,
What I Learned Coding X-Wing vs. TIE Fighter", including some of the problems encountered by
his team.
Despite some limitations, I have found reproducibility to be immensely useful in my own work,
particularly as an aid to debugging. As a design-driven approach it is a classic case of working
"smarter" rather then harder: a little planning at the start of development brings big benefits
later on. It allows teams to easily add features that would otherwise have been problematic,
and to consider other features that would not otherwise be possible.
The key thing to bear in mind is that a game engine either exhibits reproducible behavior, or it
doesn't. This means that reproducibility is easiest to implement at the start of development, but
becomes increasingly difficult to add later on. If you would like to enjoy the benefits it has to
offer in your own projects then it is important to apply some thought and preplanning at the
start in order to reap the rewards later on. But then again, isn't that always the way with game
development?
References
Peter Lincroft, "The Internet Sucks: Or, What I Learned Coding X-Wing vs. TIE Fighter",
Gamasutra, September 3, 1999
In this article, I'm going to describe Talking Heads, our facial animation system which uses
parsed speech and a skeletal animation system to reduce the workload involved in creating
facial animation on large scale game projects. SCEE's Team Soho is based in the heart of
London, surrounded by a plethora of postproduction houses. We have always found it difficult to
find and keep talented animators, especially with so many appealing film projects being created
on our doorstep here in Soho.
The Getaway is one of SCEE's groundbreaking in-house projects. It is being designed by Team
Soho, the studio that brought you Porsche Challenge, Total NBA, and This Is Football. It
integrates the dark, gritty atmosphere of films like Lock, Stock, and Two Smoking Barrels and
The Long Good Friday with a living, breathing, digital rendition of London. The player will
journey through an action adventure in the shoes of a professional criminal and an embittered
police detective, seeing the story unfold from the perspectives of two completely different
characters with their own agendas.
The Getaway takes place in possibly the largest environment ever seen in a video game; we
have painstakingly re-created over 50 square kilometers of the heart of London in blistering
photorealistic detail. The player will be able to drive across the capital from Kensington Palace
to the Tower of London. But the game involves much more than just racing: the player must
leave their vehicle to enter buildings on foot to commit crimes ranging from bank robberies to
gang hits.
So, with a huge project such as The Getaway in development and not enough talented people to
be found, the decision was made to create Talking Heads, a system that would severely cut
down on the number of man-hours spent on tedious lip-synching.
Breaking It Down
The first decision to be made was whether to use a typical blend-shape animation process or to
use a skeleton-based system. When you add up the number of phonemes and emotions
required to create a believable talking head, you soon realize that blend shapes become
impractical. One character might have a minimum of six emotions, 16 phonemes, and a bunch
of facial movements such as blinking, breathing, and raising an eyebrow. Blend shapes require
huge amounts of modeling, and also huge amounts of data storage on your chosen gaming
platform.
The skeleton-based system would also present certain problems. Each joint created in the
skeleton hierarchy has to mimic a specific muscle group in the face.
"If you want to know exactly which muscle performs a certain action, then you won't find an
answer in Gray's Anatomy. The experts still haven't defined the subject of facial expression.
Though psychologists have been busy updating our knowledge of the face, anatomists have
not." -- Gary Faigin, The Artist's Complete Guide to Facial Expression
Most information on the Internet is either too vague or far too specialized. I found no one who
could tell me what actually makes us smile. The only way forward was to work with a mirror
close at hand, studying my own emotions and expressions. I also studied the emotions of
friends, family, work colleagues, and people in everyday life. I have studied many books on
facial animation and over the years attended many seminars. I strongly recommend a book by
Gary Faigin, The Artist's Complete Guide to Facial Expression. If you can, try and catch Richard
Williams in one of his three-day master classes; his insight into animation comes from working
with the guys who created some of the best Disney classics.
Only part of the face is used in most expressions; the whole face generally is not. The areas
around the eyes and brows and around the mouth contain the greatest numbers of muscle
groups, and they are the areas that change the most when we create an expression. We look at
these two regions first and gather most of our information from them. Although other areas of
the face do move (the cheeks in a smile, for example), 80 percent of an emotion is portrayed
through these two areas.
Neutral positions. We can detect changes in a human face because we understand when a face
is in repose. We understand the positions of the brow and the mouth, and how wide the eyes
are. These elements are constant from face to face, whether we are familiar with a person's
face at rest or not (see Figure 1).
This changed the way we built our models, adding greater detail around the eyes and the
mouth. Simulating the muscle rings seen in anatomy books allowed for greater movement in
the face at these points.
The proportions of the face are the key to building a good head. Get this right and you are well
on the way to creating realistic facial animation. Asymmetry is another goal to strive for when
modeling your heads. Do not create half a head and flip it across to create the other half. The
human head is not perfectly symmetrical.
Study of facial proportions by Leonardo da Vinci.
There are many rules concerning facial proportions. The overall shape of the head is governed
by a simple rule: The height of the skull and the depth of the skull are nearly the same. The
average skull is only two-thirds as wide as it is tall. The human head can be divided into thirds:
forehead to brow; brow to base of nose; and base of nose to chin. The most consistent rule is
that the halfway point of the head falls in the middle of the eyes. Exceptions to this are rare. A
few other general rules:
- The width of the nose at the base is the same as the width of an eye.
- The distance between the brow and the bottom of the nose governs the height of the ear.
- The width of the mouth is the same as the distance between the centers of the pupils.
- The angle between the top lip and the bottom lip is 7.5 degrees.
- The bottom of the cheekbones is the same height as the end of the nose.
The heads for The Getaway all stem from one model. This head contains the correct polygon
count, animation system and weighting. We scan actors using a system created by a company
called Eyetronics, a very powerful and cost-effective scanning process. A grid is projected onto
the face of the person you wish to scan, and photographs are taken. These photographs are
passed through the software and converted into 3D meshes. Each mesh is sewn together by the
software, and you end up with a perfect 3D model of the person you scanned. At the same time
it creates a texture map and applies this to the model.
Then the original head model, the one that contains the correct polygon count and animation, is
morphed into the shape of the scanned head. Alan Dann, an artist here at SCEE, wrote
proprietary in-house technology to morph the heads inside Maya. The joints in the skeleton
hierarchy are proportionally moved to compensate for the changes in the head. We are left with
a model that has the stipulated in-game requirements but looks like the actor we wish to see in
the game.
1,500-polygon model used for high-res in-game and medium-resolution cutscenes.
The Getaway heads are designed with an incredible level of detail. We use a 4,000-polygon model
for extreme close-ups in the real-time cut scenes. The highest-resolution in-game model is
1,500 polygons, which includes tongue, teeth, eyelashes, and hair.
The skeleton hierarchy also contains level of detail; we remove joints as the characters move
further away from the camera. Eventually only three joints remain, enough to rotate the head
and open the mouth using the jaw.
The skeleton hierarchy was created based on the above study. Two main joints are used as the
controls, the neck and the head. The "neck" is the base, the joint that is constrained to the
skeleton of the character model. This joint can either be driven by constraints or motion capture
data from the character model can be copied across. This gives us the point at which we have
seamless interaction between the head and body. The "head" joint would control slight head
movements: shaking and nodding, random head motions, and positions taken up in different
expressions. The head leans forward during anger or downward when sad. This is the joint that
all other joints spring from; it's used as the controlling joint. Wherever it goes, the rest of the
joints go. Other joints which relate to specific muscle groups of the face are:
The idea behind this mass of joints is that they simulate certain muscle groups. The muscles of
the face are attached to the skull at one end. The other end is attached straight to the flesh or
to another muscle group. This is different from muscles in the body, which are always attached
to a bone at both ends. As the muscles contract, it should be a simple case of just animating
the scales of our joints to simulate these contractions. Unfortunately this is not the case, as
there are actually hundreds of muscles which all interact together. To achieve realistic
expression we had to rotate, scale, and translate the joints.
Weighting
How do you go about assigning an arbitrary head model to this skeleton? The original skinning
of the character took two whole days of meticulous weighting, using Maya and its paint weights
tool to achieve this.
I didn't wish to do this for every head. Joe Kilner, a programmer here at SCEE who was writing
the animation system with me, came up with a MEL (Maya Embedded Language) script that
would copy weights from one model to another. The script basically saved out the weights of
the vertices using two guidelines: the vertex's normal direction and its UV coordinates. This
enabled us to export weights from one head and import them onto another.
For this to work, we had to make sure that all of our head textures conform to a particular fixed
template. The added bonus of this is that then we can apply any texture to any head. The
template also made it easier to create our face textures.
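In outline, the transfer works like this (a sketch of the process, not the script itself):

For each vertex in the new head:
    Find the saved source vertex whose normal direction and UV
    coordinates are the closest match
    Copy that source vertex's joint weights onto the new vertex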
Research has shown that people recognize six universal emotions: sadness, anger, joy, fear,
disgust, and surprise. There are other expressions that we have that are more ambiguous. If
you mix the above expressions together, people offer differing opinions on what they suggest.
Also, physical states such as pain, sleepiness, passion, and physical exertion tend to be harder
to recognize. So if you wish to make sure that the emotion you are trying to portray is
recognized, you must rely on the overall attitude or animation of the character. Shyness, for
example, is created with a slight smile and downcast eyes. But this could be misinterpreted as
embarrassment or self-satisfaction.
Emotions are closely linked to each other. Worry is a less intense form of fear, disdain is a mild
version of disgust, and sternness is a mild version of anger. Basically blending the six universal
emotions or using lesser versions of the full emotions gives us all the nuances of the human
face.
Creating the emotions on your base skeleton is the next step. Which emotions should the
system incorporate? We use the six universal emotions, some physical emotions, a phoneme
set and a whole load of facial and head movements. The system inside Maya runs off the back
of three locators. Each locator controls a different set of Set Driven Keys. A locator in Maya is a
Null object that can have attributes added.
The first locator controls expressions. Each of the following is an attribute on the locator:
sadness, anger, joy, fear, disgust, surprise, shock, perplexed, asleep, pain, exertion, and shout.
Each attribute has a value which ranges from 0 to 10.
The skeleton is set to a neutral pose which is keyed at zero on all the emotion attributes. Then
the joints are scaled, rotated, and translated into an expression, for example, "sad." Using
Maya's Set Driven Key, this position is keyed onto a value of 5 on the sadness attribute. Then
at a value of 10, "crying open mouthed" is keyed, giving us a full emotional range for sadness.
Now the face is set up so that Maya can blend from a "neutral" pose to one of "sad" and then
continue on to "crying."
Sadness attribute keyed at values of 0, 5, and 10.
For each emotion attribute, several different keys are assigned as above. This gives the
character a full range of human emotions. These emotion attributes can then be mixed together
to achieve subtle effects.
A mixture of joy and sadness produces a sad smile, while anger and joy produce a wicked grin.
The process is additive, which means that mixing emotions over certain values starts to pull the
face apart. A good rule of thumb is never to let the total of the attributes exceed the maximum
attribute value. As we have keyed ours between 0 and 10, we try never to exceed 10. If you
mix three emotion attributes together and they have equal values then each cannot exceed 3.3.
There are attributes that can be mixed at greater levels, but trial and error is a great way of
finding out which you can mix and which you can't.
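That rule of thumb is easy to enforce in code; a sketch (illustrative, not part of the Maya setup):

//scale a mix of emotion attribute values down so that their total
//never exceeds the maximum attribute value (10 in our setup)
void ClampEmotionMix(float *pValues, int nCount, float fMaxTotal)
{
    float fTotal = 0.0f;
    for (int i = 0; i < nCount; i++)
        fTotal += pValues[i];
    if (fTotal > fMaxTotal)
    {
        float fScale = fMaxTotal / fTotal;
        for (int i = 0; i < nCount; i++)
            pValues[i] *= fScale;
    }
}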
"A phoneme is the smallest part of a grammatical system that distinguishes one utterance from
another in a language or dialect." -- Bill Fleming and Darris Dobbs, Animating Facial Features
and Expressions
Basically, a phoneme is the sound we hear in speech. Combining phonemes, rather than letters,
creates words. The word "foot" would be represented by "f-uh-t."
Visual phonemes (visemes) are the mouth shapes and tongue positions that you create to make
a phoneme sound during speech. The common myth is that there are only nine visual
phonemes. You can create wonderful animation from just these nine; however, there are in fact
16 visual phonemes. Although some may look very similar externally, the tongue changes
position.
Our second locator controls the phonemes. They are assigned in exactly the same way as the
emotion attributes. An exaggerated form of each phoneme is keyed at 10. When creating the
lip-synching we generally only use values up to 3.
The phoneme set shown is Anglo-American. This can be replaced with any phoneme set from
around the world. You can conceivably make your character talk in any language you wish.
- Never animate behind synch. Do not try to animate behind the dialogue. In fact, it's better
  to animate your phonemes one or two frames in front of the dialogue. Before you can
  utter a sound you must first make the mouth shape.
- Don't exaggerate. The actual range of movement while talking is fairly limited. Study your
  own mouth movements.
Talking Heads tries to simulate realistic facial movements, and "less is more" is true for all parts
of the system. The mouth doesn't open much at all while talking, so don't make your visual
phonemes exaggerated.
Specials
The third locator controls aspects of the face that are so natural that we don't even think about
them. These attributes are essential if you want to achieve realistic facial animation.
Blinking. A human blinks once every four seconds. This timing can change according to what
emotional state the character is in. If anger is your dominant attribute then the blink rate
should decrease to once every six seconds. The reason behind this is physical; the eyes open
wide in anger, achieving a glare. If you are acting nervous then the blink rate increases to once
every two seconds. This reaction is involuntary. Blinking brings realism to your characters but
also emphasizes a particular emotion or mood.
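Those rates reduce to a simple lookup; a sketch (the enum and function are illustrative, not part of Talking Heads itself):

enum Emotion { EMOTION_NEUTRAL, EMOTION_ANGER, EMOTION_NERVOUS };

//pick the blink interval from the character's dominant emotional state
float BlinkIntervalSeconds(Emotion eDominant)
{
    switch (eDominant)
    {
    case EMOTION_ANGER:   return 6.0f; //wide-eyed glare blinks less often
    case EMOTION_NERVOUS: return 2.0f; //involuntary rapid blinking
    default:              return 4.0f; //once every four seconds
    }
}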
Facial shrug and raising eyebrows. These attributes are generally used when the character is
silent, listening to a conversation, etc. The human face is never static; it's constantly moving.
This movement can take many forms. Slight head movement, constant eye movement and
blinking are excellent at keeping the character alive. Raising an eyebrow or performing a facial
shrug can be used in conjunction with emotion attributes to add a little extra emphasis to the
emotion.
Nodding and shaking the head. Whenever we encounter a positive or negative statement, we
either nod in agreement or shake our head in disapproval. These are involuntary acts and the
quickest ways to state your point of view without opening your mouth. Note that the neutral
position of these two attributes is set at 5. This allows the head to move in four separate
directions, up, down, left, and right.
Random head motion. We realized very quickly when animating our heads that when you talk
you are constantly moving your head. The random head attribute simulates this slight
movement.
Breath. The breathing attribute is set at several different positions. It can simulate slight
breathing to full gasps.
There is one final locator that I haven't yet mentioned. This locator is called the "look at" and
controls what the character is seeing. The joints that control the eyes are constrained using aim
constraints in Maya. This forces the joints to always track/point at the "look at" locator. You can
then use the locator to control the character's point of view. You can animate this locator and
enable your character to glance away during a conversation. The angles of the eye joints are
linked via an expression with the head joint. If the eyes are forced to rotate more than 20
degrees to follow the "look at" locator, the head rotates to compensate. We found this to be
very realistic, mimicking the movement of the head (Figure 13).
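In code, the rule is essentially a clamp: the eyes absorb as much of the rotation as their limit allows, and the head soaks up the remainder. A minimal C++ sketch of that idea, assuming a single yaw axis and the 20-degree limit quoted above (the function and variable names are ours, not the system's):

#include <algorithm>

const float kEyeLimitDegrees = 20.0f;

// yawToTarget: the angle (in degrees) from the head's forward axis to the
// "look at" locator. Splits the rotation between the eyes and the head.
void solveLookAt(float yawToTarget, float& eyeYaw, float& headYaw)
{
    // The eyes track the locator up to their 20-degree limit...
    eyeYaw = std::max(-kEyeLimitDegrees,
                      std::min(kEyeLimitDegrees, yawToTarget));
    // ...and the head rotates to compensate for the rest.
    headYaw = yawToTarget - eyeYaw;
}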
Here are a few additional pointers for animators when animating facial expressions.
You must have two frames to be able to read it! When you are laying down keyframes for your
lip-synching, always make sure that the consonants last for a minimum of two frames at 24 fps.
Obviously, if you are running at 60fps on PS2, then triple this. Any phoneme that is a
consonant, such as p, b, m, f, or t, must be keyed in this way. This rule cannot be broken; the
mouth must be in a closed state for the two frames. If you don't make sure of this then you will
not be able to read what the character is saying. If you have no time to fit this in, steal from
the previous word.
Make sure your animation is ahead of your timeline. The easy way to do this is to animate to
your sound file. When you are happy with your animation and lip-synching, move the sound
forward in the timeline and make sure that the animation starts one to two frames before the
sound. You cannot utter a peep unless you have made the correct mouth shape. This will
improve your lip-synching.
Subtlety is king. I cannot stress too much how important this is. The urge once you have
created your system is to go mad. The human face is a subtle machine; keep your movements
to a minimum and your animations will look much more realistic.
Move the eyes. If you want to keep your character alive, keep the eyes moving. When we are
talking to someone, we spend 80 percent of our time tracking their eyes and mouth and 20
percent glancing at their hands and body.
Head synch is almost as important as lip-synch. Every word and pause should have a separate
head pose. We use random head motion to achieve this. Some words need accenting or
emphasizing. Listen to your sound file and pick out the words that are stressed; these are the
ones to which you should add extra head movement.
The System
We have talked about the basics of facial animation, why we chose a skeleton-based system,
and how we put this into practice. The next step is to explain exactly how Talking Heads works.
As I've mentioned before, the point of a system like this is to reduce the workload and demands
on a small group of animators working on a large project. The only way that this can happen is
to hand over some of the more tedious tasks of facial animation to the computer.
Our facial animation system works on three levels: the first is concentrated around achieving
believable lip-synching, the second around laying down blocks of emotions, and the third on
underlying secondary animation such as blinking or breathing.
Lip-synching. The first step is to record an uncompressed 44kHz .WAV file of the chosen actor
and script. A good point to mention here is that your script should contain a series of natural
pauses. A good actor or voice-over artist should give you this automatically. Remember, you
want the best performance you can get. The sound file contains all the hints you will need to
animate emotions and will carry your animation. The pauses aid the system, allowing it to work
out where it is in the .WAV file when it calculates the phonemes.
We then create a text file, which is an exact script of the .WAV file. During the creation of the
phonemes, the text file is matched against a phoneme dictionary. There are many such
dictionaries on the web; it's just a matter of finding a free one (see For More Information). The
dictionary contains a huge list of words and their phoneme equivalents. By checking the script
against this dictionary, the system determines the phonemes required to make the words.
Some obscure words are not covered, and we enter these into our dictionary by hand.
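Consuming such a dictionary in code is straightforward. The sketch below assumes a CMU-style plain-text format, one word per line followed by its phonemes, which may well differ from the dictionary Team Soho actually used:

#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Load a word -> phoneme-list dictionary from a text file in which each
// line reads like "FOOT  F UH T" (the word, then its phonemes).
std::map<std::string, std::vector<std::string> >
loadPhonemeDictionary(const char* path)
{
    std::map<std::string, std::vector<std::string> > dict;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string word, phoneme;
        if (!(ss >> word)) continue;              // skip blank lines
        std::vector<std::string> phonemes;
        while (ss >> phoneme) phonemes.push_back(phoneme);
        dict[word] = phonemes;
    }
    return dict;
}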
Most of the development time of Talking Heads was taken up working out how to parse
the .WAV file. This is all custom software which enables us to scan through our sound file and
work out the timings between the words. We also work out the timing between phonemes,
which is very important.
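The article doesn't reveal the details of that parser, but the heart of any such scanner is silence detection: slide a window across the samples and flag stretches whose average amplitude falls below a threshold. A simplified sketch, with a window size and threshold that are invented and would need tuning against real recordings:

#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scan mono 16-bit PCM for stretches of near-silence, returning the
// sample index at which each pause begins.
std::vector<size_t> findPauses(const std::vector<int16_t>& samples,
                               size_t windowSize, double threshold)
{
    std::vector<size_t> pauses;
    bool inPause = false;
    for (size_t i = 0; i + windowSize <= samples.size(); i += windowSize) {
        double sum = 0.0;
        for (size_t j = 0; j < windowSize; ++j)
            sum += std::fabs(static_cast<double>(samples[i + j]));
        bool quiet = (sum / windowSize) < threshold;
        if (quiet && !inPause)
            pauses.push_back(i);    // a new pause starts here
        inPause = quiet;
    }
    return pauses;
}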
Talking Heads then lays down keyframes for the phonemes in Maya. It does this by taking the
information from the dictionary and the .WAV file and matching them, phoneme against length
of time. As mentioned before these keys are assigned to the locator that controls the
phonemes. This allows for easy editing of the phonemes at a later stage by an animator, or the
creation of a complete new phoneme animation if the producer decides that he wants to change
the script. So a one-minute animation that could take a week to animate by hand can be
created in half an hour. Then the animator is free to refine and polish as he sees fit.
One advantage to the system is the creation of language SKUs. We produce products for a
global market, and there is nothing more frustrating than re-doing tedious lip-synching for each
country. Talking Heads gets around this problem quite efficiently. You have to create a
phoneme set for each language and find a corresponding phoneme dictionary, but once you
have done this the system works in exactly the same way as before. You can lay down
animations in English, French, German, Japanese, or whatever language you wish.
Emotions. The next step is to add blocks of emotion. To do this we edit the text file that we
created from the .WAV file. A simple markup language is used to define various emotions
throughout the script.
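The marked-up script itself is not reproduced in this excerpt. Purely as an illustration, a fragment might look something like the following, with hypothetical tag syntax; the values map onto the 0-10 emotion attributes described earlier:

<Anger 2.2> So you thought you could keep this from me <Anger 1.0, Happiness 3.0>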
As you can see, emotions are added and given values. These values correspond with those on
the emotion locator. An Anger value of 2.2 gives the character a slight sneer, and by the end of
this sentence the character would smirk. In this way, huge amounts of characterization can be
added. We video our actors at the time we record the sound, either in the sound studio or the
motion capture studio. We can then play back the video recording of the scene we are editing
and lay down broad emotions using the actor's face as a guideline.
The advantage of editing a text file is that anyone can do it. You do not have to be an animator
or understand how a complicated software package works. As long as the person who is editing
knows what the different emotion values look like, they can edit any script. Using the video of
the actor's face allows anyone to see which emotions should be placed where and when.
Later on, an animator can take the scene that has been set up using the script and go in and
make changes where necessary. This allows our animators to concentrate their talents on more
detailed facial animation, adding subtlety and characterization by editing the sliders in the
animation system and laying keys down by hand.
Specials. The third area to be covered by the Talking Heads system concentrates on a wide
range of subtle human movements. These are the keys to bringing your character to life.
Talking Heads takes the text file and creates emotions from the markup language as it matches
phonemes and timings. It also sets about laying down a series of secondary animations and
keying these to the third locator. As mentioned before, this locator deals with blinking, random
head motion, nodding and shaking of the head, breathing, and so on.
Blinking is controlled by the emotion that is set in the text file. If the character has anger set
using the markup language, then it will only set blinking keyframes once every six seconds.
When angry, the face takes on a scowl, the eyes open wide, and blinking is reduced to show as
much of the whites of the eyes as possible. The system knows how long each emotion lasts and
uses the one with the highest value as the prime emotion for blinking. Also added is a slight randomness
which will occasionally key in a double blink. The normal blinking rate is once every four
seconds, and if the character is lying or acting suspiciously this rate increases to once every two
seconds.
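That blink pass might be sketched as follows, using the rates quoted above; the double-blink probability is an invented stand-in for the "slight randomness":

#include <cstdlib>
#include <string>
#include <vector>

float blinkInterval(const std::string& dominantEmotion)
{
    if (dominantEmotion == "anger")      return 6.0f;  // wide-eyed glare
    if (dominantEmotion == "suspicion")  return 2.0f;  // nervous flutter
    return 4.0f;                                       // normal rate
}

// Lay blink key times across a block of dialogue.
std::vector<float> layBlinkKeys(float startTime, float endTime,
                                const std::string& dominantEmotion)
{
    std::vector<float> keyTimes;
    const float interval = blinkInterval(dominantEmotion);
    for (float t = startTime; t < endTime; t += interval) {
        keyTimes.push_back(t);
        if (std::rand() % 10 == 0)           // occasional double blink
            keyTimes.push_back(t + 0.15f);
    }
    return keyTimes;
}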
Random head motion is keyed only when keyframes are present for phonemes. This means that
the character always moves his head when he is speaking. This is a subtle effect; be careful
with the movement, a little goes a long way. The next pass looks for positive and negative
statements. It tracks certain words such as "yes, no, agree, disagree, sure, certainly, never."
When it finds such words, it sets keyframes for nodding and shaking of the head. Using the
timing from the script, it uses a set of decreasing values on the nod and shake head Set Driven
Keys. This gives us very realistic motion.
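The decaying values can be pictured as a damped oscillation around the neutral position of 5; in the sketch below, the amplitude and timing constants are invented for illustration:

#include <vector>

struct Key { float time; float value; };

// Lay keys for a nod (or, on the other attribute, a shake): oscillate
// around the neutral value of 5 with halving amplitude, then settle.
std::vector<Key> headGestureKeys(float startTime, float amplitude)
{
    std::vector<Key> keys;
    float t = startTime;
    while (amplitude > 0.2f) {
        keys.push_back({t, 5.0f + amplitude});         // push one way...
        t += 0.25f;
        keys.push_back({t, 5.0f - amplitude * 0.5f});  // ...rebound the other
        t += 0.25f;
        amplitude *= 0.5f;                             // decreasing values
    }
    keys.push_back({t, 5.0f});                         // settle at neutral
    return keys;
}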
Breathing is automatic; the system keys values when it reaches the end of a sentence. This
value can differ depending on the physical state of the character. Normal values are hardly
detectable, while extreme values mimic gasping for breath.
At this stage the system also creates keys for random eye motion. This keeps the character
alive at all times. If your character stops moving at any point, the illusion of life is broken.
Set up and ready to go. Once everything has run through Talking Heads, we have a fully
animating human head. At this stage an animator has not even overseen the process. Our
character blinks, breathes, moves, talks, and expresses a full range of human emotion.
At this point we schedule our animators onto certain scenes and they make subtle changes to
improve the overall animation, making sure that the character is reacting to what other
characters are saying and doing.
The process of creating Talking Heads has been a long nine months, and still changes are being
made. We continue to tinker and evolve the system to achieve the most believable facial
animation seen in a computer game. Whether we have done this successfully will only be seen
when The Getaway is eventually released.
The next step is to incorporate Talking Heads into real-time. This would allow our in-game NPCs
to react to whatever the player does. This is already in motion and we hope to see this
happening in The Getaway.
Facial animation can be achieved without huge animation teams. The process of creating
Talking Heads has been an extremely worthwhile experience. We are now able to turn out
excellent animations in very short times. Our team of animators is free to embellish facial
animation, adding real character and concentrating their efforts on creating the huge amount of
animation required for in-game and cutscenes.
Gavin Moore has worked in the games industry for 10 years. He is currently the senior animator
on The Getaway at Sony Computer Entertainment Europe's Team Soho. He is in charge of a
team of artists and animators responsible for all aspects of character creation and animation in
the game. Gavin can be reached at [email protected].
Books
Faigin, Gary. The Artist's Complete Guide to Facial Expression. New York: Watson-Guptill, 1990.
Fleming, Bill, and Darris Dobbs. Animating Facial Features and Expressions. Rockland, Mass.:
Charles River Media, 1999.
Parke, Frederic I., and Keith Waters. Computer Facial Animation. Wellesley, Mass.: A. K. Peters,
1996.
Web Sites
HighEnd3D
http://www.highend3d.com/
3dRender.com
http://www.3drender.com/
http://www.gamasutra.com/features/20010418/moore_01.htm
The famous Moore's law, which states in rough terms that every 18 months the speed of
computers doubles, has an evil twin: every 18 months software becomes twice as slow. A
similar relationship can be formulated for RAM and game data: no matter how big the memory
budget of your next-generation game may seem, your art team can probably fill it up faster
than you can say "disk thrashing." The appetite for art megabytes grows faster than the
publisher's willingness to raise the minimum platform requirements.
Until we start seeing games with a serious amount of geometry, the greatest slice of the
memory pie will belong to textures. Nobody wants to ship a game with small, blurry, obviously
tiling textures—and it's up to the programmers to alleviate texture limitations. The hundreds of
megabytes of stuff coming from the art quarters must be compressed.
Conventional image-compression algorithms are not very well suited to the specific
requirements of art storage in games. They are designed for relatively fast compression (not an
issue here, since art assets are preprocessed offline), while their decompression speed leaves
much to be desired. Also, it is usually hard to access a specific portion of the image.
For fixed textures used in hardware-rendered games, the texture compression schemes such as
DXTn present a solution; however, for supporting older hardware, for (gasp!) software
renderers, and doing more complicated stuff with textures they aren't perfect. Sure, you could
decompress DXTn in software and process it, but those formats aren't really meant for this—it
would probably be quite slow. There is a better solution in terms of both decompression speed
and image quality.
This article provides an introduction to the field of VQ, presents two algorithms for performing
VQ, and goes into the details of a successful real-world application for VQ texture compression.
Familiar operations such as rounding a real number to the nearest integer, or reducing a 16-bit
sound sample to an 8-bit one, are examples of scalar quantization—the input and output values are scalars, or
single numbers. You can do vector quantization (VQ) too, replacing vectors from a continuous
(or dense discrete) input set with vectors from a much sparser set (note that here by vector we
mean an ordered set of N numbers, not just the special case of points in 3D space). For
example, if we have the colors of the pixels in an image represented by triples of red, green,
and blue intensities in the [0.0, 1.0] range, we could quantize them uniformly by quantizing
each of the three intensities to an 8-bit number; this leads us to the traditional 24-bit
representation.
By quantizing each component of the vector for itself, we gain nothing over standard scalar
quantization; however, if we quantize the entire vectors, replacing them with vectors from a
carefully chosen sparse nonuniform set and storing just indices into that set, we can get a much
more compact representation of the image. This is nothing but the familiar paletted image
representation. In VQ literature the "palette," or the set of possible quantized values for the
vectors, is called a "codebook," because you need it to "decode" the indices into actual vector
values.
It turns out that VQ is a powerful method for lossy compression of data such as sounds or
images, because their vector representations often occupy only small fractions of their vector
spaces. We can illustrate this distribution in the case of a simple representation of a grayscale
image in a 2D vector space. The vectors will be composed by taking in pairs the values of
adjacent pixels. If the input image has 256 shades of gray, we can visualize the vector space as
the [0,0]-[255,255] square in the plane. We can then take the two components of the vectors
as XY coordinates and plot a dot for each vector found in the input image.
Figure 2 shows the result of this procedure applied to a grayscale version of the famous
"Lena" (Figure 1), a traditional benchmark for image-compression algorithms.
FIGURE 1. Lena in grayscale.
The diagonal line along which the density of the input vectors is concentrated is the x = y line;
the reason for this clustering is that "Lena," like most photographic images, consists
predominantly of smooth gradients. Adjacent pixels from a smooth gradient have similar
values, and the corresponding dot on the diagram is close to the x = y line. The areas on the
diagram which would represent abrupt intensity changes from one pixel to the next are sparsely
populated.
If we decide to reduce this image to 2 bits/pixel via scalar quantization, this would mean
reducing the pixels to four possible values. If we interpret this as VQ on the 2D vector
distribution diagram, we get a picture like Figure 3.
The big red dots on the figure represent the 16 evenly spaced possible values of pairs of pixels.
Every pair from the input image would be mapped to one of these dots during the quantization.
The red lines delimit the "zones of influence," or cells of the vectors—all vectors inside a cell
would get quantized to the same codebook vector.
Now we see why this quantization is very inefficient: Two of the cells are completely empty and
four other cells are very sparsely populated. The codebook vectors in the six cells adjacent to
the x = y diagonal are shifted away from the density maxima in their cells, which means that
the average quantization error in these cells will be unnecessarily high. In other words, six of
the 16 possible pairs of pixel values are wasted, six more are not used efficiently and only four
are O.K.
Let's perform an equivalent (in terms of size of resulting quantized image) vector quantization.
Instead of 2 bits/pixel, we'll allocate 4 bits per 2D vector, but now we can take the freedom to
place the 16 vectors of the codebook anywhere in the diagram. To minimize the mean
quantization error, we'll place all of these vectors inside the dense cloud around the x = y
diagonal.
FIGURE 4. Vector quantization to
4 bits per 2D-vector.
Figure 4 shows how things look with VQ. As in Figure 3, the codebook vectors are represented
as big red dots, and the red lines delimit their zones of influence. (This partitioning of a vector
space into cells around a predefined set of "special" vectors, such that, for all vectors inside a
cell, the same "special" vector is closest to them, is called a Voronoi diagram; the cells are called
Voronoi cells. You can find a lot of resources on Voronoi diagrams on the Internet, since they
have some interesting properties besides being a good illustration of the merits of VQ.)
You can see that in the case of VQ the cells are smaller (that is, the quantization introduces
smaller errors) where it matters the most—in the areas of the vector space where the input
vectors are dense. No codebook vectors are wasted on unpopulated regions, and inside each
cell the codebook vector is optimally placed with regard to the local input vector density.
When you go to higher dimensions (for example, taking 4-tuples of pixels instead of pairs), VQ
gets more and more efficient—up to a certain point. How to determine the optimal vector size
for a given set of input data is a rather complicated question beyond the scope of this article;
basically, to answer it, you need to study the autocorrelation properties of the data. It suffices
to say that for images of the type and resolution commonly used in games, four is a good
choice for the vector size. For other applications, such as voice compression, vectors of size 40-
50 are used.
The main problem in VQ is choosing the vectors for the codebook so that the mean quantization
error is minimal; after the codebook is known, mapping input vectors to it is a trivial matter of
finding the best match. In applications where the quantization is performed in real time, a
trivial approach to this latter step might prove too slow, but in our case it was orders of
magnitude faster than finding an optimal codebook.
We experimented with two algorithms for VQ, the classical GLA (generalized Lloyd algorithm,
sometimes called K-means clustering), and Anthony Dekker's Neuquant. Both of them are
extremely computationally expensive, basically using brute force to find a general solution to
the problem. Other, much faster algorithms exist, but they achieve speed by restricting the
generality of the codebook (for example, tree-structured VQ), which would lead to greater
quantization error. For our purposes—compression as a preprocessing stage for some of the art
assets—compression times of a few hours were acceptable, and that was well within the reach
of the brute force algorithms.
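For readers who haven't met it, GLA repeats two phases until the codebook settles: an assignment phase, which maps every input vector to its nearest codebook vector, and an adjustment phase, which moves each codebook vector to the centroid of the inputs assigned to it. A bare-bones sketch of one iteration follows; the fixed 12-dimensional vector type is our simplification, not the article's actual classes:

#include <cstddef>
#include <vector>

struct Vec {
    float v[12];
    Vec() { for (float& x : v) x = 0.0f; }
    Vec& operator+=(const Vec& o) {
        for (int i = 0; i < 12; ++i) v[i] += o.v[i];
        return *this;
    }
    Vec& operator*=(float s) {
        for (float& x : v) x *= s;
        return *this;
    }
};

float dist2(const Vec& a, const Vec& b)
{
    float d = 0.0f;
    for (int i = 0; i < 12; ++i) { float t = a.v[i] - b.v[i]; d += t * t; }
    return d;
}

void glaIteration(const std::vector<Vec>& input, std::vector<Vec>& codebook)
{
    std::vector<Vec>    sums(codebook.size());
    std::vector<size_t> counts(codebook.size(), 0);

    // Assignment phase: map every input vector to its nearest codebook entry.
    for (const Vec& in : input) {
        size_t best = 0;
        for (size_t c = 1; c < codebook.size(); ++c)
            if (dist2(in, codebook[c]) < dist2(in, codebook[best]))
                best = c;
        sums[best] += in;
        ++counts[best];
    }

    // Adjustment phase: move each codebook entry to the centroid of its
    // assigned inputs. Entries with counts[c] == 0 are the "empty cells"
    // discussed below.
    for (size_t c = 0; c < codebook.size(); ++c) {
        if (counts[c] == 0) continue;
        Vec centroid = sums[c];
        centroid *= 1.0f / counts[c];
        codebook[c] = centroid;
    }
}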
There are two problems with GLA: what to do with "empty cells" and how to choose the initial
codebook. You have an empty cell when some vector from the codebook gets no input vectors
assigned to it during the assignment phase (its Voronoi cell is empty). It will not move during
the adjustment phase, and the cell will therefore probably remain empty in all subsequent
iterations. In this case, you should remove it from the codebook. You need some kind of
heuristic to come up with a prospective replacement. You could split the codebook vector with
the greatest number of assigned input vectors into two close vectors, and let several iterations
pull them apart; or you could split the one whose assigned vectors are most distant. The first
heuristic aims to minimize the mean error, while the second minimizes maximum error.
If you have a satisfactory solution to the dead vector problem, the choice of initial codebook does
not matter—you could start out with random vectors, or with all vectors at the origin; GLA will
eventually move them into position and eliminate improper ones. This, however, can take tens
of iterations—work which would be spared with a more careful starting position.
Neuquant
Neuquant was the name given by Anthony Dekker for his application of Kohonen's self-
organizing maps (SOMs)—a special type of neural network—to color quantization. We found it
quite suitable for quantization of vectors of higher dimensions.
Imagine our codebook as a string of connected points—neurons—in the vector space. Each
neuron has two neighbors (except, of course, for the first and the last) in this string. Now, with
each vector in the input set, you "stimulate" the string: you find the neuron closest to the
stimulus vector, and you pull it in the direction of the stimulus, say, one-half the distance
towards it. Its neighbors in the string also get pulled towards the stimulus, but by a lesser
amount, say, one-fourth. Their neighbors are influenced only by one-eighth and so on, until at
some distance along the string the reaction to the stimulus stops.
When you feed the input set as stimuli, the string moves, turns, and stretches itself to occupy
these parts of the vector space where the input data is dense—which is precisely what you want
from a good VQ codebook. Neuquant with larger codebooks sometimes wastes codebook
entries.
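The stimulation step might be sketched as follows, reusing the Vec and dist2 helpers from the GLA sketch above; the radius of influence is a free parameter we've invented:

#include <cstdlib>
#include <vector>

void stimulate(std::vector<Vec>& neurons, const Vec& stimulus, int radius)
{
    // Find the neuron closest to the stimulus.
    size_t best = 0;
    for (size_t i = 1; i < neurons.size(); ++i)
        if (dist2(neurons[i], stimulus) < dist2(neurons[best], stimulus))
            best = i;

    // Pull it and its neighbors along the string toward the stimulus:
    // 1/2 for the winner, 1/4 for its neighbors, 1/8 beyond, and so on.
    for (int offset = -radius; offset <= radius; ++offset) {
        long i = static_cast<long>(best) + offset;
        if (i < 0 || i >= static_cast<long>(neurons.size()))
            continue;                        // fell off an end of the string
        float strength = 0.5f / static_cast<float>(1 << std::abs(offset));
        for (int k = 0; k < 12; ++k)
            neurons[i].v[k] += strength * (stimulus.v[k] - neurons[i].v[k]);
    }
}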
We found that, in general, Neuquant gives decent results faster than GLA (with fewer iterations),
but when given enough time GLA tends to produce better codebooks, which adapt well to
"stray" vectors. If you can live with the time, using GLA is definitely recommended.
Texture images are a prime candidate for VQ—they are often of limited color gamut (for
example, in a grass texture you might have hundreds of shades of green, but only a few
different reds, blues, and whites) and have a lot of similar, same-frequency features. Several
hardware vendors have recognized the suitability of textures for VQ compression. For example,
the Dreamcast's video chip supports rendering directly from textures compressed to 2
bits/pixel. The vectors in this case are 2x2 blocks of pixels, or 12-dimensional; the codebook
has 256 entries for single-byte indices.
The rest of this article is a detailed account of our practical experience with VQ compression for
our current project, a software-rendered 2.5D RTS. The source code accompanying the article is
very close to what we have in the actual game, so you can easily experiment with ideas and
trade-offs discussed below.
For our game we didn't have the luxury of building special hardware for VQ, so we had to
design our VQ scheme around the software blitter/blender. Since it uses MMX to process four
adjacent 16-bit pixels, we chose to quantize 12-dimensional vectors too, but taken from a 4x1
block of pixels. This leads to slightly worse results compared to the 2x2 blocks, because the
pixels in the tightly packed square block are more closely correlated (that is, likely to have
similar values).
Both VQ algorithms work on vectors via a bunch of operators and don't care about their
dimensionality or internal representation. This makes them a perfect fit for generic
implementations in the form of templated C++ classes, with the vector types left as template
parameters for the quantizer classes. The vector class should provide +=, -=, and *= operators
with their usual meaning, plus a function returning the distance between two vectors according to
the metric of choice (Euclidean distance works just fine). Neuquant needs an additional
function, which shifts a vector towards another vector by a specified amount.
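Under those requirements the vector template might look roughly like this; it is one possible shape, with shiftTowards standing in for the extra function Neuquant needs:

#include <cmath>
#include <cstddef>

template <size_t N>
class Vector {
public:
    float v[N];

    Vector& operator+=(const Vector& o) {
        for (size_t i = 0; i < N; ++i) v[i] += o.v[i];
        return *this;
    }
    Vector& operator-=(const Vector& o) {
        for (size_t i = 0; i < N; ++i) v[i] -= o.v[i];
        return *this;
    }
    Vector& operator*=(float s) {
        for (size_t i = 0; i < N; ++i) v[i] *= s;
        return *this;
    }

    // Euclidean distance between two vectors.
    static float distance(const Vector& a, const Vector& b) {
        float d = 0.0f;
        for (size_t i = 0; i < N; ++i) {
            float t = a.v[i] - b.v[i];
            d += t * t;
        }
        return std::sqrt(d);
    }

    // Move this vector a fraction t of the way towards target (Neuquant).
    void shiftTowards(const Vector& target, float t) {
        for (size_t i = 0; i < N; ++i) v[i] += t * (target.v[i] - v[i]);
    }
};

A quantizer class can then be instantiated with, say, Vector<12> for the 4x1 pixel blocks described above.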
Because with both algorithms almost all of the time is spent in these vector operations, they
are a good candidate for SIMD optimization. Writing SSE and 3DNow versions of the vector
classes took us a couple of hours. They both run at about the same speed, roughly twice as fast
as their scalar counterparts on the respective processors; greater speedup is probably possible
with careful hand-tuning. The plain old x87 version of the vector class can also be implemented
in a generic manner without sacrificing performance, with the dimensionality as a template
parameter.
We had to spend much more time tackling the empty cell problem with the GLA algorithm. We
found that splitting the cell with the largest population of input vectors results in neglecting
colors outside the general color gamut of the image; for example, small patches of brown dirt in
a grass texture almost completely disappeared because the heuristics allocated all available
vectors to the greens. Splitting the largest cells (measured by the maximal distance from a
codebook vector to an input vector mapped to it) works much better and preserves color
variations more faithfully.
Another issue is the initial codebook configuration and the stop condition for the GLA algorithm.
Depending on the input data, sometimes GLA never stops updating the codebook vectors,
falling into a stable loop of several iterations. We arbitrarily stop the process when the last N
iterations have updated less than 1 percent of the codebook vectors, or when they have
updated the same number of input vectors. It might be possible to come across an image for
which the algorithm will fall into a loop which updates a different number of vectors on each
iteration but still never finishes; we haven't found such an image, but still we put an arbitrary
limit on the total number of iterations.
As for the initial configuration, we kept the naive solution of starting with all codebook vectors
having the same values. It takes about 10-15 iterations just to fill up the codebook with
"active" vectors, but the end results are better than starting with a random sample of vectors
from the image, starting with a random codebook or using a few iterations of Neuquant as a
"seed" for GLA.
Because we let the compression run overnight anyway, for each image we generated
compressed versions with different codebook sizes in the range 256-8,192. Then we examined
them and chose the best trade-off between visual fidelity and size.
For the compressed images we keep the codebook in high-color format (because that's what
our blitter needs; the sample code works with 24-bit true color) and for each image vector an
8-bit or 16-bit index into the codebook. Reasonable codebook sizes (up to about 8,192) don't use
all 16 bits of the indices, but tight packing of the bits would slow down decompression, and this
is more important in our case than the additional compression gains. Even in this case you
should always use the smallest codebook possible to minimize memory traffic; ideally it should
fit into the CPU L1 data cache.
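Decompression with this layout is a single table lookup per block, which is why it is nearly free. A scalar sketch for 16-bit pixels, 4x1 blocks, and byte indices (the in-game blitter does the equivalent with MMX):

#include <cstddef>
#include <cstdint>

void decompressVQ(const uint16_t* codebook,  // entries of 4 pixels each
                  const uint8_t*  indices,   // one index per 4x1 block
                  size_t          blockCount,
                  uint16_t*       out)       // blockCount * 4 pixels
{
    for (size_t b = 0; b < blockCount; ++b) {
        const uint16_t* entry = codebook + indices[b] * 4;
        out[b * 4 + 0] = entry[0];
        out[b * 4 + 1] = entry[1];
        out[b * 4 + 2] = entry[2];
        out[b * 4 + 3] = entry[3];
    }
}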
If you ignore the memory for the codebook, this corresponds to 2 or 4 bits/pixel. While a bit on
the low side for general image compression, this is excellent for an algorithm whose
decompression is practically free in terms of CPU time.
When we put the VQ compression described above in our renderer, we expected to get slightly
worse performance but were ready to live with it, because it took about 20MB off our memory
footprint. However, profiles showed that rendering has actually gained about 15 to 20 percent
performance, probably due to decreased memory traffic.
Your project may benefit from VQ image compression if you have access to the renderer and
you can implement decompression directly inside it. It is also helpful if you can't rely on your
hardware renderer supporting a particular standard texture-compression format and still need
to stream from disk significant amounts of image data.
FIGURE 7. "Lena" uncompressed.
What about DXTn texture compression, you might ask? Well, if you can count on hardware
support for a texture-compression scheme on your minimum target platform, it is most
certainly a good idea to stick with it: after all, somebody has already paid for the tens of
thousands of transistors to implement it, and you'd better make good use of them, save tons of
RAM on the video card, and keep its precious memory bandwidth down. However, if, for some
reason—for example, software rendering (gasp!), texture preprocessing, or support for legacy
hardware (where "legacy" may mean "older than nine months" in our business)—you need to
have at some point in your application access to the decompressed pixels of your texture, VQ
might provide a better answer. DXTn compresses textures to 4 bits/pixel (for fully opaque or
single-bit-alpha images) or 8 bits/pixel (for images with alpha), while in our experiments we
rarely needed more than about 3 bits/pixel with VQ. Decompression times for DXTn are much
larger than those for VQ: in our experiments with a 1024x1024 true-color image, DXTn
decompression took about 200ms, straight copying of uncompressed data took about 25ms,
and VQ decompression from a 256-entry codebook took about 16ms. (Note: We didn't have
access to DXTn decompression source code, so we did what most developers would do to
decompress DXTn textures: we made the DirectX drivers do it for us. Driver code may or may
not be optimized, so the above figures should be taken with a grain of salt. Still, the nature of DXTn
compression makes us believe that it's impossible to reach VQ decompression speeds.)
FIGURE 8. VQ-compressed "Lena," codebook size 1,024,
compression time 6 minutes, 2.6 bits/pixel.
Pros:
- Blindingly fast decompression (often faster than simply copying the uncompressed data,
orders of magnitude faster than decompressing PNGs or JPEGs)
- Good quality at excellent compression ratios (see the 2.6 bits/pixel samples below)
- A flexible choice of the trade-off between compression ratio and fidelity (from about 2
bits/pixel all the way to about 8 bits/pixel; even at 4 bits/pixel most images look
considerably better in VQ than in simple 8-bit palettization).
Cons:
- Very slow compression: compressing any practical amount of art assets is definitely an
overnight batch job.
- Nonstandard, not widely supported in hardware.
Sample Code
This article is accompanied by sample code for a command-line compression tool and a
minimalistic compressed-file viewer. To build them you'll need Microsoft Visual C++. No
libraries outside the Win32 API and the STL are used.
Two samples of images compressed with this tool are shown in Figures 5 through 9. Running
times are for a 600MHz Pentium III for the SSE version of the algorithm. The bits-per-pixel
ratios are for the case where 10- or 11-bit codebook indices are packed tightly, not padded to
16 bits.
http://www.gamasutra.com/features/20010416/ivanov_01.htm
This paper explains the design architecture, implementation, and some of the lessons learned
creating the multiplayer (networking) code for the Age of Empires 1 & 2 games, and discusses
the current and future networking approaches used by Ensemble Studios in its game engines.
When the multiplayer code for Age of Empires™ was started in early 1996 there were some
very specific goals that had to be met to deliver the kind of game experience we had in mind.
The Genie Engine was already running and the game simulation was shaping up into a
compelling experience in single player. The Genie Engine is a 2D single-threaded (game loop)
engine. Sprites are rendered in 256 colors in a tile-based world. Randomly-generated maps
were filled with thousands of objects, from trees that could be chopped down to leaping
gazelles. The rough breakdown (post optimization) of processing tasks for the engine was: 30%
graphic rendering, 30% AI and Pathing, and 30% running the simulation & maintenance.
At a fairly early stage, the engine was reasonably stable -- and multiplayer communications
needed to work with the existing code without substantial recoding of the existing (working)
architecture.
To complicate matters further, the time to complete each simulation step varied greatly: the
rendering time changed if the user was watching units, scrolling, or sitting over unexplored
terrain, and large paths or strategic planning by the AI made the game turn fluctuate fairly
wildly by as much as 200 msec.
A few quick calculations would show that passing even a small set of data about the units, and
attempting to update it in real time would severely limit the number of units and objects we
could have interacting with the player. Just passing X and Y coordinates, status, action, facing
and damage would have limited us to 250 moving units in the game at the most.
We wanted to devastate a Greek city with catapults, archers, and warriors on one side while it
was being besieged from the sea with triremes. Clearly, another approach was needed.
Simultaneous Simulations
Rather than passing the status of each unit in the game, the expectation was to run the exact
same simulation on each machine, passing each an identical set of commands that were issued
by the users at the same time. The PCs would basically synchronize their game watches in best
war-movie tradition, allow players to issue commands, and then execute in exactly the same
way at the same time and have identical games.
This tricky synchronization was difficult to get running initially, but did yield some surprising
benefits in other areas.
While the problem of moving hundreds or thousands of objects simultaneously was taken care
of by this approach, the solution still had to be viable on the Internet with latency swings of
20 to 1,000 milliseconds, and had to handle changes in frame processing time.
Sending out the player commands, acknowledging all messages, and then processing them
before going on to the next turn was going to be a gameplay nightmare of stop-start or slow
command turnover. A scheme to continue processing the game while waiting for
communications to happen in the background was needed.
Mark used a system of tagging commands to be executed two "communications turns" in the
future (Comm. turns were separated in AoE from actual rendering frames).
So commands issued during turn 1000 would be scheduled for execution during turn 1002 (see
Figure 1). On turn 1001 commands that were issued on turn 0999 would be executed. This
allowed messages to be received, acknowledged, and ready to process while the game was still
animating and running the simulation.
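The scheme boils down to a map from turn numbers to command lists. A sketch in C++, with illustrative types rather than the Genie engine's actual interfaces:

#include <map>
#include <vector>

struct Command { int playerId; int type; /* target, position, ... */ };

class TurnQueue {
public:
    void issue(const Command& cmd, int currentTurn)
    {
        pending[currentTurn + 2].push_back(cmd);  // schedule two turns out
        // ...broadcast cmd to the other clients here...
    }

    void execute(int currentTurn)
    {
        for (const Command& cmd : pending[currentTurn]) {
            (void)cmd;  // run the command through the simulation
        }
        pending.erase(currentTurn);
    }

private:
    std::map<int, std::vector<Command> > pending;  // turn -> commands
};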
Since the simulations must always have the exact same input, the game can really only run as
fast as the slowest machine can process the communications, render the turn, and send out
new commands. Speed Control is what we called the system to change the length of the turn to
keep the animation and gameplay smooth over changing conditions in communications lag and
processing speed.
There are two factors that make the gameplay feel "laggy": If one machine's frame rate drops
(or is lower than the rest) the other machines will process their commands, render all of the
allocated time, and end up waiting for the next turn -- even tiny stops are immediately
noticeable. Communications lag -- due to Internet latency and lost data packets would also stop
the game as the players waited around for enough data to complete the turn.
Each client calculated a frame rate that it thought could be consistently maintained by
averaging the processing time over a number of frames. Since this varied over the course of the
game with the visible line-of-sight, number of units, map size and other factors -- it was sent
with each "Turn Done" message.
Each client would also measure a round trip "ping time" periodically from it to the other clients.
It would also send the longest average ping time it was seeing to any of the clients with the
"Turn Done" message. (Total of 2 bytes was used for speed control.)
Each turn the designated host would analyze the "done" messages, figure out a target frame
rate and adjustment for Internet latency. The host would then send out a new frame rate and
communications turn length to be used. Figures 3 through 5 show how the communications turn
was broken up for the different conditions.
The "communications turn" which was roughly the round-trip ping time for a message, was
divided up into the number of simulation frames that on average could be done by the slowest
machine in that period.
The communications turn length was weighted so it would quickly rise to handle Internet
latency changes, and slowly settle back down to the best average speed that could be
consistently maintained. The game would tend to pause or slow only at the very worst spikes-
command latency would go up but would stay smooth (adjusting only a few milliseconds per
turn) as the game adjusted back down to best possible speed. This gave the smoothest play
experience possible while still adjusting to changing conditions.
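A toy version of the host's calculation, capturing the rise-fast, settle-slowly behavior; the blend weight is invented, since the article gives only the qualitative shape:

#include <algorithm>

struct SpeedControl {
    float turnLengthMs  = 200.0f;
    int   framesPerTurn = 1;

    // worstAvgPingMs and slowestFrameMs come from the "Turn Done" messages.
    void update(float worstAvgPingMs, float slowestFrameMs)
    {
        // The communications turn must cover a round trip and a frame.
        float target = std::max(worstAvgPingMs, slowestFrameMs);
        if (target > turnLengthMs)
            turnLengthMs = target;                           // rise quickly
        else
            turnLengthMs += (target - turnLengthMs) * 0.05f; // settle slowly

        // Divide the turn into however many frames the slowest machine
        // can render in that time.
        framesPerTurn = std::max(1, static_cast<int>(turnLengthMs /
                                                     slowestFrameMs));
    }
};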
Guaranteed Delivery
At the network layer UDP was used, with command ordering, drop detection and resending
being handled by each client. Each message used a couple of bytes to identify the turn that
execution was scheduled and the sequence number for the message. If a message was received
for a past turn, it was discarded, and incoming messages were stored for execution. Because of
the nature of UDP, Mark's assumption for message receipt was that "When in doubt, assume it
dropped." If messages were received out of order, the receiver immediately sent out re-send
requests for the dropped messages. If an acknowledgement was later than predicted, the
sender would just resend without being asked, anticipating that the message had been dropped.
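The receive-side bookkeeping that follows from those rules can be sketched like so; the message layout and names are ours:

#include <algorithm>
#include <cstdint>
#include <map>

struct Message { uint32_t sequence; uint32_t executeTurn; /* payload */ };

class Receiver {
public:
    void onMessage(const Message& m, uint32_t currentTurn)
    {
        if (m.executeTurn < currentTurn)
            return;                         // stale: discard it
        // "When in doubt, assume it dropped": ask for everything between
        // the last sequence we expected and the one that just arrived.
        for (uint32_t s = nextExpected; s < m.sequence; ++s)
            requestResend(s);
        nextExpected = std::max(nextExpected, m.sequence + 1);
        stored[m.sequence] = m;             // hold until its turn comes up
    }

private:
    void requestResend(uint32_t sequence) { (void)sequence; /* send request */ }

    uint32_t nextExpected = 0;
    std::map<uint32_t, Message> stored;
};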
Hidden Benefits
Because the game's outcome depended on all of the users executing exactly the same
simulation, it was extremely difficult to hack a client (or client communication stream) and
cheat. Any simulation that ran differently was tagged as "out of sync" and the game stopped.
Cheating to reveal information locally was still possible, but these few leaks were relatively easy
to secure in subsequent patches and revisions. Security was a huge win.
Hidden Problems
At first take it might seem that getting two pieces of identical code to run the same should be
fairly easy and straightforward -- not so. The Microsoft product manager, Tim Znamenacek, told
Mark early on, "In every project, there is one stubborn bug that goes all the way to the wire -- I
think out-of-sync is going to be it." He was right. The difficulty with finding out-of-sync errors is
that very subtle differences would multiply over time. A deer slightly out of alignment when the
random map was created would forage slightly differently -- and minutes later a villager would
path a tiny bit off, or miss with his spear and take home no meat. So what finally showed up as a
checksum difference in food amounts had an original cause that was sometimes puzzling to
trace back.
As much as we check-summed the world, the objects, the pathfinding, targeting and every
other system -- it seemed that there was always one more thing that slipped just under the
radar. Giant (50MB) message traces and world object dumps to sift through made the problem
even more difficult. Part of the difficulty was conceptual -- programmers were not used to
having to write code that used the same number of calls to random within the simulation (yes,
the random numbers were seeded and synchronized as well).
Lessons Learned
A few key lessons were learned in the development of the networking for Age of Empires that
are applicable to development of any game's multiplayer system.
Know your user. Studying the user is key to understanding what their expectations are for
multiplayer performance, perceived lag, and command latency. Each game genre is different,
and you need to understand what is right for your specific gameplay and controls.
Early in the development process Mark sat down with the lead designer and prototyped
communications latency (this was something that was revisited throughout the development
process). Since the single-player game was running, it was easy to simulate different ranges of
command latency and get player feedback on when it felt right, sluggish, jerky, or just horrible.
For RTS games, 250 milliseconds of command latency was not even noticed -- between 250 and
500 msec was very playable, and beyond 500 it started to be noticeable. It was also interesting
to note that players developed a "game pace" and a mental expectation of the lag between
when they clicked and when their unit responded. A consistent slower response was better than
alternating between fast and slow command latency (say between 80 and 500 msec) -- in that
case a consistent 500 msec command latency was playable, but one that varied was considered
"jerky" and hard to use.
In real terms this directed a lot of the programming efforts at smoothness -- it was better to
pick a longer turn length and be certain that everything stayed smooth and consistent than to
run as quickly as possible with occasional slow-downs. Any changes to speed had to be gradual
and in as small increments as possible.
We also metered the users' demands on the system -- they would typically issue commands
(move, attack, chop trees) averaging about every 1.5 to 2 seconds, with occasional spikes of 3
to 4 commands per second during heated battles. Since our game built to crescendos of frantic
activity, the heaviest communications demands came in the middle and late game.
When you take the time to study your user behavior you'll notice other things about how they
play the game that can help your network play. In AoE, clicking repeatedly when the users were
excitedly attacking (clik-lik-lik-lik-lik -- go go go ) was causing huge spikes in the number of
commands issued per second -- and if they were pathing a large group of units -- huge spikes
in the network demand as well. A simple filter to discard repeat commands at the same location
drastically reduced the impact of this behavior.
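Such a filter can be as simple as remembering the last command per player and dropping near-duplicates; the thresholds below are invented:

#include <cmath>

struct MoveCmd { int playerId; float x, y; float timeMs; };

class RepeatFilter {
public:
    // Returns false for a command that repeats the player's previous one
    // at (nearly) the same spot within half a second.
    bool shouldSend(const MoveCmd& cmd)
    {
        MoveCmd& last = lastCmd[cmd.playerId % kMaxPlayers];
        bool samePlace = std::fabs(cmd.x - last.x) < 1.0f &&
                         std::fabs(cmd.y - last.y) < 1.0f;
        bool recent = (cmd.timeMs - last.timeMs) < 500.0f;
        last = cmd;
        return !(samePlace && recent);
    }

private:
    static const int kMaxPlayers = 8;
    MoveCmd lastCmd[kMaxPlayers] = {};
};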
Metering is king. You will discover surprising things about how your communications system
is working if you put in metering early, make it readable by testers, and use it to understand
what is happening under the hood of your networking engine.
Lesson: Some of the problems with AoE communication happened when Mark took the metering
out too early, and did not re-verify message (length and frequency) levels after the final code
was in. Undetected things like occasional AI race conditions, difficult-to-compute paths, and
poorly structured command packets could cause huge performance problems in an otherwise
well tuned system.
Have your system notify testers and developers when it seems like it is exceeding boundary
conditions -- programmers and testers will notice during development which tasks are stressing
the system and let you know early enough to do something about it.
Take the time to educate your testers in how your communications system works, and expose
and explain the summary metering to them -- you might be surprised what things they notice
when the networking code inevitably encounters strange failures.
Educating the developers. Getting programmers who are used to developing single-player
applications to start thinking about the detachment between a command being issued, received,
and processed is tricky. It is easy to forget that you are requesting something that might
not happen, or might happen seconds after you originally issue the command. Commands have
to be checked for validity both on send and receive.
With the synchronous model, programmers also had to be aware that the code must not
depend on any local factor (such as having free time, special hardware, or different settings)
when it was in the simulation. The code path taken on all machines must match. For example,
having random terrain sounds inside the simulation would cause the games to behave
differently. (Saving and re-seeding the pseudo-random number generator with the last random
number took care of things inside the simulation that we needed to be random but that must
not change the simulation.)
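A sketch of that save-and-re-seed trick; the generator itself is a stand-in, and the point is only that the synchronized stream is left exactly where it was:

#include <cstdint>

struct SyncRand {
    uint32_t seed = 1;
    uint32_t next() { seed = seed * 1103515245u + 12345u; return seed; }
};

SyncRand gSimRand;   // seeded identically on every machine

// Randomness for non-simulation code (terrain sounds, UI flourishes).
uint32_t nonSimRandom()
{
    uint32_t saved = gSimRand.seed;    // save the synchronized state
    uint32_t value = gSimRand.next();  // draw a number for local-only use
    gSimRand.seed = saved;             // restore: the simulation stream is
                                       // unchanged on every machine
    return value;
}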
Other lessons. This should be common sense -- but if you depend on a third-party network (in
our case DirectPlay), write an independent test application to verify that when they say
"guaranteed delivery" that the messages get there, that "guaranteed packet order" truly is, and
that the product does not have hidden bottlenecks or strange behaviors handling the
communications for your game.
Be prepared to create simulation applications and stress test simulators. We ended up with
three different minimal test applications, all to isolate and highlight problems like connection
flooding, problems with simultaneous matchmaking connects, and dropped guaranteed packets.
Test with modems (and, if you are lucky, modem simulators) as early as possible in the
process, and continue to include modem testing (as painful as it is) throughout development.
It is hard to isolate problems (is that sudden performance drop because of the ISP, the game,
the communications software, the modem, the matchmaking service, or the other end?), and
users really don't want to hassle with flaky dialup connections when they have been zipping
along at instant-connection LAN speeds. It is therefore vital to ensure that testing is done on
modem connections with the same zeal as the LAN multiplayer games.
In Age of Empires 2: The Age of Kings, we added new multiplayer features such as recorded
games, file transfer, and persistent stat tracking on The Zone. We also refined the multiplayer
systems such as DirectPlay integration and speed control to address bugs and performance
issues that had come up since the release of Age of Empires.
The game recording feature was one of those things that you just happen to stumble upon as
an "I could really use this for debugging" task that ends up as a full-blown game feature.
Recorded games are incredibly popular with the fan sites as it allows gamers to trade and
analyze strategies, view famous battles, and review the games they played in. As a debugging
tool, recorded games are invaluable. Because our simulation is deterministic, and recorded
games are synchronous in the same way that multiplayer is synchronous, a game recording
gave us a great way of passing around repro cases for bugs because it was guaranteed to play
out the exact same way every time.
Our integration with the matchmaking system on The Zone was limited to straightforward game
launching for Age of Empires. In Age of Kings we extended this to allow for launch parameter
control and persistent stat reporting. While not a fully inside-out system, we utilized
DirectPlay's lobby launch functionality to allow The Zone to control certain aspects of the game
settings from the pre-game tables, and "lock" those settings in once the game was actually
launched. This allowed users to better find the games they wanted to play in, because they
could see the settings at the matchmaking level, rather than waiting to launch into the game
setup screen. On the backend we implemented persistent stat reporting and tracking. We
provide a common structure to The Zone, which we fill out and upload at the end of a game.
The data in this structure is used to populate a number of user ratings and rankings viewable
on The Zone's web site.
RTS3 is the codename for Ensemble's next-generation strategy game. The RTS3 design builds
on the successful formula used in the Age of Empires series games, and calls for a number of
new features and multiplayer requirements.
- Builds on the feature set of Age of Empires 1 and 2. Design requirements such as internet
play, large diverse maps, and thousands of controllable units are a given.
- 3D -- RTS3 is a fully 3D game, with interpolated animation and non-faceted unit position
and rotation.
- More players -- possible support for more than eight players.
- TCP/IP support -- 56k TCP/IP internet connection is our primary target.
- Home network support -- support end-user home network configurations including
firewalls and NAT setups.
With RTS3, we made the decision early on to go with the same underlying network model as
Age of Empires 1 and 2 -- the synchronous simulation -- because the RTS3 design played to the
strengths of this architecture in the same ways. With AOE/AOK, we relied on DirectPlay for
transport and session management services, but for RTS3 we decided to create a core network
library, using only the most basic socket routines as our foundation and building from there.
The move to a fully 3D world meant that we had to be more sensitive to issues of frame-rate
and overall simulation smoothness in multiplayer. However, it also meant that our simulation
update times and frame-rate would be even more prone to variation, and that we would be
devoting more time overall to rendering. In the Genie engine, unit rotations were faceted and
animations were frame-rate locked -- with BANG! we allowed for arbitrary unit rotation and
smooth animation, which meant that the game would be visually much more sensitive to the
effects of latency and see-sawing update rates.
Coming out of development on Age of Kings, we wanted to address those critical areas where
more up-front design and tool-set work would give the biggest payoff in terms of debugging
time. We also realized how important the iterative play-testing process was to the design of our
games, and so bringing the multiplayer game online as early as possible was high priority.
An OO approach. RTS3's network architecture is strongly object oriented (see Figure 6). The
requirements of supporting multiple network configurations really played to the strengths of OO
design in abstracting out the specifics of platform, protocol, and topology behind a set of
common objects and systems.
The protocol specific and topology specific versions of the network objects have as little code as
possible. The bulk of the functionality for these objects has been isolated in the higher-level
parent objects. To implement a new protocol, we extend only those network objects that need
to have protocol-specific code (such as client and session, which need to do some things
differently based on the protocol). None of the other objects in the system (such as Channels,
TimeSync, etc.) need change because they interface with client and session only through their
high level abstract interfaces.
We also employ aggregation to implement multi-dimensional derivation (such as with
channels, which have an ordered/non-ordered axis of derivation as well as a peer/repeater axis
of derivation) behind a single generic interface. Virtual methods are also used for non-intensive
notifications, rather than callback functions.
Peer topology. The Genie engine supported a peer-to-peer network topology, in which all
clients in the session connect to all the other clients in a star configuration. With RTS3 we have
continued to use this topology because of its inherent benefits when applied to the
synchronous simulation model.
The peer topology implies a star configuration of connected clients in a session (Figure 7). That
is, all clients connect to all other clients. This is the setup that Age 1 and 2 utilized.
Peer-to-peer strengths:
- Reduced latency due to the direct client-client nature of the system, rather than a client-
server-client roundtrip for messages.
- No central point of failure -- if a client (even the host) disconnects from the session, the
game can continue.
Peer-to-peer weaknesses:
- More active connections in the system (for k clients, 0 + 1 + ... + (k-1) = k(k-1)/2 links) --
meaning more potential failure points and more potential for latency.
- Impossible to support some NAT configurations with this approach.
Net.lib. Our goal when designing the RTS3 communications architecture was to create a
system that was tailored for strategy games, but at the same time we wanted to build
something that could be used for in-house tools and extended to support our future games. To
meet this goal, we created a layered architecture that supports game-level objects such as a
client and a session, but also supports lower level transport objects such as a link or a network
address.
Link, Level 2
Level 2, the Link Level, offers transport layer services. The objects in this level, such as the
Link, Listener, NetworkAddress, and Packet represent the useful elements needed to establish a
connection and send some messages across it (see Figure 9).
Multiplayer, Level 3
The multiplayer level is the highest level of objects and routines available in the net.lib API. This
is the layer that RTS3 interfaces with as it collects lower level objects, such as links, into more
useful concepts/objects such as clients and sessions.
The most interesting objects in the BANG! network library are probably those that live at the
multiplayer level. Here, the API presents a set of objects that the game level interacts with, and
yet we maintain a game-independent approach in the implementation.
- Client: this is the most basic abstraction of a network endpoint. This can be configured as
a remote client (link) or local client (loopback link). Clients are not created directly, but
are instead spawned by a session object.
- Session: this is the object responsible for the creation, connection negotiation, collection
and management of clients. The session contains all the other multiplayer-level objects.
To use this object, the application simply calls host() or join(), giving it either a local
address or a remote address respectively, and the session handles the rest. These
responsibilities include automatically creating/deleting clients, notification of session
events, and the dispatch of traffic to the appropriate objects.
- Channel and Ordered Channel: this object represents a virtual message conduit.
Messages sent across a channel will be automatically separated out and received on the
corresponding channel object on remote clients. An ordered channel works with the
TimeSync object to guarantee that the ordering of messages received on that channel will
be identical on all clients.
- Shared Data: represents a collection of data shared across all clients. You extend this
object to create specific instances that contain your own data types, and then use the
built-in methods to enable the automatic and synchronous updating of these data
elements across the network.
- Time Sync: manages the smooth progression of synchronized network time across all
clients in a session.
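Glue code on the game side might drive these objects as below. The article names host() and join() but not their exact signatures, so treat this as pseudocode in C++ clothing:

class NetworkAddress { /* protocol-specific address data */ };

class Session {
public:
    void host(const NetworkAddress&) { /* create the session locally */ }
    void join(const NetworkAddress&) { /* connect to an existing session */ }
};

void startMultiplayer(Session& session, const NetworkAddress& addr,
                      bool hosting)
{
    if (hosting)
        session.host(addr);  // the session spawns a local client itself
    else
        session.join(addr);  // negotiation and client creation are automatic
    // From here on, session events (clients joining and leaving, incoming
    // traffic) arrive through the high-level interfaces; clients are never
    // constructed directly by the game.
}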
The communications level is the RTS3 side of things. This is the main collection of systems
through which the game interfaces with the network library, and it actually lives within the
game code itself. The communications layer provides a plethora of useful utility functions for
the creation and management of multiplayer-level network objects, and attempts to boil down
the game's multiplayer needs into a small, easy-to-use interface.
Improved sync system. Nobody on the Age of Empires development team would argue the
need for the best sync tools possible. As with any project, when you look back on the
development process during a postmortem, some areas always stand out as the ones you spent
the most time on, but could have spent much less time on given more up-front work.
Synchronization debugging was probably at the top of this list as we started development on
RTS3.
The RTS3 synchronization tracking system is primarily geared towards rapid turn-around on
sync bugs. Our other priorities in developing it were ease of use for the developers, the ability
to handle an arbitrarily massive amount of sync data pouring through the system, the ability to
totally compile out synchronization code in a release build, and finally the ability to completely
change our test configuration by toggling some variables rather than requiring a recompile.
The system is built around a pair of sync-recording macros. (There is a set of these macros per sync "tag," where a tag represents a given system to be synced -- in this example, the random number generator, cRandSync.) These macros both take a userinfo string parameter, which is a name or indication of the specific item being synced. For example, a sync call might look like:
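The example call itself did not survive in this copy of the article. As a purely hypothetical sketch of the pattern being described -- the macro, recording function, and random number generator below are inventions consistent with the text, not the actual RTS3 code -- it might look like:

#include <cstdio>

enum SyncTag { cRandSync, cUnitSync }; // one tag per system to be synced

#define SYNC_ENABLED 1 // set to 0 to compile all sync code out of a release build

#if SYNC_ENABLED
void syncRecord(SyncTag tag, const char* userinfo, long value)
{
    // A real implementation would checksum this and compare it across
    // machines; here we simply log it.
    std::printf("sync tag=%d %s=%ld\n", (int)tag, userinfo, value);
}
#define syncRandCode(userinfo, value) syncRecord(cRandSync, (userinfo), (long)(value))
#else
#define syncRandCode(userinfo, value) ((void)0)
#endif

// Example call site inside a deterministic random number generator:
long nextRand(long& seed)
{
    seed = seed * 1103515245 + 12345;
    syncRandCode("nextRand seed", seed); // the userinfo string names the item
    return seed;
}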
Paul derived multiplayer-enabled versions of our console command and config variable systems.
With these, we are able to easily turn a normal config variable (such as enableCheating) into a
multiplayer config variable by adding a flag to the config variable's definition. With this flag
enabled, that config variable will then be passed around in a multiplayer game, and
synchronous game decisions (such as whether to allow free resource tributing) can be based on its value. Multiplayer console commands are a similar concept -- calls to a multiplayer-enabled console command are passed around and executed synchronously on all client machines.
Through the application of these two tools, the developers have a simple way to use the
multiplayer system without writing any lines of code. They can quickly add new testing tools or
configurations, and easily enable them in the network environment.
Summation
The synchronous-simulation, peer-to-peer model was used successfully in the Age of Empires series of games. While it is critical to acknowledge the importance of investing time in tools and technologies that combat the key challenges of this approach (such as synchronization and network metering), the viability of the architecture has been proven in the real-time strategy genre. The subsequent improvements we have implemented for RTS3 lead to an overall multiplayer experience that is virtually indistinguishable from single player in all but the most horrible network conditions.
Age of Empires logos and box images Copyright © 2000 Microsoft Corporation. Names, trademarks, and copyrights are the
property of the originating companies.
http://www.gamasutra.com/features/20010322/terrano_01.htm
Pathfinding is a core component of most games today. Characters, animals, and vehicles all
move in some goal-directed manner, and the program must be able to identify a good path
from an origin to a goal, which both avoids obstacles and is the most efficient way of getting to
the destination. The best-known algorithm for achieving this is the A* search (pronounced "A
star"), and it is typical for a lead programmer on a project simply to say, "We'll use A* for
pathfinding." However, AI programmers have found again and again that the basic A* algorithm
can be woefully inadequate for achieving the kind of realistic movement they require in their
games.
This article focuses on several techniques for achieving more realistic looking results from
pathfinding. Many of the techniques discussed here were used in the development of
Activision's upcoming Big Game Hunter 5, which made for startlingly more realistic and visually
interesting movement for the various animals in the game. The focal topics presented here include achieving straighter, more direct paths; adding realistic curved turns; and guaranteeing legal turns with a directional extension of A* that accounts for turning radius.
Dealing with realistic turns is an important and timely AI topic. In the August 2000 issue of
Game Developer ("The Future of Game AI"), author Dave Pottinger states, "So far, no one has
proffered a simple solution for pathing in true 3D while taking into account such things as turn
radius and other movement restrictions," and goes on to describe some of the "fakes" that are
commonly done. Also, in a recent interview on Feedmag.com with Will Wright, creator of The
Sims, Wright describes movement of The Sims' characters: "They might have to turn around
and they kind of get cornered -- they actually have to calculate how quickly they can turn that
angle. Then they actually calculate the angle of displacement from step to step. Most people
don't realize how complex this stuff is..."
In addition to the above points, I will also cover some important optimization techniques, as
well as some other path-related topics such as speed restrictions, realistic people movement,
and movement along roads. After presenting the various techniques below, we'll see by the end
that there is no true "best approach," and that the method you choose will depend on the
specific nature of your game, its characters, available CPU cycles and other factors.
Note that in the world of pathfinding, the term "unit" is used to represent any on-screen mobile
element, whether it's a player character, animal, monster, ship, vehicle, infantry unit, and so
on. Note also that while the body of this article presents examples based on tile-based
searching, most of the techniques presented here are equally applicable to other types of world
division, such as convex polygons and 3D navigation meshes.
A Brief Introduction to A*
The A* algorithm is a venerable technique which was originally applied to various mathematical
problems and was adapted to pathfinding during the early years of artificial intelligence
research. The basic algorithm, when applied to a grid-based pathfinding problem, is as follows:
Start at the initial position (node) and place it on the Open list, along with its estimated cost to
the destination, which is determined by a heuristic. The heuristic is often just the geometric
distance between two nodes. Then perform the following loop while the Open list is nonempty:
- Pop the node off the Open list that has the lowest estimated cost to the destination.
- If the node is the destination, we've successfully finished (quit).
- Examine the node's eight neighboring nodes.
- For each of the nodes which are not blocked, calculate the estimated cost to the goal of the path that goes through that node. (This is the actual cost to reach that node from the origin, plus the heuristic cost to the destination.)
- Push all those nonblocked surrounding nodes onto the Open list, and repeat the loop.
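As an illustrative sketch only (the article does not provide one), the loop above might be coded as follows in C++, using a priority queue for the Open list; the parent links needed to recover the actual waypoints are omitted for brevity:

#include <cmath>
#include <queue>
#include <vector>

struct Node { int x, y; float g, f; };
struct Cmp { bool operator()(const Node& a, const Node& b) const { return a.f > b.f; } };

// Returns true if a path exists from (sx,sy) to (dx,dy) on a w*h grid,
// where blocked[y*w+x] marks impassable tiles.
bool aStar(const std::vector<bool>& blocked, int w, int h,
           int sx, int sy, int dx, int dy)
{
    auto heuristic = [&](int x, int y) {              // geometric distance
        return std::sqrt(float((x-dx)*(x-dx) + (y-dy)*(y-dy)));
    };
    std::priority_queue<Node, std::vector<Node>, Cmp> open;
    std::vector<float> best(w * h, 1e30f);            // cheapest cost found so far
    open.push({sx, sy, 0.0f, heuristic(sx, sy)});
    best[sy*w + sx] = 0.0f;
    while (!open.empty()) {
        Node n = open.top(); open.pop();              // lowest estimated cost
        if (n.x == dx && n.y == dy) return true;      // reached the destination
        for (int j = -1; j <= 1; ++j)                 // the eight neighbors
            for (int i = -1; i <= 1; ++i) {
                if (i == 0 && j == 0) continue;
                int nx = n.x + i, ny = n.y + j;
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                if (blocked[ny*w + nx]) continue;     // skip blocked tiles
                float g = n.g + ((i && j) ? 1.41421f : 1.0f);
                if (g >= best[ny*w + nx]) continue;   // already reached cheaper
                best[ny*w + nx] = g;
                open.push({nx, ny, g, g + heuristic(nx, ny)});
            }
    }
    return false;                                     // Open list exhausted
}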
In the end, the nodes along the chosen path, including the starting and ending position, are
called the waypoints. The A* algorithm is guaranteed to find the best path from the origin to
the destination, if one exists. A more detailed introduction to A* is presented in Bryan Stout's
Game Developer article "Smart Moves: Intelligent Pathfinding" (October/November 1996),
which is also available on Gamasutra.com.
Hierarchical Pathfinding
Critical to any discussion of efficient pathfinding within a game is the notion of hierarchical
maps. To perform an efficient A* search, it is important that the origin and destination nodes of
any particular search are not too far apart, or the search time will become enormous. I
recommend that the distance between origin and destination be constrained to 40 tiles, and
that the total search space be no more than 60x60 tiles (creating a 10-tile-wide buffer behind
both origin and destination, allowing the path to wrap around large obstacles.) If units need to
search for more distant destinations, some method of hierarchical pathfinding should be used.
In the real world, people do not formulate precise path plans which stretch on for miles. Rather,
if a person has some domain knowledge of the intermediate terrain, they will subdivide the
path, i.e. "first get to the highway on-ramp, then travel to the exit for the mall, then drive to
the parking lot." Alternatively, if a person has no domain knowledge, they will create
intermediate points as they see them. For example, if you wanted to eventually reach some
point you knew was far to the North, you would first look North and pick a point you could see,
plan a path toward it, and only when you got there, you would pick your next point.
Within a game program, the techniques for creating a map hierarchy include:
1. Subdivide the line to the destination into midpoints, each of which is then used as a
subdestination. Unfortunately, this always leaves the possibility that a chosen midpoint
will be at an impossible location, which can eliminate the ability to find a valid path (see
the "Path Failure" section later in this article).
2. Preprocess the map into a large number of regions, for example castles, clearings, hills,
and so on. This can be done by an artist/designer, or even automated if maps are
random. Then start by finding a path on the "region map" to get from the current position
to the destination region, and then find a tile-based path on the detailed map to get to
the next region. Alternatively, if a unit has no region knowledge and you want to be
completely realistic with its behavior, it can just choose the next region which lies in the
compass direction of its ultimate destination. (Though again, this can result in path
failure.)
Before proceeding with turning and smoothing modifications to the A* algorithm, let's start with
some basic optimizations that can speed up the standard A* algorithm by a factor of 40 or
more. To start with, the standard A* algorithm uses a sorted linked list (or two linked lists) to
track nodes that are checked. Instead, we'll use a 60x60 fixed matrix. When starting a search from point a to point b, we find the midpoint between those two and place it at point [30,30] on our matrix. Each point on the matrix stores the node's cost values, its parent, and the Previous and Next pointers used to maintain list functionality (described below).
We also keep a separate array of 1-bit Booleans, which store whether or not each node in our
matrix has been touched yet during this search. That way, we can very rapidly initialize at the
beginning of the search without needing to clear the entire matrix.
Whereas the original algorithm maintains a separate sorted Open list (actually a Priority
Queue), we instead maintain basic list functionality simply by using Previous and Next pointers
within the fixed array. Note that we do have the memory requirement for our 60x60 matrix, but
our compacted data structure requires only 16 bytes per node, for a total of 57K. (Even
expanding the matrix to 120x120 will only require 230K of memory.)
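The article gives only the node size, not the field layout. A hypothetical 16-byte arrangement consistent with the description -- costs, a parent, and the intrusive Previous/Next links discussed next -- might be:

// Hypothetical 16-byte node layout (the article specifies only the size):
struct PathNode {
    unsigned short g;        // cost from the origin so far (fixed point)
    unsigned short f;        // g plus the heuristic estimate to the goal
    unsigned char parentX;   // matrix coordinates of the parent node
    unsigned char parentY;
    unsigned char prevX;     // intrusive links replacing the sorted
    unsigned char prevY;     //   Open list: previous node...
    unsigned char nextX;     //   ...and next node in the "list"
    unsigned char nextY;
    unsigned char flags;     // Open/Closed status
    unsigned char pad[5];    // spare bytes up to the 16-byte total
};

static PathNode nodes[60][60];                  // 60 * 60 * 16 bytes = ~57K
static unsigned int touched[(60*60 + 31) / 32]; // 1 bit per node per search

inline bool isTouched(int i) { return (touched[i >> 5] >> (i & 31)) & 1; }
inline void setTouched(int i) { touched[i >> 5] |= 1u << (i & 31); }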
Note additionally that the "list" can be implemented as a binary tree (by having two Next node
pointers at each element), but we've actually found it to be substantially faster to have a simple
(non-priority) list. While this does result in time O(n) for the search for the lowest cost node at
the top of the A* loop (rather than O(log n) for a priority queue), it excels in that all insertions
and deletions, of which there are many, are only O(1). Best of all, it eliminates the inner-loop search that checks whether neighboring nodes already exist on the Open or Closed lists, which otherwise would take O(n) (or somewhat better if a hash table is used).
Overall, by avoiding all memory allocations and list insertions, this method turns out to be
dramatically faster. I have profiled it to be as much as 40 times faster than standard A*
implementations.
Note that for the Directional search described later in this article, eight times the number of nodes are necessary, so the memory requirements above all increase by a factor of eight.
The first and most basic step in making an A* path more realistic is getting rid of the zigzag
effect it produces, which you can see in Figure 2a. This effect is caused by the fact that the
standard A* algorithm searches the eight tiles surrounding a tile, and then proceeds to the next
tile. This is fine in primitive games where units simply hop from tile to tile, but is unacceptable
for the smooth movement required in most games today.
FIGURE 2. The common zigzag effect of
the standard A* algorithm (a); a
modification with fewer, but still fairly
dramatic, turns (b); and the most direct --
and hence desired -- route (c). To achieve
the path shown in Figure 2c, the four
waypoints shown in red in Figure 2a were
eliminated.
One simple method of reducing the number of turns is to make the following modification to the
A* algorithm: Add a cost penalty each time a turn is taken. This will favor paths which are the
same distance, but take fewer turns, as shown in Figure 2b. Unfortunately, this simplistic
solution is not very effective, because all turns are still at 45-degree angles, which causes the
movement to continue to look rather unrealistic. In addition, the 45-degree-angle turns often
cause paths to be much longer than they have to be. Finally, this solution may add significantly
to the time required to perform the A* algorithm.
The actual desired path is that shown in Figure 2c, which takes the most direct route,
regardless of the angle. In order to achieve this effect, we introduce a simple smoothing
algorithm which takes place after the standard A* algorithm has completed its path. The
algorithm makes use of a function Walkable(pointA, pointB), which samples points along a
line from point A to point B at a certain granularity (typically we use one-fifth of a tile width),
checking at each point whether the unit overlaps any neighboring blocked tile. (Using the width
of the unit, it checks the four points in a diamond pattern around the unit's center.) The
function returns true if it encounters no blocked tiles and false otherwise. See Figure 3 for an
illustration, and Listing 1 for pseudocode.
LISTING 1. Pseudocode for the simple smoothing algorithm.
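The listing itself is missing from this copy. Reconstructed from the description, with a hypothetical Point type and with Walkable() assumed to be implemented as described above, it amounts to roughly:

#include <iterator>
#include <list>

struct Point { float x, y; };

// Samples along the line from a to b at one-fifth tile granularity,
// as described in the text; its definition is assumed to exist elsewhere.
bool Walkable(const Point& a, const Point& b);

// Reconstruction of Listing 1 from the description in the text.
void smoothPath(std::list<Point>& path)
{
    if (path.size() < 3) return;
    std::list<Point>::iterator checkPoint = path.begin();     // last kept waypoint
    std::list<Point>::iterator currentPoint = std::next(checkPoint);
    while (std::next(currentPoint) != path.end()) {
        if (Walkable(*checkPoint, *std::next(currentPoint)))
            currentPoint = path.erase(currentPoint); // middle waypoint not needed
        else {
            checkPoint = currentPoint;               // keep it and move on
            ++currentPoint;
        }
    }
}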
The smoothing algorithm simply checks from waypoint to waypoint along the path, trying to
eliminate intermediate waypoints when possible. To achieve the path shown in Figure 2c, the
four waypoints shown in red in Figure 2a were eliminated.
Since the standard A* algorithm searches the surrounding eight tiles at every node, there are
times when it returns a path which is impossible, as shown with the green path in Figure 4. In
these cases, the smoothing algorithm presented above will smooth the portions it can (shown in
purple), and leave the "impossible" sections as is.
This simple smoothing algorithm is similar to "line of sight" smoothing, in which all waypoints
are progressively skipped until the last one that can be "seen" from the current position.
However, the algorithm presented here is more accurate, because it adds collision detection
based on the width of the character and also can be used easily in conjunction with the realistic
turning methods described in the next section.
FIGURE 4. This smoothing algorithm will
leave impossible paths alone.
Note that the simple smoothing algorithm presented above, like other simple smoothing
methods, is less effective with large units and with certain configurations of blocking objects. A
more sophisticated smoothing pass will be presented later.
The next step is to add realistic curved turns for our units, so that they don't appear to change
direction abruptly every time they need to turn. A simple solution involves using a spline to
smooth the abrupt corners into turns. While this solves some of the aesthetic concerns, it still
results in physically very unrealistic movement for most units. For example, it might change an
abrupt cornering of a tank into a tight curve, but the curved turn would still be much tighter
than the tank could actually perform.
For a better solution, the first thing we need to know is the turning radius for our unit. Turning
radius is a fairly simple concept: if you're in a big parking lot in your car, and turn the wheel to
the left as far as it will go and proceed to drive in a circle, the radius of that circle is your
turning radius. The turning radius of a Volkswagen Beetle will be substantially smaller than that
of a big SUV, and the turning radius of a person will be substantially less than that of a large,
lumbering bear.
Let's say you're at some point (origin) and pointed in a certain direction, and you need to get to
some other point (destination), as illustrated in Figure 5. The shortest path is found either by
turning left as far as you can, going in a circle until you are directly pointed at the destination,
and then proceeding forward, or by turning right and doing the same thing.
In Figure 5 the shortest route is clearly the green line at the bottom. This path turns out to be
fairly straightforward to calculate due to some geometric relationships, illustrated in Figure 6.
First we calculate the location of point P, which is the center of our turning circle, and is always
radius r away from the starting point. If we are turning right from our initial direction, that
means P is at an angle of (initial_direction - 90) from the origin, so:
angleToP = initial_direction - 90
P.x = Origin.x + r * cos(angleToP)
P.y = Origin.y + r * sin(angleToP)
Now that we know the location of the center point P, we can calculate the distance from P to the
destination, shown as h on the diagram:
dx = Destination.x - P.x
dy = Destination.y - P.y
h = sqrt(dx*dx + dy*dy)
At this point we also want to check that the destination is not within the circle, because if it
were, we could never reach it:
if (h < r)
return false
Now we can calculate the length of segment d, since we already know the lengths of the other two sides of the right triangle, namely h and r. We can also determine the angle theta from the right-triangle relationship:
d = sqrt(h*h - r*r)
theta = arccos(r / h)
Finally, to figure out the point Q at which to leave the circle and start on the straight line, we need to know the total angle (theta + phi), and phi is easily determined as the angle from P to the destination:

phi = atan2(dy, dx)
Q.x = P.x + r * cos(theta + phi)
Q.y = P.y + r * sin(theta + phi)

(For a turn in the opposite direction, theta is subtracted rather than added.)
In our implementation of this algorithm and the ones that follow, we utilize a data structure
which stores up to four distinct "line segments," each one being either straight or curved. For
the curved paths described here, there are only two segments used: an arc followed by a
straight line. The data structure contains members which specify whether the segment is an arc
or a straight line, the length of the segment, and its starting position. If the segment is a
straight line, the data structure also specifies the angle; for arcs, it specifies the center of the
circle, the starting angle on the circle, and the total radians covered by the arc.
Once we have calculated the curved path necessary to get between two points, we can easily
calculate our position and direction at any given instant in time, as shown in Listing 2.
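Listing 2 is likewise not reproduced here. A sketch of what such a position query could look like, given the segment structure just described (all field and function names are my own; heading can be recovered the same way):

#include <algorithm>
#include <cmath>

struct Point { float x, y; };

// One path "line segment": either a straight line or an arc.
struct Segment {
    bool  isArc;
    float length;   // length of the segment (arc length for arcs)
    Point start;    // starting position
    float angle;    // line: heading; arc: starting angle on the circle
    Point center;   // arc only: center of the turning circle
    float radians;  // arc only: total (signed) radians covered
};

// Position after traveling distance d from the start of the path.
Point positionAlongPath(const Segment segs[], int count, float d)
{
    for (int i = 0; i < count; ++i) {
        const Segment& s = segs[i];
        if (d > s.length && i + 1 < count) { d -= s.length; continue; }
        float t = std::min(d, s.length);
        if (!s.isArc)
            return Point{ s.start.x + t * std::cos(s.angle),
                          s.start.y + t * std::sin(s.angle) };
        float r = s.length / std::fabs(s.radians);      // arc radius
        float a = s.angle + s.radians * (t / s.length); // angle at distance t
        return Point{ s.center.x + r * std::cos(a),
                      s.center.y + r * std::sin(a) };
    }
    return segs[count - 1].start; // not reached when count >= 1
}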
So now that we know how to find and follow an efficient curved line between two points, how do
we use this in our pathing? The methods discussed in this section are all postprocessing
techniques. In other words, they involve using the standard A* algorithm during initial
pathfinding, and then adding curved turns later in some fashion, either in an extended
pathfinding or during actual unit movement.
FIGURE 7. Decreasing the turning radius (a), and
making a three-point turn (b).
1. Simple solution: ignoring blocked tiles. We start with the simplest solution. First use
the A* algorithm to calculate the path. Then progress from point to point in the path as
follows: At any waypoint, a unit has a position, an orientation, and a destination
waypoint. Using the algorithm described in the preceding section, we can calculate the
fastest curved path to get from the current waypoint to the next waypoint. We don't care
what direction we are facing when we reach the destination waypoint, though that will
turn out to be the starting orientation for the following waypoint. If we skim some
obstacles along the way, so be it -- this is a fast approximation, and we are willing to
overlook such things. Figure 1c shows the result of this method. The curves are nice, but
on both turns, the side of the ship will overlap a blocking tile.
This solution is actually quite acceptable for many games. However, we often don't want to
allow any obviously illegal turns where the unit overlaps obstacles. The next three methods
address this problem.
2. Path recalculations. With this method, after the A* has completed, we step through the
path, making sure every move from one waypoint to the next is valid. (This can be done
as part of a smoothing pass.) If we find a collision, we mark the move as invalid and try
the A* path search again. In order to do this, we need to store one byte for every tile (or
add an additional byte to the matrix elements described in the optimization section
above). Each bit will correspond to one of the eight tiles accessible from that tile. Then we
modify the A* algorithm slightly so that it checks whether a particular move is valid
before allowing it. The main problem with this method is that by invalidating certain
moves, a valid path approaching the tile from a different direction can be left unfound.
Also, in a worst-case scenario, this method could need to recalculate the path many times
over.
3. Making tighter turns. Another solution is that whenever we need to make a turn that
would normally cause a collision, we allow our turning radius to decrease until the turn
becomes legal. This is illustrated with the first turn in Figure 7a. One proviso is that when
we conduct the A* search, we need to search only the surrounding four tiles at every
node (as opposed to eight), so we don't end up with impossible situations like the one
illustrated in Figure 4. In the case of vehicles, this method may look odd: some
lumbering tank suddenly makes an unbelievably tight turn. However, in other cases this
may be exactly what you want. Unlike vehicles, which tend to have a constant turning
radius, if your units are people, they are able to turn much more tightly if they are
creeping along than if they are running. So in order to follow the simple path, you simply
need to decelerate the unit as it approaches the turn. This can yield very realistic
movement. (See the sections on "Speed and People Movement" for a further discussion.)
4. Backing up. Our final solution comes from real-world experience. How do we make a
very tight turn into a driveway? We back up and make a three-point turn, of course, as
illustrated in Figure 7b. If your units are able to perform such maneuvers, and if this is
consistent with their behavior, this is a very viable solution.
Legal Turns: The Directional A* Algorithm
None of the methods presented in the above section is formally correct. Method two can often
fail to find valid paths, and methods one, three, and four are all basically cheats. Comparing
Figures 1c and 1d, we see that the only valid solution which takes turning radius into account
may require a completely different route from what the basic A* algorithm provides. To solve
this problem, I'll introduce a significant modification to the algorithm, which I'll term the
Directional A*.
The main change to the algorithm is the addition of a third dimension. Instead of having a flat
grid of nodes, where each node represents an XY grid position, we now have a three-
dimensional space of nodes, where a node <X,Y,orientation> represents the position at that
node, as well as the compass orientation of the unit (N, S, E, W, NE, NW, SE, SW.) For
example, a node might be [X = 92, Y = 142, orientation = NW]. Thus there are eight times as
many nodes as before. There are also 64 times as many ways of getting from one <X,Y>
location to another, because you can start at the first node pointing any one of eight directions,
and end at the next node pointing any one of eight directions.
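As a brief illustration (names are hypothetical), the search space simply gains a third index:

enum Orientation { N, NE, E, SE, S, SW, W, NW }; // eight compass headings

struct DirNode {
    unsigned short g, f;  // costs, parent, links, etc., as in the earlier matrix
};

DirNode nodes[60][60][8]; // one node per <x, y, orientation>

inline int nodeIndex(int x, int y, Orientation o)
{
    return (y * 60 + x) * 8 + (int)o; // flat index; eight times as many nodes
}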
During the algorithm, when we're at a parent node p and checking a child node q, we don't just
check if the child itself is a blocked tile. We check if a curved path from p to q is possible
(taking into account the orientation at p, the orientation at q, and the turning radius); and if so,
we check if traveling on that path would hit any blocked tiles. Only then do we consider a child
node to be valid. In this fashion, every path we look at will be legal, and we will end up with a
valid path given the size and turning radius of the unit. Figure 8 illustrates this.
The shortest path, and the one that would be chosen by the standard A* algorithm, goes from a
to c. However, the turning radius of the unit prevents the unit from performing the right turn at
c given the surrounding blockers, and thus the standard A* would return an invalid path in this
case. The Directional A*, on the other hand, sees this and instead looks at the alternate path
through b. Yet even at b, a 90-degree turn to the left is not possible due to nearby blockers, so
the algorithm finds that it can make a right-hand loop and then continue.
Directional Curved Paths
In order to implement the Directional A* algorithm, it is necessary to figure out how to compute
the shortest path from a point p to a point q, taking into account not only the starting position, orientation, and turning radius, but also the ending direction. This algorithm will allow us to
compute the shortest legal method of getting from a current position and orientation on the
map to the next waypoint, and also to be facing a certain direction upon arriving there.
Earlier we saw how to compute the shortest path given just a starting orientation and turning
radius. Adding a fixed final orientation makes the process a bit more challenging.
There are four possible shortest paths for getting from origin to destination with fixed starting
and ending directions. This is illustrated in Figure 9. The main difference between this and
Figure 5 is that we approach the destination point by going around an arc of a circle, so that we
will end up pointing in the correct direction. Similar to before, we will use trigonometric
relationships to figure out the angles and lengths for each segment, except that there are now
three segments in total: the first arc, the line in the middle, and the second arc.
We can easily position the turning circles for both origin and destination in the same way that
we did earlier for Figure 6. The challenge is finding the point (and angle) where the path leaves
the first circle, and later where it hits the second circle. There are two main cases that we need
to consider. First, there is the case where we are traveling around both circles in the same
direction, for example clockwise and clockwise (see Figure 10).
FIGURE 10. Case 1: Traveling around both
circles in the same direction.
In this case, we can make three observations:
1. The line from P1 to P2 has the same length and slope as the (green) path line below it.
2. The arc angle at which the line touches the first circle is simply 90 degrees different from
the slope of the line.
3. The arc angle at which the line touches the second circle is exactly the same as the arc
angle at which it touches the first circle.
The second case, where the path travels around the circles in opposite directions (for example,
clockwise around the first and counterclockwise around the second), is somewhat more
complicated (see Figure 11). To solve this problem, we imagine a third circle centered at P3
which is tangent to the destination circle, and whose angle relative to the destination circle is at
right angles with the (green) path line. Now we follow these steps:
1. Observe that we can draw a right triangle between P1, P2, and P3.
2. We know that the length from P2 to P3 is (2 * radius), and we already know the length from P1 to P2, so we can calculate the angle phi as phi = arccos(2 * radius / Length(P1, P2)).
3. Since we also already know the angle of the line from P1 to P2, we just add or subtract phi (depending on clockwise or counterclockwise turning) to get the exact angle of the (green) path line. From that we can calculate the arc angle where it leaves the first circle and the arc angle where it touches the second circle.
We now know how to determine all four paths from origin to destination, so given two nodes
(and their associated positions and directions), we can calculate the four possible paths and use
the one which is the shortest.
Note that we can now use the simple smoothing algorithm presented earlier with curved paths,
with just a slight modification to the Walkable(pointA, pointB) function. Instead of point-
sampling in a straight line between pointA and pointB, the new Walkable(pointA,
directionA, pointB, directionB) function samples intermediate points along a valid curve
between A and B given the initial and final directions.
Discrete and nondiscrete positions and directions. Some readers may be concerned at this
point, since it seems that our algorithm is dependent on movement always starting at exactly
the center position of a tile, and at exactly one of eight compass directions. In real games, a
character may be in the middle of walking between two tiles at the exact moment we need it to
change direction. In fact, we can easily modify the algorithm so that whenever the origin node
is the starting point of the search, we do the curve computations based on the true precise
position and angle of the character's starting point. This eliminates the restriction.
Nonetheless, the algorithm still requires that the waypoints are at the center of tiles and at
exact compass directions. These restrictions can seemingly cause problems where a valid path
may not be found. The case of tile-centering is discussed in more detail below. The problem of
rounded compass directions, however, is in fact very minimal and will almost never restrict a
valid path. It may cause visible turns to be a bit more exaggerated, but this effect is very slight.
Expanded searching to surrounding tiles. So far in this discussion, we have assumed that
at every node, you check the surrounding eight locations as neighbors. We call this a
Directional-8 search. As mentioned in the preceding paragraph, there are times when this is
restrictive. For example, the search shown in Figure 12 will fail for a Directional-8 search, because given a wide turning radius for the ship, it would be impossible to traverse a -> b -> c -> d without hitting blocking tiles. Instead, it is necessary to find a curve directly from a -> d.
Accomplishing this requires searching not just the surrounding eight tiles, which are one tile
away, but the surrounding 24 tiles, which are two tiles away. We call this a Directional-24
search, and it was such a search that produced the valid path shown in Figure 12. We can even
search three tiles away for a Directional-48 search. The main problem with these extended
searches is computation time. A node in a Directional-8 search has 8 x 8 = 64 child nodes, but
a node in a Directional-24 search has 24 x 8 = 192 child nodes.
A small optimization we can do is to set up a directional table to tell us the relative position of a
child given a simple index. For example, in a Directional-48 search, we loop through directions
0 -> 47, and a sample table entry would be:
DirTable[47] = <-3,+3>.
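Such a table is easy to build at startup. A sketch follows; note that the article does not specify the ordering of the 48 offsets, so the sample index above will not necessarily match this loop's ordering:

struct Offset { int dx, dy; };
Offset DirTable[48]; // relative child positions for a Directional-48 search

void buildDirTable()
{
    int n = 0;
    for (int dy = -3; dy <= 3; ++dy)      // every tile within three tiles
        for (int dx = -3; dx <= 3; ++dx)  // of the origin...
            if (dx != 0 || dy != 0)       // ...except the origin itself
                DirTable[n++] = Offset{ dx, dy }; // 7*7 - 1 = 48 entries
}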
The modified heuristic. Our modified A* algorithm will also need a modified heuristic (to
estimate the cost from an intermediate node to the goal). The original A* heuristic typically just
measures a straight-line distance from the current position to the goal position. If we used this,
we would end up equally weighing every compass direction at a given location, which would
make the search take substantially longer in most cases. Instead, we want to favor angles that
point toward the goal, while also taking turning radius into account. To do this, we change the
heuristic to be the distance of the shortest curve from the current location and angle to the
destination location, as calculated in the "Adding Realistic Turns" section earlier.
To avoid making this calculation each time, we set up a heuristic table in advance. The table
contains heuristic values for any destination tile within a 10-tile distance (with a granularity of
1/64th tile), and at any angle relative to the current direction (with a granularity of eight
angles.) Any destination tile beyond 10 tiles is computed with the 10-tile value, plus the
difference in actual distance, which turns out to be a very close approximation. The total data
size of the table is thus 640 (distance) x 8 (directions) x 4 (bytes) = 20K. Since the table is
dependent on the turn radius of the unit, if that turn radius changes, we need to recalculate the
table.
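A sketch of the lookup under the granularities just described (the table contents would be precomputed with the curve calculation from the earlier section; names are mine):

float heuristicTable[640][8]; // 640 * 8 * 4 bytes = 20K, filled at startup

// dist is in tiles; relAngle indexes the eight quantized relative headings.
float directionalHeuristic(float dist, int relAngle)
{
    int d = (int)(dist * 64.0f);               // 1/64th-tile granularity
    if (d < 640)
        return heuristicTable[d][relAngle];
    // Beyond 10 tiles: the 10-tile value plus the extra straight-line distance.
    return heuristicTable[639][relAngle] + (dist - 10.0f);
}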
Using a hit-check table. The trigonometric calculations described above to determine the
path from one node to another are not trivial and take some computational time. Pair this with
the requirement of "walking" through the resultant path to see if any tiles are hit, and the fact
that this whole process needs to be performed at every possible node combination in the
search. The result is a total computation time that would be absurdly long. Instead, we use a
special table which substantially reduces the computation time. For any given starting direction
(8 total), ending direction (8 total), and ending position (up to 48, for a Directional-48 search),
the table stores a 121-bit value. This value represents an 11x11 grid surrounding the origin, as
seen in Figure 13.
Any tiles that would be touched by a unit traveling between those nodes (other than the origin
and destination tiles themselves) will be marked by a "1" in the appropriate bit-field, while all
others will be "0." Then during the search algorithm itself, the table will simply be accessed,
and any marked nodes will result in a check to see if the associated node in the real map is
blocked or not. (A blocked node would then result in report of failure to travel between those
nodes.) Note that the table is dependent on both the size and turn radius of the unit, so if those
values change, the table will need to be recomputed.
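A sketch of how such a table might be declared and consulted; the mapping of bits to grid offsets is an assumption, and tileBlocked() stands in for the game's map query:

#include <bitset>

// One 121-bit mask per (start direction, end direction, end position).
std::bitset<121> hitCheck[8][8][48];

// Assumed to return whether the map tile at (x, y) is blocked.
bool tileBlocked(int x, int y);

bool moveIsClear(int startDir, int endDir, int endPos, int ox, int oy)
{
    const std::bitset<121>& mask = hitCheck[startDir][endDir][endPos];
    for (int i = 0; i < 121; ++i) {
        if (!mask[i]) continue;              // tile not swept by this move
        int dx = (i % 11) - 5;               // map bit back to an offset in
        int dy = (i / 11) - 5;               // the 11x11 grid around the origin
        if (tileBlocked(ox + dx, oy + dy))
            return false;                    // travel between the nodes fails
    }
    return true;
}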
Earlier, I mentioned how the very first position in a path may not be at the precise center
location of a tile or at a precise compass direction. As a result, if we happen to be specifically
checking neighbors of that first tile, the algorithm needs to do a full computation to determine
the path, since the table would not be accurate.
Other options: backing up. Finally, if your units are able to move in reverse, this can easily
be incorporated into the Directional A* algorithm to allow further flexibility in finding a suitable
path. In addition to the eight forward directions, simply add an additional eight reverse
directions. The algorithm will automatically utilize reverse in its path search. Typically units
shouldn't be traveling in reverse half the time though, so you can also add a penalty to the
distance and heuristic computations for traveling in reverse, which will "encourage" units to go
in reverse only when necessary.
Correctness of Directional A*
The standard A* algorithm, if used in a strict tile-based world with no turning restrictions, and
in a world where a unit must always be at the center of a tile, is guaranteed to find a solution if
one exists. The Directional A* algorithm on the other hand, when used in a more realistic world
with turning restrictions and nondiscrete positions, is not absolutely guaranteed to find a
solution. There are a couple of reasons for this.
Earlier we saw how the Directional-8 algorithm could occasionally miss a valid path, and this
was illustrated in Figure 12. The conclusion was to use a Directional-24 search or even a
Directional-48 search. However, in very rare circumstances, the same problem could occur with
a Directional-48 search. We could extend even further to a Directional-80 search, but at that
point the computation time required would probably be too high.
The other problem is that the shortest legal curved path between two points, which we compute
in our algorithm, is not the only legal curved path. For one thing, there are the four possible
paths shown in Figure 9. Our algorithm simply picks the shortest and assumes that is the
correct one. Yet possibly that one path may fail, while one of the other three may have
succeeded. (Though when I tried to fabricate such a condition, it proved almost impossible.
Another route was always found by the algorithm.) Furthermore, the four paths shown in that
figure are not the only legal paths, either. There are theoretically an infinite number of paths
that twist and turn in many different ways.
In practice, though, it is very rare for the Directional-24 search to fail to find a valid path. And it
is almost impossible for the Directional-48 search to fail.
Until now, the discussion has assumed that units can face any direction: 27 degrees, 29
degrees, 133 degrees, and so on. However, certain games which do not use real-time 3D art do
not have this flexibility. A unit may be prerendered in only eight or 16 different angles.
Fortunately, we can deal with this without too much trouble. In fact, the accompanying test
program includes an option of 16-angle fixed art, to illustrate the process.
The trivial way of dealing with the problem is to do all of the calculations assuming continuous
angles, and then when rendering to the screen, simply round to the nearest legal direction (for
example, a multiple of 22.5 degrees for 16-angle art) and draw the appropriate frame.
Unfortunately, this usually results in a visible "sliding" effect for the unit, which typically is
unacceptable.
FIGURE 14. Turning with fixed-angle
character art.
What we really want is a solution which can modify a continuous path, like the one shown at the
bottom of Figure 14, and create a very similar path using just discrete lines, as shown in the
top of the figure.
The solution involves two steps. First, for all arcs in the path, follow the original circle as closely
as possible, but staying just outside of it. Second, for straight lines in the path, create the
closest approximation using two line segments of legal angles. These are both illustrated in
Figure 15. For the purposes of studying the figure, we have allowed only eight legal directions,
though this can easily be extended to 16.
On the left, we see that we can divide the circle into 16 equivalent right triangles. Going one
direction on the outside of the circle (for example, northeast) involves traversing the bases of
two of these triangles. Each triangle has one leg whose length is the radius, and the inside angle is simply pi/8, or 22.5 degrees. Thus the base of each triangle is simply:
base = r * tan(22.5)
We can then extrapolate any point on the original arc (for example, 51.2 degrees) onto the
point where it hits one of the triangles we have just identified. Knowing these relationships, we
can then calculate (with some additional work) the starting and ending point on the modified
arc, and the total distance in between.
FIGURE 15. Geometry of fixed-angle turning.
For a straight line, we simply find the two lines of the closest legal angles, for example 0
degrees and 45 degrees, and determine where they touch. As shown in the figure, there are
actually two such routes (one above the original line and another below), but in our sample
program we always just pick one. Using basic slope and intercept relationships, we simply calculate the intersection of the two lines to determine where to change direction.
Note that we still use the "line segment" storage method introduced earlier to store the
modified path for fixed character art. In fact, this is why I said earlier we would need up to four
line segments. The starting and ending arc remain one line segment each (and we determine
the precise position of the unit on the modified "arc" while it is actually moving), but the initial
straight line segment between the two arcs now becomes two distinct straight line segments.
The approach we have taken thus far is to find the shortest curve possible between any two
points in a path. So if a unit is headed due east to a point p (and thus is pointing east when it
hits p), and then needs to go due north for five tiles to hit a point q, the unit will first need to
turn left for approximately 105 degrees of its turning circle, and then head at an approximate
direction of north-northwest until it arrives at point q. Note that we could alternatively have
defined the path to turn substantially further around the circle and then travel due north, but
that would have been a longer path. See the vertical path portions of Figure 16 for an
illustration.
FIGURE 16. The shortest curved path (a), and the longer path that stays on the road (b).
At certain times, even though it is longer, the path in Figure 16b may be what is desired. This
most often occurs when units are supposed to be traveling on roads. It simply is not realistic for
a vehicle to drive diagonally across a road just to save a few feet in total distance.
There are several ways to handle this:
1. When on roads, make sure to do only a regular A* search or a Directional-8 search, and
do not apply any smoothing algorithm afterwards. This will force the unit to go to the
adjacent tile. However, this will only work if the turning radius is small enough to allow
such a tight turn. Otherwise, the algorithm will find an adjacent tile which is off the road.
2. Temporarily disallow movement to any off-road tile. This has the same constraints as the
above method.
3. Same as (1), but for units that have too wide a turning radius to turn into an adjacent
road tile, do a Directional-24 or Directional-48 search as appropriate. For example, the
unit shown in Figure 16b apparently requires two tiles to make a 90-degree turn, so a
Directional-24 search would be appropriate.
4. Determine the number of tiles needed for a turn (for example two tiles, as in the figure),
and temporarily place blocking tiles adjacent to the road after that number of tiles has
gone by, beyond every turn. These temporary blockers are in fact displayed in Figure 16b.
This method is analogous to placing "cones" by the road.
The smoothing algorithm given earlier is less than ideal when used by itself. There are two
reasons for this. Figure 17 demonstrates the first problem. The algorithm stops at point q and
looks ahead to see how many nodes it can skip while still conducting a legal move. It makes it
to point r, but fails to allow a move from q to s because of the blocker near q. Therefore it
simply starts again at r and skips to the destination. What we'd really like to see is a change of
direction at p, which cuts diagonally to the final destination, as shown with the dashed line.
The second problem exhibits itself only when we have created a path using the simple (non-
directional) method, and is demonstrated by the green line in Figure 18. The algorithm moves
forward linearly, keeping the direction of the ship pointing straight up, and stops at point p.
Looking ahead to the next point (q), it sees that the turning radius makes the turn impossible.
The smoothing algorithm then proceeds to "cheat" and simply allow the turn. However, had it
approached p from a diagonal, it could have made the turn legally as evidenced by the blue
line.
To fix these problems, we introduce a new pre-smoothing pass that will be executed after the
A* search process, but prior to the simple smoothing algorithm described earlier. This pass is
actually a very fast version of our Directional-48 algorithm, with the difference that we only
allow nodes to move along the path we previously found in the A* search, but we consider the
neighbors of any node to be those waypoints which were one, two, or three tiles ahead in the
original path. We also modify the cost heuristic to favor the direction of the original path (as
opposed to the direction toward the goal). The algorithm will automatically search through
various orientations at each waypoint, and various combinations of hopping in two- or three-tile
steps, to find the best way to reach the goal.
FIGURE 18. Another shortcoming: the
simple smoothing algorithm is unable to
find and execute a turn within the legal
turning radius.
Because this algorithm sticks to tiles along the previous path, it runs fairly quickly, while also
allowing us to gain many of the benefits of a Directional-48 search. For example, it will find the
legal blue line path shown in Figure 18. Of course it is not perfect, as it still will not find paths
that are only visible to a full Directional-48 search, as seen in Figure 19.
The original, nondirectional search finds the green path, which executes illegal turns. There are
no legal ways to perform those turns while still staying on the path. The only way to arrive at
the destination legally is via a completely different path, as shown with the blue line. This pre-
smoothing algorithm cannot find that path: it can only be found using a true Directional search,
or by one of the hybrid methods described later. So the pre-smoothing algorithm fails under
this condition. Under such a failure condition, and especially when the illegal move occurs near
the destination, the pre-smoothing algorithm may require far more computation time than we
desire, because it will search back through every combination of directional nodes along the
entire path. To help alleviate this and improve performance, we add an additional feature such
that once the pre-smoothing algorithm has reached any point p along the path, if it ever
searches back to a point that is six or more points prior to p in the path, it will fail
automatically.
FIGURE 19. The blue line shows the only
truly legal path, which the pre-smoothing
algorithm can't find, but the Directional
search can.
Depending on the pathfinding method chosen, it is possible for failure to be reported either
when there is truly no possible path, or when the chosen solution simply has not found the path
(which is more likely to occur when utilizing fast, informal methods.) What to do in the case of
failure is entirely dependent on the specifics of the game. Typically, it simply means that the
current goal -- a food source, ammunition depot, or enemy base -- is not attainable from the
current position, and the unit must choose a different goal. However, it is possible that in
certain circumstances we know the goal is achievable, and it is important to find it for
gameplay. In these cases we might have started with a faster search method, but if that fails,
we can proceed from scratch with a slower, methodical search, such as the standard
Directional-24.
The key problem is that A* and its derivative methods of pathfinding perform very poorly if
pathfinding fails. To alleviate this problem, it is important to minimize failures. This can be done
by dividing up the map into small regions in advance (let's say 1,000 total), and precomputing
whether it's possible, given two regions ra and rb, to get from some tile in ra to some tile in rb.
We need only one bit to store this, so in our example this uses a 128K table. Then before
executing any pathfind, we first check the table. If travel between the regions is impossible, we
immediately report failure. Otherwise, it is probably possible to get from our specific source tile
to our specific destination tile, so we proceed with the pathfinding algorithm.
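A minimal sketch of such a region-reachability table, assuming the region assignment and the flood-fill that populates it exist elsewhere:

#include <vector>

// One bit per ordered region pair: can any tile in ra reach any tile in rb?
struct ReachTable {
    int n;
    std::vector<unsigned char> bits; // n*n bits, row-major

    explicit ReachTable(int regions)
        : n(regions), bits((regions * regions + 7) / 8, 0) {}

    void setReachable(int ra, int rb) {
        int i = ra * n + rb;
        bits[i >> 3] |= (unsigned char)(1u << (i & 7));
    }
    bool reachable(int ra, int rb) const {
        int i = ra * n + rb;
        return (bits[i >> 3] >> (i & 7)) & 1;
    }
};

// Before any pathfind:
//   if (!table.reachable(regionOf(src), regionOf(dst))) report failure now;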
In the next section, I'll discuss the time performance for the different algorithms presented
here, and how to mix and match techniques to achieve faster times. Note that it is possible to
timeslice the search algorithm so that it can be interrupted and restarted several times, and
thereby take place over a number of frames, in order to minimize overall performance
degradation due to a slow search. The optimized algorithm presented earlier uses a fixed matrix
to keep track of the intermediate results of a search, so unfortunately this means that any other
units requiring a pathfinding search during that time will be "starved" until the previous pathing
is complete. To help alleviate this, we can instead allocate two matrices, one for the occasional
slow search that takes several timesliced frames and the other for interspersed fast searches.
Only the slow searches, then, will need to pause (very briefly) to complete. In fact, it is actually
quite reasonable that a unit "stop and think" for a moment to figure out a particularly difficult
path.
In this section, I'll discuss performance of many of the techniques presented earlier and how to
utilize that knowledge to choose the best algorithms for a particular application.
The major observation from testing the various techniques presented here was that the performance of the Directional algorithm is slow, probably too slow for many applications. Perhaps most
units in a particular game can utilize the simple A* algorithm with smoothing passes, while a
particular few large units could utilize a Directional algorithm. (It is nice to note, however, that
only a few years ago, system performance would have prohibited implementation of anything
but the simplest A* algorithm, and perhaps a few years from now, the performance issues
discussed here will not be significant.)
Note that the circuitous route required for the path in Figure
20 is somewhat complex, and the search required around
200ms when tested. Figure 21a shows a simpler version
that only required 85ms. Finally, by simply adding one
blocking tile, as shown in Figure 21b, the time is reduced to
under 6ms, because the initial A* search was able to find a
valid route and was not "misled" into an impossible section
of terrain.
The formal way to attack this problem would be to add yet another dimension to the search
space. For the Directional algorithm, we added current direction as a third dimension. We could
theoretically add current speed as a fourth dimension (rounding to a total of eight or 10
approximate speeds.) When moving from one node to another, we would have to check
whether the increase or decrease in speed would be possible given the vehicle's acceleration or
braking capability, and any turning which is in progress. Of course, this increases the search
space dramatically and will hurt performance quite a bit.
The simplest way of incorporating speed as a factor, though not the most precise, is simply to
modify the costs in the Directional search so that any turns are "charged" extra. This will
penalize turning, due to the reductions in speed necessary to make turns. Unfortunately, this is
only effective in Directional-24 or Directional-48 searches. A Directional-8 search yields lots of
extraneous turns which are later dealt with by the smoothing pass, but since the penalties
proposed here would occur during the main search phase, the accuracy could suffer quite a bit.
Fluid turning radius. The tables used to optimize the Directional algorithms, discussed earlier,
are based on a fixed turning radius and unit size. In the sample program provided, if the
turning radius or unit size is changed, the tables are recalculated. However, as mentioned
earlier in the "Basic Methods" section, some units may have a more fluid turning radius that
depends on their speed or other factors. This is especially true of "people" units, which can
easily slow down to make a tighter turn.
Resolving a path under these circumstances can become increasingly complex. In addition to
the requirement for much more memory for tables (covering a range of turning radii), a formal
search algorithm would in fact need to track an additional speed dimension, and take acceleration into account when determining on-the-fly turning ability and resultant speed.
Friendly-collision avoidance. Units which are friendly to one another typically need some
method of avoiding collisions and continuing toward a goal destination. One effective method is
as follows: Every half-second or so, make a quick map of which tiles each unit would hit over
the next two seconds if they continued on their current course. Each unit then "looks" to see
whether it will collide with any other unit. If so, it immediately begins decelerating, and plans a
new route that avoids the problem tile. (It can start accelerating again once the paths no longer
cross.) Ideally, all units will favor movement to the right side, so that units facing each other
won't keep hopping back to the left and right (as we often do in life). Still, units may come
close to colliding and need to be smart enough to stop, yield to the right, back up a step if
there's not enough room to pass, and so on.
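A rough sketch of that half-second update, in which every helper function is hypothetical and the details (reservation policy, right-side preference) are simplified away:

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Unit { int id; /* position, velocity, current path... */ };

// Hypothetical helpers: project a unit's path over the next two seconds
// into a set of tiles, and adjust the unit's plan on a conflict.
std::vector<std::pair<int,int> > tilesCoveredNextTwoSeconds(const Unit& u);
void decelerate(Unit& u);
void replanAround(Unit& u, std::pair<int,int> tile);

void avoidFriendlyCollisions(std::vector<Unit>& units)
{
    std::map<std::pair<int,int>, int> reserved; // tile -> id of a claimant
    for (std::size_t i = 0; i < units.size(); ++i)
        for (auto t : tilesCoveredNextTwoSeconds(units[i]))
            reserved[t] = units[i].id;
    for (std::size_t i = 0; i < units.size(); ++i)
        for (auto t : tilesCoveredNextTwoSeconds(units[i]))
            if (reserved[t] != units[i].id) {   // someone else claimed the tile
                decelerate(units[i]);           // slow down immediately...
                replanAround(units[i], t);      // ...and path around the tile
                break;
            }
}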
Final Notes
This article has made some simplifying assumptions to help describe the search methods
presented. First, all searches shown have been in 2D space. Most games still use 2D searches,
since the third dimension is often inaccessible to characters, or may be a slight variation (such
as jumping) that would not affect the search. All examples used here have also utilized simple
grid partitioning, though many games use more sophisticated 2D world partitioning such as
quadtrees or convex polygons. Some games definitely do require a true search of 3D space.
This can be accomplished in a fairly straightforward manner by adding height as another
dimension to the search, though that typically makes the search space grow impossibly large.
More efficient 3D world partitioning techniques exist, such as navigation meshes. Regardless of
the partitioning method used, though, the pathfinding and smoothing techniques presented
here can be applied with some minor modifications.
The algorithms presented in this article are only partially optimized. They can potentially be
sped up further through various techniques. There is the possibility of more and better use of
tables, perhaps even eliminating trigonometric functions and replacing them with lookups. Also,
the majority of time spent in the Directional algorithm is in the inner loop which checks for
blocking tiles which may have been hit. An optimization of that section of code could potentially
double the performance. Finally, the heuristics used in the Directional algorithm and the
Smoothing-48 pass could potentially be revised to find solutions substantially faster, or at least
tweaked for specific games.
Pathfinding is a complex problem which requires further study and refinement. Clearly not all
questions are adequately resolved. One critical issue at the moment is performance. I am
confident that some readers will find faster implementations of the techniques presented here,
and probably faster techniques as well. I look forward to this growth in the field.
Web sites
http://www.gamasutra.com/features/200103014/pinter_01.htm
This paper first discusses player control and navigation in an action oriented 3D game. The
collision requirements mentioned help motivate the following sections on collision detection. The
final sections discuss application and construction issues, as well as ideas for future work.
In a video game there are many small projectiles, such as rocks or bullets, that travel along a
given trajectory. You would expect one of these to hit a character only if it impacts a polygon
on that character. This makes sense because it simulates how things work in real life.
So how should we handle the separate problem of collision and navigation for a character in a 3D environment? In the real world this is achieved by "left foot forward," "right foot forward," and so on. In a video game the user has an avatar which is often a biped, so why not implement something analogous to how things work in the real world? The big problem here is that our interface is restricted to a 2DOF mouse and a few extra bits from the available mouse buttons and keys. There have been various attempts, but it's very hard to make an effective interface that perfectly mirrors how we do things in the real world -- such as picking up objects, walking with our feet, and so on. Olympic Decathlon on the Apple II is one of the few successful examples. Rather than trying to physically simulate all the limbs of the body, most of the
examples. Rather than trying to physically simulate all the limbs of the body, most of the
popular shooters today seem to navigate the character as if it was a simple shape - such as a
sphere or cube.
If the game is 3rd person, or if you are looking at another player, you still expect to see the
model move its feet when walking, or play a jump-animation when the player jumps into the
air. The animation of the character is a function of what's going on in our simple navigation
model - not the other way around. In MDK2 each character had a set of animations for running
and walking forward, backward, sideways, jumping, climbing and so on. These were invoked at
the appropriate times. We didn't do any IK, but that would be a great improvement over just playing "canned" animations; IK is consistent with the system presented here.
Resolving Collisions
Based on user input the engine applies accelerations to the cylinder object. As the player moves
around it is important to check to see if he collides with the environment. If a collision occurs
you must deal with it by changing the location where you had planned to relocate the player.
This may require a few iterations to get right.
Sliding
It's best to solve a collision by deflecting the player's motion rather than stopping at the impact point: imagine if, running down a hallway, brushing up against the wall caused you to "stick." This is easily solved using something in your code like:
Vt += Ni * (-dot(Ni,Vt)-Nd)
Where Vt is the target position, Ni is the unit normal of the impact plane, and Nd is the plane's
distance value, so that points x on the plane satisfy dot(Ni,x) + Nd = 0.
In English, this just means: move the target-position above the impact plane in the direction
perpendicular to that plane.
There is one thing to watch out for with this simple collision resolution algorithm: when you
have two planes of impact that face each other you do not want an endless cycle deflecting the
motion vector back and forth. It is best if the code can handle multiple planes simultaneously.
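A minimal sketch of this resolution loop in C++, assuming a hypothetical collide() query that
returns the first impact plane along the motion segment (Vec3 here is just a typical vector type):

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b)       { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline Vec3  operator+(Vec3 a, Vec3 b) { return { a.x+b.x, a.y+b.y, a.z+b.z }; }
inline Vec3  operator-(Vec3 a, Vec3 b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
inline Vec3  operator*(Vec3 a, float s){ return { a.x*s, a.y*s, a.z*s }; }

// Plane satisfies dot(n, x) + d == 0; collide() is assumed to return
// true and fill 'hit' with the first impact plane between v0 and v1.
struct Plane { Vec3 n; float d; };
bool collide(const Vec3 &v0, const Vec3 &v1, Plane &hit);

Vec3 ResolveMotion(Vec3 v0, Vec3 v1)
{
    const int maxIterations = 4;       // guard against facing planes
    for (int i = 0; i < maxIterations; i++) {
        Plane hit;
        if (!collide(v0, v1, hit))
            return v1;                 // path is clear, take it
        // Vt += Ni*(-dot(Ni,Vt)-Nd): push the target position out
        // along the normal until it sits just above the impact plane.
        v1 = v1 + hit.n * (-dot(hit.n, v1) - hit.d);
    }
    return v0;                         // give up and stay put this frame
}

Handling multiple planes simultaneously, as suggested above, would replace the single 'hit'
plane with the set of planes touched so far and deflect the target against all of them at once.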
Stepping
When was the last time you were crossing the street and tripped on the curb?
Stepping up onto the sidewalk is a function processed by some subconscious part of the brain.
In a video game, the user does not want to have to worry about small steps and stairs. He
mouselooks where he wants to go while holding the forward arrow key. It is up to the game
developer to implement all the lower level brain functions.
I solved this by lifting up the path of the player when it collided with something. If the raised
path was clear I'd move the player along it and then drop it back down. I thought this was just
a temporary fix when I first implemented it during our prototype stage, but it worked
beautifully and ended up in the final code that was shipped.
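A sketch of that lift-and-drop trick (types as in the sliding sketch above; segmentClear() and
dropToFloor() are hypothetical helpers for a clear-path test and a settle-to-ground query):

// Maximum step size the player may climb without jumping (assumed).
const float kStepHeight = 0.5f;

bool segmentClear(const Vec3 &from, const Vec3 &to);  // no collision?
Vec3 dropToFloor(const Vec3 &p);       // slide straight down to rest

// If the direct path is blocked, try the same path raised by the step
// height; if that is clear, take it and settle back onto the floor.
bool TryStepUp(const Vec3 &v0, Vec3 &v1)
{
    Vec3 lift = { 0, kStepHeight, 0 };            // assumes +Y is up
    if (!segmentClear(v0 + lift, v1 + lift))
        return false;                             // raised path blocked too
    v1 = dropToFloor(v1 + lift);
    return true;
}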
In MDK2, implementing climbing turned out to be a basic extension to the step up ability.
Climbing was used if the height transition was large enough. In this case, the player's climbing
animation was also invoked. For the user, it became easy to move his character up onto a
ledge. No keypresses required. No special skills to learn.
VRML browsers are fine for grabbing and rotating an object. But they are not as well suited for
navigating around a world. There is little sense of "presence". Whether 1st or 3rd person, it is
nice to have an avatar that has some volume to it and will collide with the environment.
Furthermore, the movement shouldn't be cumbersome. People expect to be able to move
sideways, to slide along walls, and to automatically ascend stairs or ladders. To appease the
user you want to provide an intuitive interface that feels right. So even if you are building some
3D internet shopping mall application, grab the key bindings from the user's Quake config file,
and let him rocket-jump from the bank to the clothing store. It would be nice to see more
successful applications, other than games, that are based on 3D immersive technology.
Simulating an object as a point is a popular technique to reduce the complexity of various math
and physics problems. Fast collision detection is important for interactive 3D applications that
wish to maintain a high frame rate. Not surprisingly, one popular method of doing collision
detection of an object with an arbitrary polygonal environment is to approximate the object as a
point. The object does not intersect the environment's geometry because it does its collision
detection against an approximate offset surface - an "expanded" or "scaled" copy of the
geometry where the interior walls have been moved inward, exterior walls shifted outward, the
floors raised, and the ceiling lowered. The modifications to the environment correspond to the
size of the object against which collision tests will be performed. Note that
by environment we are referring to a large, detailed, 3D model that is rigid (static).
As the object moves from one position, v0, to another, v1, the motion line segment, (v0,v1), is
checked against the offset surface to determine if it has collided. Character collisions are
resolved as discussed in the previous section. Physically based objects typically have their
velocity reflected off of the plane of impact.
Note that just treating an object as a point is not a sufficient method for fast collision detection.
An arbitrary polygonal environment can contain thousands of polygons. Therefore this geometry
should be represented in an efficient spatial structure such as a binary space partitioning (BSP)
tree.
A disadvantage of this offset surface technique is that it requires an additional copy of the
environment's geometry for each object shape/size. If an object is allowed to change
orientation, then there are further symmetry restrictions on the object's collision envelope.
In an effort to make MDK2 a content-rich game, we have many different sized characters. The
memory requirement for having multiple copies of the environment's geometry was a problem.
In addition to characters, our game also creates many small artifacts to make special effects
such as explosions and debris. These small artifacts require fast collision detection as well.
Creating another BSP tree for every particle size is just not feasible. This problem was
overcome by using the dynamic plane shifting BSP traversal discussed in the next sections.
Instead of using expensive collision tests or having to store extra offset surfaces, MDK2 used a
variation of the line-segment to BSP collision algorithm referred to as "dynamic plane shifting
BSP traversal". Collision detection for a non-zero volume object is still done using a fast line
segment check. The environment is represented with only one standard BSP tree that was
constructed without any regard for what shapes it would be doing collision detection with. We
modify the plane equations of a BSP tree during the collision detection traversal, which gives us
a reasonable approximation for collision detection of an arbitrary convex shaped object moving
along a linear path.
The standard algorithm for colliding a ray with a BSP tree is a recursive function that starts at
the root. If the segment lies on one side of the node's plane then the segment is passed down
to the corresponding subtree. Otherwise, the segment crosses the plane so it is split. The first
piece of the segment is checked. If it fails to collide then the second piece of the segment is
checked against the other subtree. If a solid leaf node is reached then the algorithm returns a
collision with impact equal to the start of the subsegment that reached the leaf.
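In code, that standard recursion might look like the following minimal sketch (a hypothetical
node layout where a node with no children stands in for a leaf; Vec3 and its helpers are as in
the sliding sketch earlier):

struct BSPNode {
    Plane    plane;          // splitting plane: dot(n, x) + d == 0
    BSPNode *front, *back;   // both null => this node is a leaf
    bool     solid;          // meaningful only for leaves
};

// Returns true on collision; 'impact' is the start of the sub-segment
// that reached a solid leaf.
bool SegmentCollide(BSPNode *node, Vec3 v0, Vec3 v1, Vec3 &impact)
{
    if (!node->front && !node->back) {             // leaf
        if (node->solid) { impact = v0; return true; }
        return false;
    }
    float d0 = dot(node->plane.n, v0) + node->plane.d;
    float d1 = dot(node->plane.n, v1) + node->plane.d;
    if (d0 >= 0 && d1 >= 0)                        // wholly in front
        return SegmentCollide(node->front, v0, v1, impact);
    if (d0 < 0 && d1 < 0)                          // wholly behind
        return SegmentCollide(node->back, v0, v1, impact);
    // Segment crosses the plane: split and test the near piece first.
    Vec3 mid = v0 + (v1 - v0) * (d0 / (d0 - d1));
    BSPNode *nearSide = (d0 >= 0) ? node->front : node->back;
    BSPNode *farSide  = (d0 >= 0) ? node->back  : node->front;
    if (SegmentCollide(nearSide, v0, mid, impact)) return true;
    return SegmentCollide(farSide, mid, v1, impact);
}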
Here, in more detail, is our revised algorithm that dynamically alters the plane equations:
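What follows is a minimal sketch of the idea, assuming the colliding object is a sphere of radius
r (for another convex shape, r becomes the shape's support distance along the node's plane
normal). The change from the standard routine is that the endpoints are classified against the
plane shifted by r in both directions, and a segment entering the slab between the two shifted
planes is clipped and sent down both subtrees:

// Types as in the previous sketch; 'r' is the sphere radius.
bool ShiftedCollide(BSPNode *node, Vec3 v0, Vec3 v1, float r, Vec3 &impact)
{
    if (!node->front && !node->back) {             // leaf
        if (node->solid) { impact = v0; return true; }
        return false;
    }
    float d0 = dot(node->plane.n, v0) + node->plane.d;
    float d1 = dot(node->plane.n, v1) + node->plane.d;
    if (d0 >= r && d1 >= r)        // clear of the plane shifted frontward
        return ShiftedCollide(node->front, v0, v1, r, impact);
    if (d0 <= -r && d1 <= -r)      // clear of the plane shifted backward
        return ShiftedCollide(node->back, v0, v1, r, impact);

    // The segment enters the slab between the two shifted planes, so
    // both subtrees may need visiting -- this bi-directional shift is
    // where the extra node visits (and cost) come from.  Clip the
    // segment to each child's expanded half-space:
    Vec3 f0 = v0, f1 = v1;         // piece that can reach front geometry
    Vec3 b0 = v0, b1 = v1;         // piece that can reach back geometry
    if (d0 != d1) {
        float tf = (d0 + r) / (d0 - d1);   // where d == -r
        float tb = (d0 - r) / (d0 - d1);   // where d == +r
        if (d0 < -r) f0 = v0 + (v1 - v0) * tf;
        if (d1 < -r) f1 = v0 + (v1 - v0) * tf;
        if (d0 > +r) b0 = v0 + (v1 - v0) * tb;
        if (d1 > +r) b1 = v0 + (v1 - v0) * tb;
    }
    // Test the side the segment starts on first so the earliest impact
    // is the one reported.
    if (d0 >= 0) {
        if (ShiftedCollide(node->front, f0, f1, r, impact)) return true;
        return ShiftedCollide(node->back, b0, b1, r, impact);
    }
    if (ShiftedCollide(node->back, b0, b1, r, impact)) return true;
    return ShiftedCollide(node->front, f0, f1, r, impact);
}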
A thorough description of Dynamic Plane Shifting BSP Traversal can be found in the proceedings
to Graphics Interface 2000, which is available online at www.graphicsinterface.org, or you can
try my web page at www.cs.ualberta.ca/~melax.
Performance Overhead
There is a performance cost for object collision queries that use the dynamic plane shifting
technique. Additional nodes will be visited due to the bi-directional shifting of the node's plane
equations. Testing an object as large as the environment will end up searching every node in
the tree. Clearly dynamic plane shifting works best for colliding objects that are small relative to
the size of the environment. For some "real world" performance results we look at execution
times from MDK2.
When we replaced our multiple tree collision detection with our dynamic plane shifting solution,
we did not detect a noticeable difference in the overall performance of the program. Most of the
computing resources go into other areas such as animation, AI, physics, and rendering -
collision detection is a small fraction of the work. Furthermore, many of the BSP queries are for
particle, bullet, line-of-sight, lens flare, and shadow checks. These require little or no geometry
expansion. To see the cost of the algorithm, we measured a player character's collision times in
isolation.
We compare 3 methods of collision detection: regular BSP collision (labeled Ray), spherical
offset, and cylindrical offset. Each method received the same input parameters, including the
player's current position and desired new position for that frame. Nothing but BSP traversal
code contributed to the times. Testing was done on a Pentium 3, 400MHz. Our results show that
plane shifting collision detection can be 2.5 to 3.5 times more expensive. Putting these results
in context, at 30 FPS, each frame allows 33000 microseconds. We felt an additional 20 to 50
microseconds for a character's collision detection was worth the flexibility of allowing different
sized characters and objects in our game. We also noticed that the performance did not vary
that much from one frame to the next. In a 3D interactive application, a good frame rate must
be maintained at all times - not just in the "average" case.
In MDK2 everything (and I mean everything) was already being built in 3DS Max. We even had
dummy nodes that we used to place lights, particle emitters, and bad guys in the level. We
used the mesh modifier facility to add the BSP compiler to the art production pipeline. When the
modifier was added to a node it would create a BSP tree out of its polygons.
The BSP compiler required non-self-intersecting borderless 2-manifold geometry as input. 3DS
Max doesn't automatically place modeling constraints on the content. Therefore, extra work was
required for the artists to get the models to comply with our requirements.
Another disadvantage of not having modeling constraints is that surfaces that should be flat
often aren't. In many situations there were 2 triangles that made up a quad, but the 4 vertices
weren't coplanar even within a large epsilon. This ended up making the BSP trees larger than
necessary. In contrast, a Quake editor will insist that the sides of a brush are flat
even if it has to have an extra vertex and make a couple of skinny triangles. Considering how
fast 3D rendering hardware is now and how slow random memory accesses can be (such as
tree searches), it is probably better to have geometry with a minimal half-space representation
than a minimal mesh representation.
Neverwinter Nights
Neverwinter Nights (NWN) is a very exciting and ambitious project. However, it isn't an
action/physics oriented game where characters are jumping and shooting in a 3D environment.
The NWN RPG is based on the D&D rule system. Consequently the needs for collision detection
are different than for MDK2. Most of the collision tests will only be line segment tests required
for user selections, determining the path-searching node under the character, and line-of-sight
between 2 characters. Furthermore, NWN is a very large game production, so there is no time
for dealing with problems such as fixing broken meshes. The game engine has to be able to
deal with non-solid or "polygon soup" geometry. Therefore we chose to use a hierarchical
overlapping bounding box approach for NWN's collision detection such as AABB or OBB-Tree.
We are currently leaning toward just using AABB due to the smaller memory requirements, and
the AABB performance seems to be just as good [JGT reference]. AABB requires less work per
node since no rotation is required, but the boxes can be larger and overlap more. Given that a
large percentage of the polygons are already axis-aligned (floors and walls on the NWN tiles), the
oriented bounding boxes will not be much tighter than the aligned bounding boxes. Please note
that we need to confirm these hunches with more experiments.
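As a quick illustration of why the per-node work is smaller for AABBs, the overlap test is just
interval comparisons on each axis; no rotation of one box into the other's frame is needed (a
minimal sketch):

struct Vec3 { float x, y, z; };
struct AABB { Vec3 mins, maxs; };

bool Overlap(const AABB &a, const AABB &b)
{
    // Boxes overlap only if their intervals overlap on all three axes.
    return a.mins.x <= b.maxs.x && b.mins.x <= a.maxs.x &&
           a.mins.y <= b.maxs.y && b.mins.y <= a.maxs.y &&
           a.mins.z <= b.maxs.z && b.mins.z <= a.maxs.z;
}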
NWN uses a similar art production pipeline to MDK2. Similar to the BSP modifier, there is a
modifier that computes bounding box hierarchy information.
Future
There is nothing concrete planned for the future of the action oriented player control and the
BSP collision algorithm. However, there are some directions that would be interesting.
Games in general could start doing more with being able to manipulate physical objects in the
environment. I'd like the ability to rearrange some of the objects while playing a game so I can
make myself a better camping spot to get more kills with my AWP.
As I confessed, our BSP production wasn't perfect. I'd like to improve this while still allowing
the artists to model as they do now. They appreciated being able to use 3DS Max. I would want
to continue to let them make non-convex geometry, but have some better tools to find and
possibly fix the problems that can occur. The single big mesh approach from MDK2 was more
work and was very unforgiving when errors did occur. It would be better to allow for multiple
objects or meshes and let them interpenetrate with each other.
I've already started testing an extension to the BSP collision algorithm with general motion. It
works, but more testing is required to measure performance gains over more accurate
techniques. Although there can be some inaccuracy, the algorithm may be appropriate when
there are many objects, they are fast moving, or when they are of lower priority.
For any references and follow up to this presentation, please see the web page:
http://www.cs.ualberta.ca/~melax
http://www.gamasutra.com/features/20010324/melax_01.htm
Interesting Times
Well, a fair bit's changed throughout the year 2000. We've seen the rise and plateau of Napster.
We've seen the advent of viral marketing as a viable technique. People are beginning to
recognize that being open isn't such an unsound business practice as it might at first have
appeared - Open Source is now a creditable development strategy. Microsoft is getting worried,
I mean interested in Linux. Intel is getting worried, I mean interested in peer-to-peer
computing and distributed processing. Meanwhile Cisco is rubbing its hands together in glee -
though I understand that we may yet see a revolution in the use of domestic wireless routing
devices. Perhaps Cisco is interested in that area? Power seems to be returning to the people…
Why is all this happening? It's the Internet. The Internet with the Web as its visible face makes
the world a small place. Traditional business models may have been the first to be applied
online, but more suitable models are beginning to arise. With a small world, with the miniscule
amounts of friction present, competitive strategies that rely on having enough time for an
inertia burdened business to adjust to a change, simply can't cope against lightweight
individuals and co-operatives. Cut out the middle man. Deal direct. Co-operate. Do your
business in full view of your traditional competitors, because it doesn't matter what they see,
they haven't a hope of catching up in any case.
This applies to the games industry too. The music industry is in crisis, the movie industry may
be next. The software industry as a whole is undergoing its own revolution. Digital content is
simply too mobile to remain protected. Information really does want to be free. Games are no
different. The players want them for nothing. That's not to say that players won't pay if they
have to, in fact most players are in perfect accord with the idea of the games developer getting
a fair reward for their labor, it's simply that games developers, just as musicians, should keep
an eye on the writing on the wall. Things are going to change. If you want to be prepared: learn
Linux, buy an Indrema, set up a website, join an Open Source team.
In the not too distant future, the player will be paying your wages direct.
Cyberspace is simply the logical evolution of peer to peer systems such as Napster, Gnutella,
FreeNet, MojoNation, FreeHaven, etc. While the latter and others will be concerned with
distribution of static content (music, images, movies, software, etc.), sharing of storage and
sharing of processing resources, cyberspace will be the system that combines elements of all
these together to distribute dynamic and interactive content. It's the low friction way of
delivering digital content: anyone can put something in one end and it's instantly available to
anyone else. Instead of treating digital content like a tangible commodity that can't be readily
duplicated, and one that requires a one-to-one exchange mechanism, we instead replace it with
a system that treats all content as inherently manifest. It's a bit like the Borg in Star Trek - any
thought that occurs in one mind is available to all.
So in terms of games, cyberspace is a platform that supports many games, but for each game
there is no concept that the number of players is limited. Each game is a persistent
environment that expands in proportion to the number of players playing within it. If it's fun for
one, it's fun for one thousand. If it's fun for a thousand, it's fun for a million. It's a shared
environment, a virtual space, a sandpit in which any number of people can have fun. The
games designer figures out the rules, creates the style, sets the theme, provides some
suggestions for objectives, but largely the players make their own fun.
This is where the peer-to-peer paradigm makes its entrance: the distributed systems
technology that comes into its own by ensuring that whatever one player does is visible to all
the other players in the vicinity. Not only are we distributing content, we are also distributing
change in that content. This is what all multiplayer games are doing. It's just that the current
approaches in the form of client/server and crude peer-to-peer systems aren't sufficiently
scalable for billion player games. Truly scalable systems design is what I'm trying to get across
to you in this series of articles. Whether your game attracts a million players or only a hundred,
if your design is scalable at least you can cope with a success. Imagine if you had a good idea
for a multiplayer game and it was so popular it ground to a halt at a hundred thousand players.
What a bummer eh? Instead of driving away in a Ferrari you end up with a major administrative
headache. No problem you say, we just create parallel instances of the game in order to
balance the load.
I'm wondering if this is really a matter of convenience rather than evidence of sound design.
What would have happened if the web ground to a halt at 10 million users? Oh, no problem
we'll just create multiple webs. We'll have MSN, AOL, Compuserve, Reuters, FreeServe, Yahoo,
IBM, etc. A hundred different webs each slightly different, but with a heck of a lot of overlap. It
wouldn't just be "Oh shall we get the .net as well as the .com domain name?", it would be
"Should we have a presence on MSN as well as AOL?"
Here we have a potential success story, a game that's so good 40 million players want to join
in. That's 40 million Pentiums beavering away. If we can produce a system that copes with
10,000 players why not 10 million? Let's not be so lazy that we allow a design limit that creates
a ceiling on our success. It is better to get a dollar from 10 million punters for a little extra
design effort, than it is to charge ten dollars to 100,000 players with all the admin costs of
spreading players across different shards. Why do we ever produce a piece of software with the
knowledge that human intervention will be required if it's too successful?
I don't know about you, but I'm into the idea of 'fire and forget' software. I want to produce a
single package of software capable of supporting a billion players that will never encounter
problems with support costs, additional infrastructure, software updates, maintenance,
telephone help lines, etc. One game - a billion players - zero admin. What could be simpler?
I know there are people out there who have incredible difficulty understanding why on earth a
game would benefit from even a million players, when surely a thousand is plenty? Check out
the Star Wars Galaxies discussion boards for a discussion where prospective players of the Star
Wars online game are even now questioning the wisdom of having multiple Star Wars galaxies,
i.e. several instances of the same game each with tens of thousands of players. Instead of
admitting it as a technical limitation, the excuse is that there's not enough content to support a
greater number of players in a single game. Blimey, a whole galaxy, and they can't squeeze a
few million players into it?
Space is mind-bogglingly big as Douglas Adams once wrote, and that's Big with a big 'b'. What
I'm going to spend the rest of this article talking about, is how to engineer scalability into a
distributed modeling system. And that means Scalable with a big 'S' - making sure that no part
of the design grinds to a halt as the numbers start getting big with a big 'B'.
Threats to Scalability
How do we know a threat to scalability when we see it? It's any part of a design that exhibits an
assumption concerning a 'reasonable use' and that is embodied as some kind of absolute
limitation. Any time you see people storing years in a single byte, filenames in 16 bytes,
directories in 256 bytes, any kind of fixed limit like that is an obvious candidate.
However, scalability limits can manifest themselves in more subtle ways. For example if you are
imagining a system that is likely to be used for applications with numbers of simultaneous users
on the order of hundreds, then an operation that takes an amount of time that has a squared
relationship with the number of users is a big problem. It might be 100ms in the case of a
hundred users, but 400ms for two hundred - not too bad. However, if you go up to ten
thousand users it takes a quarter of an hour, and for a hundred thousand users you have to
wait longer than a day.
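To make that arithmetic concrete, here is a tiny C++ check, with the constant calibrated to the
100-users/100ms figure above:

#include <cstdio>

// Cost under the squared relationship, calibrated so that
// 100 users cost 100 ms:  k = 100 / (100 * 100) = 0.01 ms.
double cost_ms(double users) { return 0.01 * users * users; }

int main() {
    printf("%g ms\n", cost_ms(200));     //        400 ms
    printf("%g ms\n", cost_ms(10000));   //  1,000,000 ms ~ 17 minutes
    printf("%g ms\n", cost_ms(100000));  // 100,000,000 ms ~ 28 hours
}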
Even a linear relationship can be a problem. Say a system, in some reconciliation process,
consumes a certain amount of bandwidth from every connection of every connected player. It
might be just one bit per second, but the system will grind to a halt at around 56,000 players
(probably much sooner). This is the main reason why client/server is inherently non-scalable. If
you require high bandwidth from each player to a single server then at a certain number of
players (about 5,000 say) you begin to start warping the network infrastructure - even if you do
have a server that's powerful enough to cope with the workload. Sure, ask for half a dozen T3
connections to be piped into your server room, you might end up waiting a few months while
the network warps into shape to accommodate you - unless of course, you just happen to site
your server room near a decent backbone…
The only relationship we can really even countenance is a logarithmic one, i.e. linear growth in
overhead in proportion to exponential growth in players. For example, if you need one more bit
in a particular data word for every doubling in players, a 32-bit word allows you to cope
with 4 billion players. But even then, nothing's that straightforward in the real world - you have
to watch out for noise, spurious anomalies, and nefarious phenomena. Sod's law puts paid to
back-of-the-envelope calculations that should work in theory. And where Sod's law finds the job
too hard, there's plenty of hackers to fill the breach.
So if you honestly think that "We'll never get more than about a hundred players congregating
in the same place" - hah!
Of course, you can make assumptions about what should happen in practice, but you still need
to cater for what shouldn't, because it will happen. The trick is in ensuring that the system
degrades gracefully and appropriately to the situation. If there simply isn't enough bandwidth to
support a riot, then only those participating in the riot should notice any reduction in fidelity.
Ideally players would still see the behavior of their closest opponents, but those further away in
the crowd would simply be modeled from infrequent samples. This brings me back to the idea
of prioritization as one of the solutions brought to us by the 'best effort' philosophy. When
perfection is impossible, the best is good enough, and certainly better than failure.
Things Change
Hand in hand with scalability goes adaptability. Players are pretty unpredictable blighters and
they can change in behavior from day to day or even second to second. Players are human
beings (well most of them) and as we all know, human beings are pretty clever when it comes
to understanding their environment - their lives have depended on it - probably why intelligence
evolved in the first place. Any system has to be almost as intelligent in adapting to its users as
its users are in adapting to it. One thing that the architects didn't figure when they designed the
Millennium Bridge over the Thames in London was that its users aren't all independent entities.
Even subtle behavioral cues can be enough to get people to act in concert (causing a bridge to
oscillate wildly in this case). With multiplayer games it's much worse: we tend to have to
presume we're dealing with players deliberately setting out to stress the game to breaking
point.
But, there's more to change than just the players. We also have computers winking in and out
of the network, coming back with bigger hard disks, CPUs, and ADSL connections. Considerate
ISPs might realize they can attract more subscribers if they donate some spare capacity by
making nodes out of some of their Linux boxes.
Even the network itself is in a continuous state of flux, in terms of effective topology as well as
traffic, with consequent unpredictable fluctuations in bandwidth and latency. Sometimes a
network might undergo a serious problem when an earthquake strikes a key set of points. A
satellite might be zapped by aliens.
In general, anything could happen to pull the rug out from under the system. However, it must
adapt. We can't allow a game to go on for two years that builds up to a 100 million players
many of whom may have made a considerable investment of effort in building towns, cities,
empires, relationships with other players, spent a wodge of money on certain weapons or
resources, only for it to fail when someone accidentally pulls a plug. "No worries everyone -
we'll restart it on Monday!" The outcry might trigger a civil war!
Hopefully you'll notice where scalability and adaptability come into play in designing for a billion
players.
A Self-Organizing System
Each player interacts with the system via a user interface running on a piece of software I call
the front-end. This front-end interacts with a back-end service operating as a node in the
distributed system. It is the back-end that performs the modeling of the virtual world and does
its best to communicate the modeling it does with any other nodes that may be interested and
to receive their communication of the modeling that they are doing. The process of managing
the relationships and responsibilities also falls to the back-end.
Each node can be considered to correspond to a player's computer. However, this is not
necessarily the case. It is possible that multiple front-ends may exist on the same computer - a
split screen on a console for example. Alternatively, multiple front-ends may be running on
different computers (mobile, handheld devices) and they could all be talking to the same back-
end. Multiple back-ends may also exist on the same computer, e.g. one node acting in a fully
interactive capacity and operating from a fast hard disk, one node acting in a back-up capacity
operating from relatively slow near-line storage, and one node might even be operating in a
read-only capacity from a DVD-ROM jukebox. But, there might be plenty of CPU capacity for
them to all operate on the same computer.
Anyway, for the time being we'll consider that we're operating on a basis of 'a computer is a
node'.
Let's assume we know all about how to uniquely identify computers/nodes, objects, players,
and anything else. We don't have a problem utilizing locally available storage (hard disk) to
store as many objects as may interest us now or in the near future. We don't have a problem
utilizing locally available processing (CPU) to execute as many object methods and modeling
services as we can. We don't have a problem sharing the CPU between the player's visualization
application (3D renderer) and the modeling back-end. We don't have a problem exploiting all
available communication channels to get in touch with other nodes, nor do we have much
difficulty with the idea of how a new node goes about discovering a fairly good node to initiate
contact with.
There is of course the issue of security, and although at first glance there may seem to be an
insurmountable security flaw in any system that utilizes client-side resources, let's remember
that we're dealing with two related, and very difficult problems here: a scalable distributed
modeling system, and a secure one. Let's not give up on one, just because we can't see how to
solve the other. Putting it metaphorically: if we're trying to build an airplane, let's not give up
just because no one's invented parachutes yet. And you never know, once flight becomes
popular, the unthinkable idea of flying without a parachute might just end up being quietly
forgotten.
If we first understand how a system can be scalable, then we can qualify ourselves to be in the
position of understanding how it can be secure.
If you want to have a comparable idea of the sort of thing I'm describing then JavaSpaces
would be a good term to plug into your search engine. It's an evolution from Linda - based on
David Gelernter's tuple-space idea.
What the games programmer sees in terms of objects described in a programming language
and what goes on under the hood of the distributed system that supports it can be somewhat
different. Imagine a language much like Java for the time being. However, common to many
virtual machines, one isn't necessarily tied to a particular language. Even if the virtual machine
is object oriented, one can develop languages appropriate to particular types of application -
games for instance. I'm envisaging such a language tailored to games, but to keep things
simple for the time being, we only need to appreciate that the storage of our objects can be
managed irrespective of the programming language and virtual machine that manipulates
them.
In our object store we keep a record of the class inheritance hierarchy and the details
concerning the definitions of each class's methods (or properties). The class is an object as
well as a template that governs the form of instances of objects of the class. Each class defines
methods which either execute code or manipulate corresponding state variables. All objects
(including class objects) contain details of their ownership, e.g. last known owner (lease-holding
end-user), last known freeholder.
Figure 1: Single inheritance of properties and operations
All we need to distribute then are objects and the classes that define them. The objects consist
of one or more values (Figure 2). These values are held within method slots, and a value
represents either an operation or a property. Each operation consists of a string of byte codes
and each property consists of a value. However, as operations are the same for each object
these only appear in the class object - that special object that defines the class operations and
default values for each property. In this way objects will only contain values that differ from the
defaults.
As you can see (Figure 1) a class object may inherit from another class object. In this case, the
derived class object only contains operations or properties that differ from those in the base
class. All methods are implicitly virtual. Note though that only single inheritance is supported in
this scheme (it'll do for starters).
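As a rough C++ sketch of this layout (all names hypothetical): instances carry only the values
that differ from their class defaults, operations and defaults live on the class object, and lookup
falls back through the single-inheritance chain, which is also what makes every method
implicitly virtual:

#include <map>
#include <string>

struct Value { /* immediate value or reference; see 'Values' below */ };

struct ClassObject {
    ClassObject *base;                       // single inheritance only
    std::map<std::string, Value> slots;      // operations and defaults
};

struct Object {
    ClassObject *cls;
    std::map<std::string, Value> overrides;  // only non-default values
};

// Instance override first, then up the class chain; a derived class
// simply shadows the base class's slot of the same name.
const Value *Lookup(const Object &o, const std::string &name)
{
    auto it = o.overrides.find(name);
    if (it != o.overrides.end()) return &it->second;
    for (const ClassObject *c = o.cls; c; c = c->base) {
        auto ct = c->slots.find(name);
        if (ct != c->slots.end()) return &ct->second;
    }
    return nullptr;                          // ultimately null
}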
Figure 3: Example of Object Layout
Object Layout
When we come to implementing our object database, we're probably going to end up with
something like Figure 3. I won't say it's going to look exactly like that, but there will be some
similarity.
Each object will need to contain information sufficient to track down its class definition
(inheritance, class methods, property defaults, etc.), i.e. both the details of the class and a
good idea of which node to talk to in order to obtain those details. We also need some
information to give us an idea of how up to date we are in order to specify what updates we're
interested in. We can use a system of revision information which may be as simple as a revision
serial number, or it may involve a timestamp of some sort, or even both. Note that time on the
Internet is a problem all by itself.
Note that the local storage requirement of an object will be larger than the amount of data
required to transmit some of its details. There are many ways of optimizing the communication
overhead. For example, if the receiver communicates the extent to which it is up to date, then
the sender only needs to send more recent information.
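As a sketch of that optimization (hypothetical layout): if each slot remembers the revision in
which it last changed, and the receiver reports the revision it is up to date with, the sender can
skip everything older:

#include <map>
#include <string>

struct Value;                          // as in the earlier sketch
struct Slot { Value *v; unsigned revision; };
struct Message { void add(const std::string &name, const Value *v); };

void SendUpdates(const std::map<std::string, Slot> &slots,
                 unsigned receiverRevision, Message &out)
{
    for (const auto &s : slots)
        if (s.second.revision > receiverRevision)
            out.add(s.first, s.second.v);  // only newer information
}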
In case you're concerned about the local storage requirement, recall that one of our guiding
objectives is to prioritize the reduction of communication overheads over and above any
reduction of storage or processing overheads.
Values
Values are either immediate values or references to larger data elements that are held in an
appropriate repository adjoining the object store. Note though that all values are immutable
(constant), the object (including the class object) is the only mutable entity that the system
deals with. Series of values may be created, but once created they remain constant (until they
get destroyed upon a zero reference count).
This allows us to easily refer to large amounts of constant data that many players already have.
For example, it is likely that someone could produce a DVD-ROM of a snap-shot of common
textures and geometry that exist in the system. All objects that use these only need to transmit
the references to such constant data. Of course, if the node doesn't have the data available
then they must download it, but this can be done at a relatively low priority (a lesser level of
detail object is likely to be sufficient in the interim).
Note that large values may only be deliberately destroyed if they have never been
communicated outside the node. A similar policy exists with respect to IDs for objects, series,
etc. Local IDs can be used for greater convenience until something is communicated outside the
node, in which case globally unique IDs must be used. NB they can still be tokenized in cases
where they are mentioned several times in a message. Of course, intermediate values that arise
within a computation do not need to be stored. It is only when persistence is required that
values need to be written to persistent storage.
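One way to represent these rules (a hypothetical sketch): a value is either an immediate or a
reference into a repository of immutable shared data, and repository entries are destroyed only
on a zero reference count, and only if they were never communicated outside the node:

#include <map>

struct Value {
    enum Kind { IMMEDIATE, REFERENCE } kind;
    double   immediate;   // used when kind == IMMEDIATE
    unsigned ref;         // repository ID when kind == REFERENCE
};

struct SharedData { unsigned refs = 0; bool exported = false; /* bytes... */ };

struct Repository {
    std::map<unsigned, SharedData> entries;
    void addRef(unsigned id)  { entries[id].refs++; }
    void release(unsigned id) {
        SharedData &d = entries[id];
        // Only data never communicated outside the node may be
        // deliberately destroyed; exported IDs must stay resolvable.
        if (--d.refs == 0 && !d.exported)
            entries.erase(id);
    }
};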
Operations
Aspects of Distribution
To permit the game code to be aware to some extent of the distributed nature of the underlying
system, there may be a need to mark some operations as operable only if the object is owned,
or not owned. In addition, some operations could be marked as auto-forwarded, i.e. the call is
forwarded to the owner of the object and executed on that object, with the result returned.
These could be blocking (wait for return), or non-blocking (result ignored). Such things may
require different underlying communications strategies, but as long as the game developer
understands what they're doing, such low-level controls may come in handy sometimes, e.g. in
achieving synchronization where it's critical.
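A sketch of what such markings might look like (hypothetical flags and helpers):

struct Object;
struct Operation { unsigned flags; /* byte codes ... */ };

enum OpFlags {
    OWNED_ONLY   = 1 << 0,   // runs only on the node owning the object
    UNOWNED_ONLY = 1 << 1,   // runs only on non-owning nodes
    AUTO_FORWARD = 1 << 2,   // forward the call to the object's owner
    BLOCKING     = 1 << 3,   // wait for the forwarded result
};

void ForwardToOwner(Object &o, const Operation &op);  // hypothetical RPC
void WaitForResult(Object &o, const Operation &op);
void Execute(Object &o, const Operation &op);         // run the byte codes

void Invoke(Object &obj, const Operation &op, bool ownedHere)
{
    if ((op.flags & OWNED_ONLY) && !ownedHere) {
        if (op.flags & AUTO_FORWARD) {
            ForwardToOwner(obj, op);
            if (op.flags & BLOCKING)
                WaitForResult(obj, op);   // synchronization point
        }
        return;
    }
    if ((op.flags & UNOWNED_ONLY) && ownedHere)
        return;                           // not meant for the owner
    Execute(obj, op);
}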
Persistence
Remember that the persistent storage system is a limited resource. A policy similar to 'least
recently used' will remove objects or values when space runs out. In this case it will be an 'of
least current interest' policy. When a property value is missing, the class default for that
property is used, and null is used for missing values resulting from a computation (rare). When
an object is missing, a default instance of the same class is used. When a class method is
missing, the base class's method is used. Ultimately a null value is used. Generally, the best
default is used in the event of missing information.
Whilst one could create diagnostic tools to catch such events, there really isn't any point in
alerting the user or trying to do any recovery, because these are likely events and there's no
remedy available in any case. You can't restart the system or perform a roll-back. You simply
have to assume that such missing data only occurs in relation to particularly uninteresting
objects. For example, the lush mahogany texture map may be missing, but then the default
wood texture may be used instead.
Naturally, it is up to the games programmer to utilize the inheritance facility to create a cascade
of ever more sophisticated detail, i.e. define how a simple object property of wooden is part of a
hierarchy that at some point may be flat-shaded brown, but at another point is highly polished
mahogany. Given that objects could have their base properties implicitly prioritized for
distribution over and above their more derived properties, this can help reduce distractions
caused by degradation in simulation fidelity of objects at the periphery of one's area of interest
(it's better for a distant animal to appear in the correct color, rather than an arbitrary one, if its
fur details were too big to download fast enough). Similarly, it's better for a distant vehicle to
have its general vehicular properties downloaded before its specific properties (behavior,
damage record, cargo, current operating parameters, etc.).
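A sketch of the eviction side of this (hypothetical structures): when space runs out, the stored
object of least current interest is dropped, and later reads simply fall through the default
cascade described above:

#include <map>
#include <string>

struct StoredObject { double interest; /* persisted state ... */ };

void EvictLeastInteresting(std::map<std::string, StoredObject> &store)
{
    if (store.empty()) return;
    auto victim = store.begin();
    for (auto it = store.begin(); it != store.end(); ++it)
        if (it->second.interest < victim->second.interest)
            victim = it;
    store.erase(victim);   // its values now resolve to class defaults
}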
Ownership
All nodes that own objects get to own them because heuristics determined them as the most
suitable nodes to own them, and these heuristics obviously have to be contrived to encourage
ownership by nodes that are interested in the objects and have enough resources to do a good
job in modeling them. In other words, all the glitched modeling that occurs is very likely to be
related to uninteresting objects, and thus unowned objects. Therefore, we can expect that such
erroneously modeled objects will be overridden by incoming updates of the objects' state from
the owner.
Communication
Remember that most systems have a hardwired distinction between properties necessary for
the visualization of an object (position, orientation) and those necessary for the behavioral
modeling of an object, communicating the former (with dead-reckoning) and not communicating
the latter. In our case this distinction is not hardwired, but determined by the game's designer;
moreover, it is done on a priority basis.
game's designer, moreover, it is done on a priority basis. This means that if the behavioral
properties are distributed on a lower priority than the salient properties (position, orientation),
then at least these salient details will be communicated. When the behavioral properties get to
us (as the object becomes more interesting) we obtain effective dead-reckoning. Perhaps not
that kind of dead-reckoning where the server is aware of the client's prediction algorithm (and
duplicates it) and then only needs to advise the client when its estimation diverges too much.
Nevertheless, as both parties in our case are expected to perform the same modeling there is
still some potential for prioritizing communication according to the computed value's distance
from the arbitrated value, though the game designer would have to determine the precise
relationship if any. Perhaps this would best be left as an empirical research exercise.
Perhaps I should note here that when you have a huge virtual world, it gets so big that the
client software cannot be forewarned of all the content that is likely to come its way. A player
can't be required to upgrade their software just because someone elsewhere has invented a
new vehicle type. We have to design the system such that the client can obtain information
about how to model a new object of which it was previously unaware. Not only will there be
more objects than a single computer can store, but there will be more classes of objects than a
single computer can store the modeling details of.
Like the Web, the system has to cope with live and continuous development of the underlying
software, the game design, the game content, and the game state. It cannot be shut down,
reset, or suspended.
A Dynamic Hierarchy
Keeping Track
The distributed system is organized in a hierarchy for two key reasons: keeping track of the
participating nodes, and keeping track of responsibility for all the objects distributed between
the participants. The system needs to enable one node to find any other node, and for a node
to understand its place in the grand scheme of things. The system also needs to store an
unlimited number of objects and keep track of these, even as they get distributed around the
system.
Participation
Why does a node participate? Because it has an attendant player that has an interest in playing,
or more precisely, observing and influencing the virtual world currently being modeled
(consisting of a set of objects).
A node needs firstly to make contact with any other participating node. It may be that some
nodes are well known (probably roots or relatively senior nodes), or there may be some channel
upon which participating nodes are advertised. There may even be multiple virtual worlds
available from which the player can select.
Once a node has made contact, it is ready to find a close node to use as an initial parent (one
having a good connection, but with broad coverage of the virtual world appropriate to the
topological network position of the player). Possibly unaware of the rapidly improving
connection, the player selects the area of the virtual world, or the virtual character, that
interests them most in terms of wishing to observe and influence (Princess Leia, Robin Hood, or
Julius Caesar - if he's free). The game may construct mechanisms for allocating particular areas
or characters of the virtual world to particular players or teams of players. Either way, when the
player next connects, the nodes they made contact with last time are likely to be a good initial
guess.
Once the area or object in the virtual world that the player is interested in has been
determined, this interest will be expressed to the system, which will adjust the selection of the
player's node's parent in due course.
Naturally, the player and the object (avatar) that they're influencing will affect and experience
continuous change in the set of objects of interest to the node. This, together with the changing
conditions of the network, the changing relationships of other nodes (players joining, leaving),
and any other changes, will cause occasional changes in parent.
A Hierarchy of Responsibility?
We start from a single node, but instead of making the default relationship a peer one, we
make it a hierarchical one. This is because we are not trying to partition or divide the system,
we're just trying to spread the workload. Responsibility ultimately remains with a single
computer, or in some sense we always have a single server, it's just distributed over several
computers.
Now one of the things about a hierarchy that may cause concern is if there is still some kind of
client/server communications bottleneck effect. If there is any kind of aggregation performed by
nodes, then it would appear that the root node would get a tad overloaded. A hierarchy would
seem to act as much as a concentrator as the single server of a client/server arrangement, i.e.
we would look forward to the same bottlenecks.
Well, I think a hierarchy has the effect of releasing that focus to dissipate throughout the
system. It also allows nodes to organize themselves according to the role that best suits them.
Server-like computers end up serving and not much clienting, and client-like computers do a lot
of clienting and little serving. Computers in between do a bit of both.
Think of it like a living tree, the closer you get to the root or the heartwood, the stronger and
more stable things are. Conversely, at the periphery, there is more life and less stability. The
players' computers are at the periphery, and the more reliable and capacious computers reside
at the center. The most recent communications are gossiped around via peer connections,
whereas the state updates to parents gradually migrate to the persistent core.
There are a variety of configurations which can support a networked game. If we have a truly
adaptable distributed system then we'd expect it to assume a similar communications
configuration if the connectivity and participants were the same. It all depends on the way the
variety of heuristics are tuned, but if they're tuned well, we'd expect them to make the system
adopt a fairly optimal configuration.
16 Player Peer-to-peer
Say we have 16 players, but they all have the same capacity computer. Well, the first player to
start will become the root node. As it has all the responsibility it will initially appear the best
node for all other nodes to child onto. At some point the root may realize that it has insufficient
capacity to maintain the persistence of its state and may delegate this to children (according to
their interest). Node relationships will also organize according to bandwidth costs. It may be
that the eighth node finds it more appropriate to child to one of the non-root nodes than the
root, simply because of the better bandwidth. Ultimately, what is likely to happen will be for
persistence responsibility to be distributed around the nodes as necessary (if the root can't
cope). Ownership is likely to follow this distribution.
As is more likely, if all computers can easily maintain all state, then the root node is likely to
retain storage responsibility for all state, but each node will express interest in all objects to all
other nodes. We'll end up with a fairly balanced hierarchy, with additional peer connections
between nodes, until each node communicates its owned object state changes to each other
node, with the child/parent connection acting as one of these one-to-many connections.
So while all nodes have equally balanced interests and capabilities, the connections we end up
with look very similar to the connections in a peer-to-peer configuration. However, as soon as
computers deviate from this, then the nodes will migrate around the hierarchy according to
their relative interests and capabilities.
For short-session based games, it would be overkill to use such a scalable system, but we'd still
expect it to adopt an appropriate configuration.
Client/Server
If we had a supercomputer with great reliability, availability, connectivity, capacity, etc. and
umpteen player computers, much poorer in every respect, then the hierarchy is likely to adopt
a client/server configuration. The supercomputer would be the root, and each player's computer
a child node to it. It's unlikely that any player's computer would get to own any objects at all
(perhaps their avatar, but that's about it). It may still happen that some nodes will create
connections with each other if only to obtain details of player owned avatars, but these would
be pretty light-weight.
Overall, we'd end up seeing the parent/child connections becoming the most important.
However, if latency with the server becomes significant, more and more peer connections are
likely. You might even end up with mini hierarchies developing between mutually interested
groups of players, with ownership migration becoming likely too.
Distributed Server
With several supercomputers dispersed around the network, it's likely that you'd end up with a
relatively central root (if that's possible), with object responsibility being portioned out to the
dispersed super-nodes. It's expected that the game conspires to encourage players to be
interested in the content held by their nearest super-node, and so players are likely to child off
this one. They'll peer off any other node on a need-to-know basis.
Of course, if you really like this kind of flexibility, but prefer the intrinsic security of siting super-
servers at ISPs' premises, then you could decide to prevent object ownership migrating beyond
these nodes. In this way, you end up with a system fairly comparable to a 'distributed server'
system.
Scalable Peer-to-peer
If you have an unknown mix of an unlimited number of computers, then this adaptive system is
ideal. If any philanthropist donates a super-computer to the cause it's likely to quickly migrate
towards the root of the hierarchy. ISPs that donate some of their as yet unused rack-mounted
Linux boxes are likely to reduce the bandwidth used by their subscribers, saving them money at
the same time as adding value to their service. Even some players that leave their computers
on whilst they're not using them (ADSL) can add some useful resources to the pool.
So in this case we maximize the use of all computers, we don't make any particular
assumptions about how resources are distributed, but the more resources that are available the
better the system's overall performance becomes.
Responsibility
Responsibility means being answerable about objects. One node is always responsible for all
objects. This doesn't necessarily require that it be a supercomputer, but it would be nice.
Responsibility entails the following duties: registration, persistence, arbitration, and estimation.
Now, in terms of analogues, registration is equivalent to the land registry of a property (the
state ultimately controls everything), persistence is equivalent to the freehold of a property (the
freeholder is nominally in control of the property), arbitration is equivalent to the use of a
property (practical ownership), and estimation is perhaps (at a stretch) equivalent to visiting
the property. There are no particular terms used to describe the delegation of registration or
storage, but the act of delegating arbitration (or ownership) is equivalent to leasing and sub-
leasing, etc.
New objects can only be created by the root or by owned objects. In the case of an owning node,
it then becomes responsible for the new object; however, it is obliged to pass this responsibility
up toward the root. Its parent is similarly obliged.
At the Limit
The root node is responsible for all objects, but it is possible that it can run out of capacity to
perform other duties. When the root no longer has capacity to register all objects it delegates
that duty to the child that attempted to pass responsibility up to it. Thus when it runs out of
storage capacity, it knows that some objects are only registered by a child. This doesn't just
apply to the root; one of its children could also run out of space. In general, if such a node gets
a request for an object it doesn't know about, it also checks with its child in addition to referring
to its parent.
A node is more likely to run out of space for maintaining persistent storage of all the objects
created underneath it. In this case (assuming no parent has the capacity), it delegates the duty
for persistence to the child that passed the object up (the child gets the freehold). This means
that if a node only has space for registration, it can at least know that it can service a request
for the state of that node by passing it to the child to which it delegated persistence.
If an object is created for which no node has the capacity to maintain its persistence (even the
creating node), then the object never gets any state (only defaults). If an object is created for
which no node even has the capacity to register it, then it can't have been created, or in other
words, the creation of an object requires its registration at least on the creating node.
Given this limiting behavior you can see that even in the case where objects have expanded in
number to fill the space that's available, they end up overflowing from the root back out toward
the branches. Ultimately in this way, such overflow objects become more and more remote. It
then becomes less reliable to get hold of them and more prone to delay given limited bandwidth
and the number of hops involved. Of course, once made, peer connections will obtain their
details fine, but the objects won't be as persistent. When nodes that have responsibility for
persistence of certain objects go offline, then it may be that some of the node's children stuck
around and can assume responsibility instead. Otherwise, those objects' state becomes
inaccessible (unless a peer node coincidentally had cached copies).
The system could then eventually run out of capacity, and no new objects could be created.
That's a heck of a lot of objects if you imagine the collective capacity of a few million computers
in a few years' time. The game design could do a lot to avoid this getting consumed too rapidly,
but it may be that we'd need a policy that purged registration of the least frequently accessed
objects. Bear in mind we're looking at something comparable to the situation where web sites
would be growing in size faster than ISPs could plug in extra hard disk drives. It's a race
between exponential growth in demand and capacity. I have a hunch that demand only
increases in response to capacity, so perhaps we'll always just have enough. Hmmmn, is
addressing space for only 2^64 objects enough for a few years - what do you think?
Interest
Let's look a bit more at how objects get distributed around the hierarchy of nodes - especially
when the system is operating within its capacity.
All objects have a responsibility to express interest in the objects that may affect their behavior.
Naturally, there are plenty of passive objects that can rely on being of interest to others (rocks,
say).
An 'Interest' is a set of criteria and acts as a measure of relevance to which any number of
other objects may be compared.
Interests are essentially constant entities, but they have a key feature: it is straightforward to
tell if any object meets the Interest or not (partial match is classed as non-matching). An
Interest could take the form of a template object with the criteria being 'all objects of this class
that have the same property values'. This would be an equivalence operation, but in some
cases it may be useful to perform a distance operation on a 3D position property, e.g. 'all
objects of this class that have a 3D position less than L away from the 3D position specified'.
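A sketch of such a template-style Interest (reusing the Object, ClassObject, and Lookup sketch
from the object layout section, with hypothetical IsA, Equal, Position, and Distance helpers);
note the strictly logical result, where a partial match counts as no match:

#include <map>
#include <string>

struct Vec3 { float x, y, z; };
bool  IsA(const ClassObject *c, const ClassObject *base); // walks bases
bool  Equal(const Value &a, const Value &b);
Vec3  Position(const Object &o);      // reads the 3D position property
float Distance(const Vec3 &a, const Vec3 &b);

struct Interest {
    ClassObject *cls;                        // 'objects of this class'
    std::map<std::string, Value> mustEqual;  // exact-match criteria
    bool  spatial;                           // apply the radius test?
    Vec3  center;
    float maxDistance;                       // '... less than L away'
};

bool Matches(const Interest &in, const Object &o)
{
    if (!IsA(o.cls, in.cls)) return false;
    for (const auto &c : in.mustEqual) {
        const Value *v = Lookup(o, c.first);
        if (!v || !Equal(*v, c.second)) return false; // partial = no match
    }
    if (in.spatial && Distance(Position(o), in.center) >= in.maxDistance)
        return false;
    return true;
}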
However, more than the ability to determine whether an object meets an Interest, it may be
useful to see how interesting or uninteresting it is. This may be useful, but perhaps for the time
being we can make do with an Interest obtaining a logical response as opposed to a fuzzy one.
Remember we're already allowing objects to express an Interest as a prioritized demand for
details of all objects that meet particular criteria (whichever node they may reside on); it may
be going a bit too far to allow the goodness at meeting the Interest to further moderate the priority
of individual objects.
Given Interests deal with matching objects it's likely that an Interest may end up being laid out
in a manner very similar to the objects. An Interest is therefore also likely to observe the same
class hierarchy as objects. Interests may also have a variable lifetime, dependent upon the
behavior of the object that's expressing them. Static objects may have long term interests in a
relatively fixed sphere, whereas dynamic objects may have repeated short term interests in a
conical region in front of them. Though I've used spatial examples here, there's no reason to
hard-code interests in terms of spatial properties. A mercenary NPC may be interested in all
country stats objects containing at least 1 battle in progress.
Note that Interests only get communicated if their lifetime is sufficiently long to warrant
it. Their lifetimes however, must be designed carefully to ensure useful results. There's no point
getting tons of objects downloaded if it's highly likely that they'll not be interesting once they've
arrived. It's also worth pointing out here, that Interests are not intended to be used for
modeling purposes, e.g. collision detection. There are separate services provided by each node
that can monitor objects (that have caused themselves to be registered to the proximity
analysis service) and then raise events as appropriate upon collision.
If you're worried about performance issues in satisfying all these interests, note that although
Interests don't need to be based on spatial distance this doesn't stop us providing a low level
service that does allow objects to be spatially indexed (octree, whatever). We can use this in
order that spatial interests can be satisfied promptly.
An object is interested in all other objects that are relevant to its behavior. Ultimately (as
Douglas Adams might phrase it) a single piece of fairy cake can be influenced by every other
atom in the universe. However, for our purposes we can get by with a decent proportion of the
local environment. Though players with big enough hard disks might well end up caching
nearly the entire virtual world, even if the player was just interested in a piece of fairy cake (it
would have to be non-passive in this case).
If a farmer is interested in foxes and rabbits, but there is a rabbit that is too distant from the
farmer's measure of interest to get downloaded, it may happen that a fox will be interested in
the rabbit, and raise the rabbit's importance sufficiently for it to be brought into the node
and thus made available to the farmer's perception.
The only way objects usually become able to be aware of one another in the modeling sense is
if they reside on the same node. An Interest implicitly represents the 'best effort' set of all
objects that meet the Interest's criteria. This set of objects will naturally change as and when
objects are downloaded or get pushed from the node's cache.
Each node's Interest Manager accumulates all the resident objects' interests, and does its best
to obtain objects that meet these interests. It will do this by communicating with the parent
(possibly children too) and the parent can do its best to satisfy one or more of the interests, but
it in turn may pass an Interest on as appropriate. The Interest in effect represents a seeking
tendril that feels around neighboring nodes until it finds a good source to satisfy that interest, at
which point the tendril plugs in and forms a semi-permanent peer subscription to that node.
These peer relationships are chosen according to the advantages outweighing the cost (of
bandwidth).
For example, some distant rocks might be passive objects, but another player's weapon might
blast them to bits. The weapon may be out of range of the node's interest, and thus the weapon
will not be modeled. However, in due course, there will be incoming state updates to the distant
rocks that result in a 'destroyed rocks' situation.
If they were distant bunny rabbits that tended to run away from farmers with shotguns, then
there's a good chance the bunnies would express interest in farmers (unlike rocks). A farmer
appearing on the horizon might just get downloaded (even if it was the barest of state info).
This might be enough to get the bunnies to run away in a fluid motion, rather than having
occasional updates make them move in a jerky fashion. There's still a good chance the bunnies run and get
discontinuously relocated to the remotely arbitrated position, as opposed to the local estimate,
but it's better than nothing.
An object with behavior that is affected by some aspect of its environment expresses interest
accordingly. If that results in other objects turning up, that have behavior and a consequent
interest in their environment, then we end up with a set of objects prioritized according to their
relevance for modeling the experience of the observing avatar.
Two different avatars thus have a perception of 'reality' from two different perspectives.
The greater the degree of commonality in the interests of two avatars, the closer the modeling
of their experience will be. This is because they are likely to have the same set of objects
(though one node may have fewer resources than the other).
Thus the closer two avatars are, the more their perceptions will agree. And at the end of the
day, that's all we need to worry about. As long as interacting players have a consensual version
of reality then they'll make sufficiently similar interpretations of their situation that each player
will be able to believe all players are in the same 'reality'.
Note though, that it's up to the games designer to determine where the priorities lie in
contriving the best-effort modeling for an avatar. It may be more important to model enemy
helicopters than weather vanes on church steeples. Only the games designer will understand
the player enough to be able to have some idea of what is most likely to be important to their
experience.
Caveat Emptor
The system I'm gradually providing more and more clues about is one that attempts to address
issues of scalability. It's not one that makes minimal use of CPU or storage. It certainly doesn't
attempt to provide any integrity or consistency guarantees.
People are already developing systems which address particular configurations, particular player
counts, particular bandwidth and latency constraints, and because of this, such systems can
achieve some degree of integrity and consistency, and can be performance optimized in many
areas.
Do not mistake these articles as guidelines for developing the systems underlying contemporary
multiplayer games. There are many techniques and algorithms that I haven't covered, many of
which are critical to systems in use today. You'd need to get very familiar with them before
embarking on such development.
So, I'm not talking about how to design a system that is going to support umpteen thousands
of subscribing players and meet their demands for a glitch-free, 100% uptime, quality
experience, that is profanity free, and above all fair.
For that matter, if you built a system that observed the principles I'm suggesting you'd
probably end up with something that you couldn't charge for. Not only would there be no
security, but by the time a million or more people participated, there'd be loads of people
exchanging offensive material, and a few vandalizing the system wholesale.
It is difficult to imagine anyone who'd be 'brave' enough to invest in the development of such a
system.
However, if we're going to have cyberspace, a system as large as the web that exploits the
connected computers, then it doesn't matter how unsound a business prospect it is. As
technologists, as games developers looking to tomorrow, the sooner we understand the issues,
the sooner we'll create the entertainment system of the future.
It's one thing getting a plane to fly, it's another to stop it falling to pieces (or getting shot
down).
Next time I'll be discussing methods and strategies for ensuring the system can cope with
sudden and unexpected failure of nodes or parts of the network. These I'm pretty confident
about. It's less certain how to obtain security in the face of concerted attacks. However, I'll
have a go.
Until then, check out Dig It Network Multimedia Inc. for an example of a possible solution to
security in P2P games.
http://www.gamasutra.com/features/20010226/fitch_01.htm
With the ever-increasing resolutions made possible by modern 3D graphics cards, computer
games and real-time graphics are acquiring a uniform "look," imposed largely by technical
limitations and occasionally unimaginative programming. I'm talking about that sharp-edged,
polygonal look. Great advances in texture resolution and per-pixel lighting have helped to fill in
the previously fuzzy polygon interiors, but the edges and feel of the image often remain
frustratingly synthetic and "clean."
Imaginative lighting, such as moving shadows, bump maps, and specular highlights, can help
add variety to the rendered scene. These topics have been dealt with extensively elsewhere.
However, one option that is often overlooked is blurring elements of the scene; for example,
depth-of-field and focusing effects bring certain elements of an image into the foreground, hide
unnecessary clutter, and reduce the need to resort to over-used fogging tricks. Blurring is a
costly process, but a fundamentally simple and useful one.
This article presents a few tricks which can help make real-time blurring possible, and hopefully
will provide enough material to inspire you to invent your own hybrid techniques.
There are many ways to blur an image, but at a basic level it always comes down to low-pass
filtering of the image -- this can be achieved in many ways, often by convolution of the image
with a filter. It's instructive to think about the basics of this blurring process so you can
appreciate how the tricks work, and what their shortcomings are.
Figure 1 shows a source image, and the results of blurring it with two simple filters: the first is
a box filter, and the second is a Gaussian type equivalent to a Photoshop Gaussian blur of 1.0
pixels. The Gaussian filter gives a "softer," more aesthetically pleasing look, but the box filter
has computational advantages that I'll discuss later.
Figure 1. The effect of blurring an image with (from left to right) a box filter and
a Photoshop-style Gaussian filter. The kernels of each of these filters are given
below.
Doing It Quickly
The blurring algorithm described thus far is simple, but slow. For large images, and for large
kernels, the number of operations rapidly becomes prohibitively large for real-time operation.
The problem is particularly acute when extreme blurring is required; either a small-kernel filter
must be applied many times iteratively, or a filter with a kernel of a size comparable with the
image must be used. That's approximately an n^4 operation using the code in Listing 1 -- clearly
no good.
The rest of this article describes a few tricks -- the first two entirely software-related, the latter
two using the power of 3D graphics cards -- which help make real-time blurring possible.
Look again at the box-filtered image in Figure 1, and at the kernel. It's a constant value
everywhere. This can be used to great advantage. The general filtering operation used by
Listing 1 is described by this mathematical expression (you can skip to the end of this section
for the payoff if you don't like math):
Equation 1. d(x,y) = sum[i=-k..k] sum[j=-k..k] ker(i,j) * s(x+i, y+j). Here x,y specifies
the coordinate of the destination pixel, s is the source image, d is the destination
image, ker is the kernel, and 2k + 1 is the size (in pixels) of the kernel.
Equation 2. d(x,y) = c * sum[i=-k..k] sum[j=-k..k] s(x+i, y+j) / (2k+1)^2. Equation 1
rewritten for a kernel with constant value c; the division normalizes by the kernel area.
Values of c other than 1 allow the brightness of the destination image to be changed.
However, the costly double sum per destination pixel still remains. This is where some nifty
precomputation comes in handy. The key is to represent the source image in a different way.
Rather than storing the brightness of each pixel in the image, we precompute a version of the
image in which each pixel location holds the total of all the pixels above and to the left of it (see
Figure 3). Mathematically, this is described by Equation 3:
Equation 3. p(x,y) = sum[i=0..x] sum[j=0..y] s(i,j). Image p at a point x,y contains the
sum of all the source pixels from 0,0 to x,y.
Note that this means that you need to store more than the usual 8 bits per pixel per channel --
the summed brightnesses toward the bottom right of the image can get very large.
Once this precomputation has been completed, the expression for the box-filtering process can
be rewritten entirely in terms of sums starting at 0:
Equation 4. d(x,y) = c * ( p(x+k, y+k) - p(x-k-1, y+k) - p(x+k, y-k-1) + p(x-k-1, y-k-1) )
/ (2k+1)^2. Equation 2 rewritten with sums from 0, where p is the precomputed
image from Equation 3.
Figure 3. The values in the table on the left represent a source image. Each
entry in the table on the right contains the sum of all the source pixels above
and to the left of that position.
This equation gives exactly the same result as the basic box filtering algorithm in Equation 2;
the trick is that each of the double sums in Equation 4 is just a single look-up into the
precomputed image p. This means that the blurring operation for each destination pixel is
reduced to four image look-ups, a few additions and subtractions, and a divide by a constant
value (which can also be turned into a lookup with a simple multiplication table). Even more
significantly, this algorithm's speed is independent of kernel size, meaning that it takes the
same time to blur the image no matter how much blurring is required. Code which implements
this algorithm is given in Listing 2. It's slightly complicated by having to deal with edge cases,
but the core of the algorithm is still simple. Some impressive focusing and defocusing effects
can be achieved with this code alone. It is particularly suited to static images (because you only
have to perform the precomputation step once) such as front-end elements and text/fonts.
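To make the four-look-up idea concrete, here's a minimal single-channel sketch of the
precomputation and the constant-time sampling; the long-integer sum buffer and the edge
clamping are illustrative assumptions rather than a reproduction of Listing 2:

    #include <vector>

    // Precompute p: p(x,y) holds the sum of all source pixels from (0,0) to (x,y),
    // built incrementally as p(x,y) = s(x,y) + p(x-1,y) + p(x,y-1) - p(x-1,y-1).
    void precomputeSums(const std::vector<unsigned char>& src,
                        std::vector<long>& p, int w, int h)
    {
        p.assign((size_t)w * h, 0);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                long above = (y > 0) ? p[(y - 1) * w + x] : 0;
                long left  = (x > 0) ? p[y * w + (x - 1)] : 0;
                long diag  = (y > 0 && x > 0) ? p[(y - 1) * w + (x - 1)] : 0;
                p[y * w + x] = src[y * w + x] + above + left - diag;
            }
    }

    // Box-blur sample of radius k at (x,y) via the four look-ups of Equation 4,
    // clamping the kernel at the image edges (hence the variable area).
    unsigned char boxSample(const std::vector<long>& p, int w, int h,
                            int x, int y, int k)
    {
        int x0 = (x - k - 1 < 0) ? -1 : x - k - 1;   // -1 means "off the image"
        int y0 = (y - k - 1 < 0) ? -1 : y - k - 1;
        int x1 = (x + k >= w) ? w - 1 : x + k;
        int y1 = (y + k >= h) ? h - 1 : y + k;
        long a = (x0 >= 0 && y0 >= 0) ? p[y0 * w + x0] : 0;
        long b = (y0 >= 0) ? p[y0 * w + x1] : 0;
        long c = (x0 >= 0) ? p[y1 * w + x0] : 0;
        long d = p[y1 * w + x1];
        long area = (long)(x1 - x0) * (y1 - y0);
        return (unsigned char)((d - b - c + a) / area);
    }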
The technique just described suffers from one drawback: the fairly costly precomputation step
before any blurring can be performed. This is fine for static images, but for moving ones it
significantly increases the storage and time required to perform a blur.
To solve this, let's consider a horizontal motion blur. That is, blurring an image horizontally
only, rather like Photoshop's Motion Blur filter (see Figure 4). Like Trick One, this can be done
quickly and independently of the amount of blur, but this trick requires no precomputation.
Equation 5. d(x,y) = c * sum[i=-k..k] s(x+i, y) / (2k+1). The basic equation for a
horizontal motion blur with a constant-valued kernel of value c and width 2k + 1.
The heart of the trick to do this quickly lies in keeping a running total of source pixels as you
move across the scan lines of the image. Imagine the situation halfway across an image, say at
horizontal position 23. We've already computed destination pixel 22, so we must have known
the sum of all the source pixels used to compute its color. Destination pixel 23 uses almost
exactly the same source pixels, apart from the ends -- we have to lose one source pixel at the
left edge, and gain one at the right edge. Mathematically, this is shown in Equation 6:
t(x+1) = t(x) - s(x-k, y) + s(x+k+1, y), where t(x) is the running total of source pixels
used for destination pixel x.
In other words, the new total of source pixels can be adjusted to the value for position 23, by
subtracting the original source pixel at the left edge, and adding on the value at the right edge.
This is illustrated in Figure 5, with the code given in Listing 3.
Figure 5. The sum of the first five values shown is 12. To compute the sum of
the adjacent five values, one merely needs to subtract the blue number and
add the red number to the old sum.
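A minimal sketch of this running-total scheme for a single scan line might look like the
following (single channel, with edge pixels clamped -- a simplifying assumption, not a
reproduction of Listing 3):

    #include <algorithm>

    // Horizontal motion blur of one scan line with a kernel of width 2k + 1,
    // using a running total so the cost is independent of k. Pixels beyond
    // the ends of the line are clamped to the first/last pixel.
    void blurScanline(const unsigned char* src, unsigned char* dst, int w, int k)
    {
        long total = 0;
        for (int i = -k; i <= k; ++i)                 // prime the window at x = 0
            total += src[std::min(std::max(i, 0), w - 1)];

        const int width = 2 * k + 1;
        for (int x = 0; x < w; ++x) {
            dst[x] = (unsigned char)(total / width);
            int drop = std::max(x - k, 0);            // pixel leaving the window
            int add  = std::min(x + k + 1, w - 1);    // pixel entering the window
            total += src[add] - src[drop];
        }
    }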
With a little more effort, the same algorithm can be made to motion blur an image in any
direction -- not just horizontally -- by processing the pixels in vertical strips (for vertical
blurring) or even along "skewed scan lines" for blurring at an arbitrary angle. The vertical scan
line routine is particularly useful, because if you motion blur an image twice -- once
horizontally, and then once vertically on the result -- you achieve exactly the same effect
as a box filter (see Figure 6). Thus, an arbitrary amount of box-filter blur can be achieved, as in
Trick One, in constant time regardless of kernel size, without the need for any precomputation.
Figure 6. (From left to right) The original image, the result of blurring it
horizontally, and the result of blurring that vertically. This final result is
equivalent to a box filtered version of the original image.
Taking Stock
Being able to quickly blur images, both static and moving, is all very well, but thus far I have
concentrated entirely on software techniques, which ignore the power of 3D graphics cards. The
following two tricks allow parts of the scene, or whole scenes rendered on a 3D card, to be
blurred or "defocused" rapidly, by borrowing techniques from MIP-mapping, bilinear filtering,
and the software blurring tricks given above.
The key to both of the following tricks is the idea that a sharp image scaled down and then
blown up with bilinear filtering looks blurry (see Figure 7). Just try walking very close up to a
wall or object in a game that uses a 3D card (but doesn't use any detail mapping) and you'll
know what I mean. The problem of blurring or defocusing is then reduced to producing a small,
good-quality version of the image or object to be blurred, and then "blowing it up" by rendering
a sprite or billboard with that small image texture-mapped onto it.
In order to make this trick work, the engine must render the scene as usual and then generate
MIP-maps of the rendered image -- that is, a series of scaled-down versions of the screen
image each half the size of the last. Thus, if the screen resolution is 800x600, the MIP-mapped
versions will be 400x300, 200x150, 100x75, and so on (how small you go depends on how
much blurring you wish to apply). See Figure 7.
Figure 7. (Clockwise from top left) The original image, its
MIP-maps, and the smallest MIP-map blown up to the
original size using bilinear filtering.
How these MIP-maps are generated depends on the chosen hardware platform and software
API. I won't cover the details here, since it's such a standard technique -- although what we are
going to do with the MIP-maps is less typical.
It may be possible to render the scene to an off-screen texture instead of to the back buffer,
and then use API or hardware code to create the MIP-maps "auto-magically." Or it may be
possible to treat the back buffer (to which you normally render your screen image) as a texture
in itself and use that as a source for automatic MIP-map generation.
In the worst case, if the hardware/software doesn't support MIP-maps automatically, you have
to "lock" the back buffer after 3D rendering is complete, read the pixels in "by hand," and then
generate each smaller MIP-map by averaging groups of pixels. Since the scale factor is always
one half, this isn't as bad as it sounds -- each destination MIP-map pixel is just the average of
the four corresponding source pixels, arranged in a 2x2 square. Listing 4 gives an example of
how to go about this four-to-one averaging, although it's possible to write much faster code
than the one given.
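As a rough illustration of that four-to-one averaging (assuming a single 8-bit channel and
even source dimensions):

    // Build the next-smaller MIP level by averaging each 2x2 block of source
    // pixels into one destination pixel. Assumes w and h are even.
    void downsample2x2(const unsigned char* src, unsigned char* dst, int w, int h)
    {
        int dw = w / 2;
        for (int y = 0; y < h / 2; ++y)
            for (int x = 0; x < dw; ++x) {
                const unsigned char* s = src + (2 * y) * w + 2 * x;
                int sum = s[0] + s[1] + s[w] + s[w + 1];          // the 2x2 block
                dst[y * dw + x] = (unsigned char)((sum + 2) / 4); // rounded average
            }
    }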
It's worth noting that the generation of the screen MIP-maps is crucial to the final look of the
blur -- in effect, it's replacing the filter kernels of the first two tricks with a repeated averaging
process. It is possible to generate MIP-maps just by throwing away three in four pixels, but the
artifacts when you blow the results up to screen size are generally unacceptable. At the other
extreme, replacing the MIP-map scaling and averaging process with something more exotic
(such as a factor-of-two scale combined with a blur from Trick One or Trick Two) allows some
great special effects, such as "Vaseline-lens" effects or motion-blurred depth of field.
Once the MIP-maps are safely downloaded as textures, the next step is to draw one of them as
a screen-sized billboard over the top of the screen, making sure the graphics card's bilinear (or
better) filtering is enabled. Which MIP-map you choose dictates how blurred the result looks: if
you blow up a tiny MIP-map (say one-eighth the screen resolution or less) the result will be a
very blurred look; if you choose the half-resolution MIP-map, a subtle blur will result.
Of course, it's also possible to change which MIP-map is chosen over time; thus the blur in or
out of focus can be animated. If the fill rate is available, it's also possible to transition smoothly
between MIP-maps by drawing two billboards over the screen with different MIP-maps on them.
For example, rather than using the half-resolution MIP-map and then switching suddenly to the
one-quarter-resolution one, which will cause a visible pop, cross-fade between them: use the
graphics card's alpha-blending features to fade the half-resolution MIP-map out gradually, and
simultaneously fade the one-quarter-resolution one in. Although this is definitely not "correct"
for generating blur levels between those given by the two MIP-maps, it is visually
indistinguishable from an extremely smooth change in the level of blur. I've found that users
are extremely willing to believe that what they are seeing is a smooth blur in or out, and don't
tend to see the transition in MIP-map levels at all.
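One way to drive that cross-fade from a single fractional "blur level" parameter might be the
following sketch (the mapping from blur level to MIP level is an assumption for illustration):

    // Given a fractional blur level (0 = sharp, 1 = one MIP level down, ...),
    // choose the two MIP levels to draw and the alpha with which to fade in
    // the blurrier one on the second billboard.
    void pickMipBlend(float blurLevel, int& mipSharp, int& mipBlurry, float& alpha)
    {
        mipSharp  = (int)blurLevel;        // drawn first, fully opaque
        mipBlurry = mipSharp + 1;          // drawn on top, alpha-blended
        alpha     = blurLevel - mipSharp;  // rises smoothly from 0 to 1
    }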
A Couple of Optimizations
One other notable optimization which is possible on some hardware is that if the entire screen is
to be blurred heavily, the initial rendering phase could be made to actually create an image at
less than the full-screen resolution. This saves on fill rate and means that fewer MIP-maps are
needed. For example, at a screen resolution of 800x600 where you intend to cover the screen
in the one-eighth-resolution (100x75) MIP-map, it may be worthwhile to render the scene at
400x300 in the first place, and then use a one-quarter-resolution MIP-map instead, scaled up to
the full 800x600.
Many other optimizations are possible, many of them dependent on specific hardware tricks. It's
worth experimenting to see what works best for each target platform.
It's all very well to be able to blur the entire screen, but it's quite an extreme effect and so not
always that useful. Sometimes it's desirable to have some elements blurred and leave others
sharp. Focusing, or depth-of-field-type effects, where only objects in the extreme distance or
foreground are blurred, can lend a very impressive photorealistic look to a rendered image.
Sometimes it is possible to partition the scene into non-overlapping layers, and intersperse
rendering and blurring passes to create a different level of blur on each layer. For example, you
could render the background of a scene (the interior of a room for example, perhaps with
unimportant characters standing in it), blur the result using Trick Three as described above, and
then render foreground characters over the top of the blurred result (see Figure 8). This is
surprisingly effective in reducing confusing background clutter, and causes the user's eye to
concentrate on the more important, unblurred characters.
Figure 8. (Clockwise from top left) A background image, a
foreground image, the two together, and finally the result
of blurring the background while leaving the foreground
sharp. (Image by Christian Bravery
© Lionhead Studios Ltd.)
If it's impossible to partition the scene into these non-overlapping layers, the Z-buffer can be
extremely useful. Instead of rendering a single MIP-map on a billboard over the whole screen,
enable Z-buffering and render every MIP-map so that its billboard slices through the scene at a
carefully chosen distance (see Figure 9).
Imagine the scene sliced by five planar billboards, each facing the viewer and each exactly
covering the screen, but spaced evenly along the Z-axis from foreground to background. The
front billboard shows the highest-resolution MIP-map, and each one behind it shows
progressively lower resolution MIP-maps. Because of the action of the Z-buffer, the lowest-
detail MIP-map will only cover (and thus appear to blur) objects behind it, in the extreme
distance; middle-distance objects will be covered by the mid-resolution MIP-maps, and so on up
to the extreme foreground. Objects that lie closer to the eye than the front-most billboard will
be entirely unblurred.
In this way, it is possible to achieve a cheap depth-of-field effect -- provided you have sufficient
fill rate to cover the screen several times. Objects progressively appear to become more blurred
as they move away from the viewer. By reversing the normal sense of the Z-buffer (so that it
allows pixels to be drawn only if the value in the Z-buffer is closer than the new pixel) and
drawing the billboards from front to back, it is possible to reverse the sense of blurring -- so
that extreme foreground objects become blurred, while distant scenery remains untouched.
Artifacts!
As with all the best graphical tricks, there are always downsides, and Trick Three has its fair
share. First of all, it uses a lot of fill rate, especially if you are rendering several Z-buffered
billboards that cover the entire screen. Although very few polygons are involved, the amount of
overdraw can quickly bring the fastest cards to their knees. Also, building the MIP-maps can be
a costly process, especially at high resolution or on architectures where reading back from the
screen and interspersing 2D and 3D operations is costly.
Another drawback is that large objects that pass through several levels of blur (such as
landscapes, roads, or large buildings) can have unsightly hard-edged lines which appear
between levels of blur. This is because there are only a few levels of blur, and the Z-buffering
process will cause a hard edge wherever objects intersect the blurred billboards. This can also
be seen if an object moves slowly toward or away from the viewer, as it gradually cuts through
the billboards. The problem can be made less obvious by using more billboards (levels of blur),
or by reducing the overall level of blur.
And finally, objects in the unblurred areas of the screen may appear to have a blurry halo
around them, caused by their image "bleeding" into the surrounding pixels. These halos actually
lie in a more blurred area, behind the object (see Figure 9). Under certain circumstances, the
effect can actually be used to some advantage as a kind of halo-like special-effect around
objects -- for example, glowing magma or godly halos.
Figure 9. (Clockwise from top left) A scene from Black & White, its MIP-maps,
the five layers formed by the billboards intersecting the Z-buffer, a side view of
the billboards intersecting the scene, and the result of rendering the MIP-maps
onto the billboards over the top of the original scene. (Image © Lionhead Studios Ltd.)
If your game has many small objects spaced over a wide range of depths (or you want a wide
range of blurs on the different objects), the full-screen tricks described so far aren't of much
use. A space combat game, for example, would benefit from only blurring the small ships
individually, rather than wasting time and memory blurring the (mostly black) nebulae in the
background.
This last trick is actually a variation of Trick Three; rather than building MIP-maps of the entire
screen, each object is rendered to its own small texture (that is, a dynamically rendered sprite).
Then, instead of rendering the object itself to the screen, a billboard is drawn in its place,
texture-mapped with the object's sprite. To perform blurring of these objects, MIP-maps are
built of the sprite textures, exactly as in Trick Three. Then the sprite can be rendered with a
MIP-map chosen appropriately for the level of blur.
For example, imagine that a spaceship is visible in the distance and covers a rectangle of screen
area about 128x128 pixels. However, it's going to be heavily blurred. First, we render the ship
to a 128x128 off-screen sprite and build MIP-maps of it down to, say, 16x16. Then, instead of
rendering the original object to the screen, a billboard is drawn in its place using that 16x16
MIP-map. Because of the scaling-down followed by scaling-up process, the ship appears
blurred.
As with Trick Three, if an object is to be drawn heavily blurred, it can be rendered to a sprite at
a low resolution. In our example, you could start by rendering the ship into a 64x64 sprite,
even though it covers a screen area of 128x128 (saving fill rate by a factor of 4), and then
generate a 16x16 MIP-map (as before, but requiring one less iteration of MIP-map generation).
As before, the 16x16 MIP-map is then used to draw the ship to the screen. This trick doesn't
just work for small objects, however; it can be thought of as an extended version of the two-
layer blurring system illustrated in Figure 8.
When using this sprite-based system, there are several points to bear in mind:
- Drawing order becomes important, as is the case when rendering normal alpha'd objects.
- Objects become flat, so it is important they don't intersect each other or form cycles of
overlapping parts between them. In effect, you've lost the use of the Z-buffer between
objects.
- Objects can only have a single level of blur on them; thus an object which covers a wide
range of Z (and thus should be blurred by different amounts at its ends) will be
incorrectly rendered.
Despite these caveats, the technique has a few unique advantages. Because of the MIP-
mapping process, distant objects (even those which aren't meant to be blurred) can be made to
look antialiased, even on cards where full-screen antialiasing is unavailable.
Even better, this trick provides an opportunity for an interesting optimization to the whole
rendering process. It may actually be unnecessary to rerender the object sprites every frame,
because if an object doesn't change orientation or lighting much between frames, its sprite can
be reused in subsequent frames. This can massively increase rendering speeds, especially in
the case of huge numbers of nonspinning objects. With a sufficiently ingenious (or sloppy)
heuristic, which decides when objects can keep their sprites and when they need rerendering,
the sprites may only need to be rendered every four or five frames. The heuristic can be even
more generous (and stop rerendering) when an object is in the distance or blurred. As with LOD
algorithms, the heuristic has to maintain the crucial balance between visual degradation and
speed increase. A rule that works in one situation may be completely inappropriate to another
type of scene or engine architecture. People have been experimenting with techniques like this
for some years, and although it is difficult to balance correctly, under the right circumstances it
not only gives you variable depth-of-field blurring but also huge frame-rate increases.
Rounding Up
Listing 5 gives an extremely simple example of how to use the other code listings given in this
article. It reads an image in Photoshop Raw format and applies each of the algorithms in turn to
output .RAW files, which can be loaded into an image viewer to see the results.
The tricks described in this article are just that -- tricks. There is a huge unexplored area in 3D-
accelerated programming -- that of using the power of graphics cards in ways other than just
drawing lit objects. I hope that the ideas presented in this article inspire some experimentation
and development of even weirder and more wonderful techniques than those I've chosen to
include here.
http://www.gamasutra.com/features/20010209/evans_01.htm
Angel Studios' Midtown Madness 2 for PC and Midnight Club for Playstation 2 are open racing
games in which players have complete freedom to drive where they please. Set in "living cities,"
these games feature interactive entities that include opponents, cops, traffic, and pedestrians.
The role of artificial intelligence is to make the behaviors of these high-level entities convincing
and immersive: opponents must be competitive but not insurmountable. Cops who spot you
breaking the law must diligently try to slow you down or stop you. Vehicles composing ambient
traffic must follow all traffic laws while responding to collisions and other unpredictable
circumstances. And pedestrians must go about their routine business, until you swerve towards
them and provoke them to run for their lives. This article provides a strategy for programmers
who are trying to create AI for open city racing games, which is based on the success of Angel
Studios' implementation of AI in Midtown Madness 2 and Midnight Club. The following
discussion focuses on the autonomous architecture used by each high-level entity in these
games. As gameplay progresses, this autonomy allows each entity to decide for itself how it's
going to react to its immediate circumstances. This approach has the benefit of creating lifelike
behaviors along with some that were never intended, but add to gameplay in surprising ways.
At the highest level, a city is divided into three primary components for the AI map: roads,
intersections, and open areas (see Figure 1). Most of this AI map is composed of roads (line
segments) that connect intersections. For our purposes, an intersection is defined as a 2D area
in which various roads join. Shortcuts are just like roads, except they are overlaid on top of the
three main component types. Shortcuts are used to help the opponents navigate through the
various open areas, which by definition have no visible roads or intersections. Each of these
physical objects is reflected in a software object.
Figure 1. The AI map elements appear as
green and blue line segments for roads
and sidewalks, 2D areas for intersections,
and additional line segments for shortcuts
across open areas.
The road object contains all the data representing a street, in terms of lists of 3D vertices. The
main definition of the road includes the left/right boundary data, the road's centerline, and
orientation vectors defined for each vertex in the definition. Other important road data includes
the traffic lane definitions, the pedestrian sidewalk definition, road segment lengths, and lane
width data. A minimum of four 3D vertices are used to define a road, and each list of vertices
(for example, center vertices, boundary vertices, and so on) has the same number of vertices.
The intersection object contains a pointer to each connected shortcut and road segment. At
initialization, these pointers are sorted in clockwise order. The sorting is necessary for helping
the ambient traffic decide which is the correct road to turn onto when traversing an
intersection. The intersection object also contains a pointer to a "traffic light set" object, which,
as you might guess, is responsible for controlling the light's sequence between green and red.
Other important tasks for this object include obstacle management and stop-sign control.
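A rough picture of how those two objects might be laid out (the field names here are guesses
based on the description above, not Angel's actual code):

    #include <vector>

    struct Vec3 { float x, y, z; };

    // Road: lists of 3D vertices describing a street. Every per-vertex list
    // (centerline, boundaries, orientations, ...) has the same number of
    // entries, with a minimum of four vertices per road.
    struct Road {
        std::vector<Vec3>  centerline;
        std::vector<Vec3>  leftBoundary;
        std::vector<Vec3>  rightBoundary;
        std::vector<Vec3>  orientation;     // one orientation vector per vertex
        std::vector<float> segmentLength;   // length of each road subsegment
        std::vector<float> laneWidth;
        float speedLimit;
    };

    struct TrafficLightSet;  // sequences the lights between green and red

    // Intersection: pointers to each connected road/shortcut, sorted clockwise
    // at initialization so traffic can pick the correct road to turn onto.
    struct Intersection {
        std::vector<Road*> roads;           // sorted in clockwise order
        TrafficLightSet*   lights;
    };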
Big-city solutions: leveraging the City Tool and GenBAI Tool. Angel's method for creating
extremely large cities uses a very sophisticated in-house tool called the City Tool. Not only does
this tool create the physical representation of the city, but it also produces the raw data
necessary for the AI to work. The City Tool allows the regeneration of the city database on a
daily basis. Hence, the city can be customized very quickly to accommodate new gameplay
elements that are discovered in prototyping, and to help resolve any issues that may emerge
with the AI algorithms.
The GenBAI Tool is a separate tool that processes the raw data generated from the City Tool
into the format that the run-time code needs. Other essential tasks that this GenBAI Tool
performs include the creation of the ambient and pedestrian population bubbles and the
correlation of cull rooms (discrete regions of the city) to the components of the road map.
Given the available AI performance budget and the immense size of the cities, it's
impossible to simulate an entire city at once. The solution is to define a "bubble" that contains a
list of all the road components on the city map that are visible from each cull room in the city,
for the purpose of culling the simulation of traffic and pedestrians beyond a certain distance.
This collection of road components essentially becomes the bubbles for ambient traffic and
pedestrians.
The last function of the GenBAI tool is to create a binary version of the data that allows for
superfast load times, because binary data can be directly mapped into the structures.
Data files: setting up races. The AI for each race event in the game is defined using one of
two files: the city-based AI map data file or the race-based AI map data file. The city file
contains defaults to use for all the necessary AI settings at a city level. Each race event in the
city includes a race-based AI map data file. This race file contains replacement values to use
instead of the city values. This approach turns out to be a powerful design feature, because it
allows the game designer to set defaults at a city level, and then easily override these values
with new settings for each race.
Some examples of what is defined in these files are the number and definition of the race's
opponents, cops, and hook men. Also defined here are the models for the pedestrians and
ambient vehicles to use for a specific race event. Finally, exceptions to the road data can be
included to change the population fill density and speed limits.
Following rails and cubic spline curves. During normal driving conditions, all the ambient
vehicles are positioned and oriented by a 2D spline curve. This curve defines the exact route
the ambient traffic will drive in the XZ-plane. We used Hermite curves because the defining
parameters -- the start and end positions and the directional vectors -- are easy to calculate
and readily available.
Since the lanes for ambient vehicles on each road are defined by a list of vertices, a road
subsegment can easily be created between each vertex in the list. When the ambient vehicle
moves from one segment to the next, a new spline is calculated to define the path the vehicle
will take. Splines are also used for creating recovery routes back to the main rail data. These
recovery routes are necessary for regaining the rail after a collision or a player-avoidance
action has sent the ambient vehicle off it. Using splines enables the ambient vehicles to drive
smoothly through curves typically made up of many small road segments and intersections.
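For reference, a Hermite curve is cheap to evaluate from exactly those parameters; a minimal
2D sketch, with illustrative names:

    struct Vec2 { float x, z; };

    // Evaluate a Hermite curve at t in [0,1], given start/end positions p0, p1
    // and start/end directional vectors m0, m1, working in the XZ-plane as the
    // traffic rails do.
    Vec2 hermite(Vec2 p0, Vec2 p1, Vec2 m0, Vec2 m1, float t)
    {
        float t2 = t * t, t3 = t2 * t;
        float h00 =  2*t3 - 3*t2 + 1;   // the four Hermite basis functions
        float h10 =      t3 - 2*t2 + t;
        float h01 = -2*t3 + 3*t2;
        float h11 =      t3 -     t2;
        Vec2 r;
        r.x = h00*p0.x + h10*m0.x + h01*p1.x + h11*m1.x;
        r.z = h00*p0.z + h10*m0.z + h01*p1.z + h11*m1.z;
        return r;
    }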
Setting the road velocity: the need for speed. Each road in the AI map has a speed-limit
parameter for determining how fast ambient vehicles are allowed to drive on that road. In
addition, each ambient vehicle has a random value for determining the amount it will drive over
or under the road's speed limit. This value can be negative or positive to allow the ambient
vehicles to travel at different speeds relative to each other.
When a vehicle needs to accelerate, it uses a randomly selected value between 5 and 8 m/s^2.
At other times, when an ambient vehicle needs to decelerate, perhaps because of a stop sign or
red light, the vehicle calculates a deceleration value based on attaining the desired speed
in 1 second. The deceleration is calculated by

a = (V^2 - V0^2) / (2 * (X - X0))

where V is the target velocity, V0 is the current velocity, and (X - X0) is the distance required to
perform the deceleration.
Detecting collisions. With performance times being so critical, each ambient vehicle can't test
all the other ambient vehicles in its obstacle grid cell. As a compromise between speed and
comprehensiveness, each ambient vehicle contains only a pointer to the next ambient vehicle
directly in front of it in the same lane. On each frame, the ambient checks if the distance
between itself and the next ambient vehicle is too close. If it is, the ambient in back will slow
down to the speed of the ambient in front. Later, when the ambient in front becomes far
enough away, the one in back will try to resume a different speed based on the current road's
speed limit.
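A sketch of that per-frame following check (the structure and the gap thresholds are
illustrative assumptions):

    #include <algorithm>

    struct AmbientVehicle {
        float roadDistance;     // distance traveled along the current road
        float speed;
        float desiredSpeed;     // speed limit plus this car's random offset
        AmbientVehicle* ahead;  // next vehicle directly in front, same lane
    };

    // Slow to the leader's speed when too close; resume the desired speed
    // once the vehicle in front is far enough away again.
    void updateFollowing(AmbientVehicle& car, float minGap, float resumeGap)
    {
        if (!car.ahead) return;
        float gap = car.ahead->roadDistance - car.roadDistance;
        if (gap < minGap)
            car.speed = std::min(car.speed, car.ahead->speed);
        else if (gap > resumeGap)
            car.speed = car.desiredSpeed;
    }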
By itself, this simplification creates a problem with multi-car pileups. The problem can be solved
by stopping the ambient vehicles at the intersections preceding the crash scene.
Crossing the intersection. Once an ambient vehicle reaches the end of a road, it must
traverse an intersection. To do this, each vehicle needs to successfully gain approval from the
following four functional groups.
First, the ambient vehicle must get approval from the intersection governing that road's "traffic
control." Each road entering an intersection contains information that describes the traffic
control for that road. Applicable control types are NoStop, AllwaysStop, TrafficLight, and
StopSign (see Figure 2). If NoStop is set, then the ambient vehicle gets immediate approval to
proceed through the intersection. If AllwaysStop is set, the ambient never gets approval to
enter the intersection. If TrafficLight is set, the ambient is given approval whenever its
direction has a green light. If StopSign is set, the ambient vehicle that has been waiting the
longest time is approved to traverse the intersection.
The second approval group is the accident manager. The accident manager keeps track of all
the ambient vehicles in the intersection and the next upcoming road segment. If there are any
accidents present in these AI map components, then approval to traverse the intersection is
denied. Otherwise, the ambient vehicle is approved and moves on to the third stage.
The third stage requires that the road which the ambient is going to be on after traversing the
intersection has the road capacity to accept the ambient vehicle's entire length, with no part of
the vehicle sticking into the intersection.
The fourth and final approval comes from a check to see if there are any other ambient vehicles
trying to cross at the same time. An example of why this check is necessary is when an ambient
vehicle is turning from a road controlled by a stop sign onto a main road controlled by a traffic
light. Since the approval of the stop sign is based on the wait time at the intersection, the
vehicle that's been waiting longest would have permission to cross the intersection -- but in
reality that vehicle needs to wait until the cars that have been given permission by the traffic
light get out of the way.
Selecting the next road. When an ambient vehicle reaches the end of the intersection, the
next decision the vehicle must make is which direction to take. Depending on its current lane
assignment, the ambient vehicle selects the next road based on the following rules (see Figure
2):
Since the roads are sorted in clockwise order, this simplifies selection of the correct road. For
example, to select the road to the left, just add 1 to the current road's intersection index value
(the ID number of that road in the intersection road array). To pick the straight road, add 2. To
go right, just subtract 1 from the road's intersection index value.
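In code, that clockwise-index arithmetic reduces to a few lines; a minimal sketch, assuming
the intersection's road array wraps around:

    // 'cur' is the intersection index of the road the vehicle arrived on,
    // 'n' the number of roads connected to the intersection.
    int leftRoad(int cur, int n)     { return (cur + 1) % n; }
    int straightRoad(int cur, int n) { return (cur + 2) % n; }
    int rightRoad(int cur, int n)    { return (cur + n - 1) % n; }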
Changing lanes. On roads that are long enough, the ambient vehicles will change lanes in
order to load an equal number of vehicles into each lane of the road. When the vehicle has
traveled to the point that triggers the lane change (usually set at 25 percent of the total road
length), the vehicle will calculate a spline that will take it smoothly from its current lane to the
destination lane.
The difficulty here is in setting the next-vehicle pointer for collision detection. The solution is to
have a next-vehicle pointer for each possible lane of the road. During this state, the vehicle is
assigned to two separate lanes and therefore is actually able to detect collision for both traffic
lanes.
Once a vehicle completes the lane change, it makes another decision as to which road it wants
to turn onto after traversing the upcoming intersection. This decision is necessary because the
vehicle is in a new lane and may not be able to get to the previously selected road from its new
lane assignment.
Orienting the car. As the ambient traffic vehicles drive around the city, they are constantly
driving over an arbitrary set of polygons forming the roads and intersections. One of the
challenges for the AI is orienting the ambient vehicles to match the contour of the road and
surfaces of open areas. Because there are hills, banked road surfaces, curbs separating roads
and sidewalks, and uneven open terrain, the obvious way to orient the vehicles is to shoot a
probe straight down the Y-axis from the front-left, front-right, and rear-left corners of the
vehicle. First, get the XZ position of the vehicle from the calculated spline position and
determine the three corner positions in respect to the center point of the vehicle. Then, shoot
probes at the three corners to get their Y positions.
Once you know the three corner positions, you can calculate the car's orientation vectors. This
approach works very well, but even caching the last polygon isn't fast enough to do all the time
for every car in the traffic bubble. One way to enhance performance is to mark every road as
being either flat or not. If an ambient vehicle drives on a flat road, it doesn't need to do the full
probe method. Instead, this vehicle could use just the Y value from the road's rail data. Another
performance enhancement is to orient the vehicles that are far enough from the player using
only the road's rail-orientation vectors. This approach works well when small vehicle-orientation
pops are not noticeable.
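Deriving the orientation vectors from the three probed corners is then straightforward vector
math; a sketch (normalization is omitted here):

    struct Vec3 { float x, y, z; };

    Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    Vec3 cross(Vec3 a, Vec3 b) {
        return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    }

    // The right vector runs from the front-left to the front-right corner, the
    // forward vector from the rear-left to the front-left; their cross product
    // gives the up vector (normalize all three in practice).
    void carOrientation(Vec3 frontLeft, Vec3 frontRight, Vec3 rearLeft,
                        Vec3& right, Vec3& forward, Vec3& up)
    {
        right   = sub(frontRight, frontLeft);
        forward = sub(frontLeft, rearLeft);
        up      = cross(forward, right);
    }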
Managing the collision state. When an ambient vehicle collides with the player, or with a
dynamic or static obstacle in the city, the ambient vehicle switches from using a partially
simulated physics model to a fully simulated physics model. The fully simulated model allows
the ambient vehicle to act correctly in collisions.
A vehicle manager controls the activities of all the vehicles transitioning between physics
models. A collision manager handles the collision itself. For example, once a vehicle has come
to rest, the vehicle manager resets it back to the partially simulated physics model. At this
point, the ambient vehicle attempts to plot a spline back to the road rail. As it proceeds along
the rail, the vehicle will not perform any obstacle detection, and will collide with anything in its
way. A collision then sends the vehicle back to the collision manager. This loop will repeat for a
definable number of tries. If the maximum number of tries is reached, the ambient vehicle
gives up and remains in its current location until the population manager places it back into the
active bubble of the ambient vehicle pool.
Using an obstacle-avoidance grid. Every AI entity in the game is assigned to a cell in the
obstacle-avoidance grid. This assignment allows fully simulated physics vehicles to perform
faster obstacle avoidance.
Since the road is defined by a list of vertices, these vertices make natural separation points
between obstacle-avoidance buckets. Together, these buckets divide the city into a grid that
limits the scope of collision detection. As an ambient vehicle moves along its rail, crossing a
boundary between buckets causes the vehicle to be removed from the previous bucket and
added to the new bucket. The intersection is also considered an obstacle bucket.
Simulation bubbles for ambient traffic. A run-time parameter specifies the total number of
ambient vehicles to create in the city. After being created, each ambient vehicle is placed into
an ambient pool from which the ambients around the player are populated. This fully simulated
region around the player is the simulation bubble. Relative to the location of the player,
remote regions of the city are outside of the simulation bubble and are not fully simulated.
When a player moves from one cull room to another, the population manager compares the
vertex list of the new cull room against the list for the old one. From these two lists, three new
lists are created: New Roads, Obsolete Roads, and No Change Roads. First, the obsolete roads
are removed from the active road list, and the ambient vehicles on them are placed into the
ambient pool. Next, the new roads are populated with a vehicle density equal to the total
vehicle length divided by the total road length. The vehicle density value is set to the default
value based on the road type, or an exception value set through the definition of the race AI
map file.
As the ambient vehicles randomly drive around the city, they sometimes come to the edge of
the simulation bubble. When this happens, the ambient vehicles have two choices. First, if the
road type is two-way (that is, ambient vehicles can drive in both directions), then the vehicle is
repositioned at the beginning of the current road's opposite direction. Alternatively, if the
ambient vehicle reaches the end of a one-way road, the vehicle is removed from the road and
placed into the pool and thereby becomes available to populate other bubbles.
Driving in London: left becomes right. London drivers use the left side of the road instead
of the right. To accommodate this situation, some changes have to be made to the raw road
data. First, all of the right lane data must be copied to the left lane data, and vice versa. The
order of each lane's vertex data must then be reversed so that the first vertex becomes the
last, and the lane order reversed so that what was the lane closest to the road's centerline
becomes the lane farthest from the center.
Given these changes, the rest of the AI entities and the ambient vehicle logic will work the
same regardless of which side of the road the traffic drives on. This architecture gave us the
flexibility to allow left- or right-side driving in any city.
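A sketch of that data transformation, with hypothetical field names:

    #include <algorithm>
    #include <vector>

    struct Vec3 { float x, y, z; };
    struct Lane { std::vector<Vec3> verts; };

    // Reverse each lane's vertex order (first vertex becomes the last) and the
    // lane order (the lane nearest the centerline becomes the farthest).
    void reverseSide(std::vector<Lane>& lanes)
    {
        for (Lane& lane : lanes)
            std::reverse(lane.verts.begin(), lane.verts.end());
        std::reverse(lanes.begin(), lanes.end());
    }

    // Convert right-hand-drive road data to left-hand drive: swap the two
    // sides of the road, then reverse the vertex and lane ordering of each.
    void convertToLeftHandDrive(std::vector<Lane>& leftLanes,
                                std::vector<Lane>& rightLanes)
    {
        std::swap(leftLanes, rightLanes);
        reverseSide(leftLanes);
        reverseSide(rightLanes);
    }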
In real cities, pedestrians are on nearly every street corner. They walk and go about their
business, so it should be no different in the cities we create in our games. The pedestrians
wander along the sidewalks and sometimes cross streets. They avoid static obstacles such as
mailboxes, streetlights, and parking meters, and also dynamic obstacles such as other
pedestrians and the vehicles controlled by the players. And no, players can't run over the
pedestrians, or get points for trying! Even so, interacting with these "peds" makes the player's
experience as a city driver much more realistic and immersive.
Simulation bubbles for pedestrians. Just as the ambient traffic has a simulation bubble, so
do the pedestrians. And while the pedestrian bubble has a much smaller radius, both types are
handled similarly. During initialization, the pedestrians are created and inserted into the
pedestrian pool. When the player is inserted into the city, the pedestrians are populated around
him. During population, one pedestrian is added to each road in the bubble, round-robin style,
until all the pedestrians in the pool are exhausted.
Pedestrians are initialized with a random road distance and side distance based on an offset to
the center of the sidewalk. They are also assigned a direction in which to travel and a side of
the street on which to start. As the pedestrians get to the edge of the population bubble, they
simply turn around and walk back in the opposite direction from which they came.
Wandering the city. When walking the streets, the pedestrians use splines to smooth out the
angles created by the road subsegments. All the spline calculations are done in 2D to increase
the performance of the pedestrians. The Y value for the splines is calculated by probing the
polygon the pedestrian is walking on in order to give the appearance that the pedestrian is
actually walking on the terrain underneath its feet.
Each pedestrian has a target point for it to head toward. This target point is calculated by
solving for the location on the spline path three meters ahead of the pedestrian. In walking, the
ped will turn toward the target point a little bit each frame, while moving forward and sideways
at a rate based on the parameters that control the animation speed. As the pedestrian walks
down the road, the ped object calculates a new spline every time it passes a sidewalk vertex.
Crossing the street. When a pedestrian gets to the end of the street, it has a decision to
make. The ped either follows the sidewalk to the next street or crosses the street. If the ped
decides to cross the street, then it must decide which street to cross: the current or the next.
Four states control ped navigation on the streets: Wander, PreCrossStreet,
WaitToCrossStreet, and CrossStreet (see Figure 3). The first of these, Wander, is described
in the previous section, "Wandering the City." PreCrossStreet takes the pedestrian from the
end of the street to a position closer to the street curb, WaitToCrossStreet tells the pedestrian
waiting for the traffic light that it's time to cross the street, and CrossStreet handles the actual
walking or running of the pedestrian to the curb on the other side of the street.
Animating actions. The core animation system for the pedestrians is skeleton-based.
Specifically, animations are created in 3D Studio Max at 30FPS, and then downloaded using
Angel's proprietary exporter. The animation system accounts for the nonconstant nature of the
frame rate.
For each type of pedestrian model, a data file identifies the animation sequences. Since all the
translation information is removed from the animations, the data file also specifies the amount
of translation necessary in the forward and sideways directions. To move the pedestrian, the
ped object simply adds the total distance multiplied by the frame time for both the forward and
sideways directions. (Most animation sequences have zero side-to-side movement.)
Two functions of the animation system are particularly useful. The Start function immediately
starts the animation sequence specified as a parameter to the function, and the Schedule
function starts the desired animation sequence as soon as the current sequence finishes.
Avoiding the speeding player. The main rule for the pedestrians is to always avoid being hit.
We accomplish this in two ways. First, if the pedestrian is near a wall, then the ped runs to the
wall, puts its back against it, and stands flush up against it until the threatening vehicle moves
away (see Figure 4).
Alternatively, if no wall is nearby, the ped turns to face the oncoming vehicle, waits until the
vehicle is close enough, and then dives to the left or right at the very last moment (see Figure
5).
Figure 5. The pink lines visualize the
direction the peds intend to walk. When a
player vehicle introduces a threat, the
pedestrians decide to dive right or left at
the last moment, since no wall is nearby.
The pedestrian object determines that an oncoming vehicle is a threat by taking the forward
directional vector of the vehicle and performing a dot product with the vector defined by the
ped's position minus the vehicle's position. This calculation measures the side distance. If the
side distance is less than half the width of the vehicle, then a collision is imminent.
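A sketch of one way to implement that threat test, treating the perpendicular component of
the vehicle-to-ped offset as the side distance (the names and the behind-the-vehicle early-out
are illustrative):

    #include <algorithm>
    #include <cmath>

    struct Vec2 { float x, z; };

    // Project the vehicle-to-ped offset onto the vehicle's normalized forward
    // vector; the remaining perpendicular component is the side distance. A
    // collision threatens if the ped is ahead of the vehicle and less than
    // half the vehicle's width off its line of travel.
    bool isThreat(Vec2 vehPos, Vec2 vehForward, Vec2 pedPos, float vehWidth)
    {
        Vec2 toPed = { pedPos.x - vehPos.x, pedPos.z - vehPos.z };
        float along = toPed.x * vehForward.x + toPed.z * vehForward.z;
        if (along < 0.0f) return false;              // ped is behind the vehicle
        float lenSq = toPed.x * toPed.x + toPed.z * toPed.z;
        float side  = std::sqrt(std::max(0.0f, lenSq - along * along));
        return side < vehWidth * 0.5f;
    }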
The next calculation is the time it will take the approaching vehicle to collide with the
pedestrian. In this context, two distance zones are defined: a far and a near. In the far zone,
the pedestrian turns to face the vehicle and then goes into an "anticipate" behavior, which
results in a choice between shaking with fear and running away. The near zone activates the
"avoid" behavior, which causes the pedestrian to look for a wall to hug. To locate a wall, the
pedestrian object shoots a probe perpendicular to the sidewalk for ten meters from its current
location. If a wall is found, the pedestrian runs to it. Otherwise, the ped dives in the opposite
direction of the vehicle's rotational momentum. (Sometimes the vehicle is going so fast, a
superhuman boost in dive speed is needed to avoid a collision.)
Avoiding obstacles. As the pedestrians walk blissfully down the street, they come to obstacles
in the road. The obstacles fall into one of three categories: other wandering pedestrians; props
such as trash cans, mailboxes, and streetlights; or the player's vehicle parked on the sidewalk.
In order to avoid other pedestrians, each ped checks all the pedestrians inside its obstacle grid
cell. To detect a collision among this group, the ped performs a couple of calculations. First, it
determines the side distance from the centerline of the sidewalk to itself and the other
pedestrian. The ped's radius is then added to and subtracted from this distance. A collision is
imminent if there is any overlap between the two pedestrians.
In order to help them avoid each other, one of the pedestrians can stop while the other one
passes. One way to do this is to make the pedestrian with the lower identification number
stop, while the other ped sets its target point far enough to the left or right to miss the stopped ped.
The ped will always choose left if it's within the sidewalk boundary; otherwise it will go to the
right. If the right target point is also past the edge of the sidewalk, then the pedestrian will turn
around and continue on its way. Similar calculations to pedestrian detection and avoidance are
performed to detect and avoid the props and the player's vehicle.
The full physics simulation object, VehiclePhysics, is a base class with the logic for navigating
the city. The different entities in the city are derived from this base class, including the
RouteRacer object (some of the opponents) and the PoliceOfficer object (cops). These child
classes supply the additional logic necessary for performing higher-level behaviors. We use the
term "full-physics vehicles" because the car being controlled for this category behaves within
the laws of physics. These cars have code for simulating the engine, transmission, and wheels,
and are controlled by setting values for steering, brake, and throttle. Additionally, the
VehiclePhysics class contains two key public methods, RegisterRoute and DriveRoute.
Registering a route. The first thing that the navigation algorithm needs is a route. The route
can either be created dynamically in real time or defined in a file as a list of intersection IDs.
The real-time method always returns the shortest route. The file method is created by the Race
Editor, another proprietary in-house tool that allows the game designer to look down on the city
in 2D and select the intersections that make up the route. The game designer can thereby
create very specific routes for opponents. Also, the file method eliminates the need for some of
the AI entities to calculate their routes in real time, which in turn saves processing time.
Planning the route. Once a route to a final destination has been specified, a little bit more
detailed planning is needed for handling immediate situations. We used a road cache for this
purpose, which stores the most immediate three roads the vehicle is on or needs to drive down
next (see Figure 6).
At any given moment, the vehicle knows the next intersection it is trying to get to (the
immediate target), so the vehicle can identify the road connecting this target intersection with
the intersection immediately before the target. If the vehicle is already on this "hint road," then
the cache is filled with the hint road and the next two roads in the route.
If the vehicle isn't on the hint road, it has gotten knocked off course. In this situation, the
vehicle looks at all the roads that connect with the intersection immediately before the target. If
the vehicle is on one of these roads, then the cache is filled with this road and the next two
roads the vehicle needs to take in order to get back on track. If the vehicle isn't on any of these
roads, then it dynamically plots a new route to the target intersection.
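A rough sketch of the hint-road case, with assumed types and names (the actual route and road structures aren't shown in the article):

#include <cstddef>
#include <vector>

struct Road { int id; };

// The cache holds the road the vehicle is on plus the next two roads.
struct RoadCache { Road roads[3]; };

// Returns true if the vehicle is on the expected "hint road" and the
// cache was refilled from the route. Otherwise the caller checks the
// roads connecting to the previous intersection, and finally replans.
bool TryFillCacheFromHint(RoadCache& cache, const std::vector<Road>& route,
                          std::size_t hintPos, const Road& currentRoad)
{
    if (route[hintPos].id != currentRoad.id)
        return false;
    for (std::size_t i = 0; i < 3 && hintPos + i < route.size(); ++i)
        cache.roads[i] = route[hintPos + i];
    return true;
}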
Determining multiple routes. If there are no ambient vehicles in the city, then there is only
one route necessary to give to an opponent (the computer-controlled player, or CCP), the best
route. In general, however, there is ambient traffic everywhere that must be avoided if the CCP
is to remain competitive. The choice then becomes which path to pick to avoid the obstacles. At
any given moment, this choice comes down to going left or right to avoid an upcoming obstacle.
As the CCP plans ahead, it determines two additional routes for each and every obstacle, until it
reaches the required planning distance. This process produces a tree of routes to choose from
(see Figure 7).
Choosing the best route. When all the possible routes have been enumerated, the best route
for the CCP can be determined. Sometimes one or more of the routes will take the vehicle onto
the sidewalk. Taking the sidewalk is a negative, so these routes are less attractive than those
which stay on the road. Also, some routes will become completely blocked, with no way around
the obstacles present, making those less attractive as well. The last criterion is minimizing the
amount of turning required to drive a path. Taking all these criteria into account, the best route
is usually the one that isn't blocked, stays on the road, and goes as straight as possible.
Setting the steering. The CCP vehicle simulated with full physics uses the same driving model
that the player's vehicle uses. For example, both vehicles take a steering parameter between -
1.0 and 1.0. This parameter is input from the control pad for the player's vehicle, but the CCP
must calculate its steering parameter in real time to avoid obstacles and reach its final
destination. Rather than planning its entire route in advance, the CCP simplifies the problem by
calculating a series of Steering Target Points (STPs), one per frame in real time as gameplay
progresses. Each STP is simply the next point the CCP needs to steer towards to get one frame
closer to its final destination. Each point is calculated with due consideration to navigating the
road, navigating sharp turns, and avoiding obstacles.
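One plausible way to turn an STP into that steering value, sketched with made-up names and an assumed full-lock angle:

#include <algorithm>
#include <cmath>

struct Vec2 { float x, z; };

// Convert the signed angle between the vehicle's heading and the
// direction to the current STP into a steering value in [-1, 1].
float SteeringTowardTarget(const Vec2& heading, const Vec2& toTarget)
{
    float cross = heading.x * toTarget.z - heading.z * toTarget.x;
    float dot   = heading.x * toTarget.x + heading.z * toTarget.z;
    float angle = std::atan2(cross, dot);  // signed angle in radians

    const float kFullLockAngle = 0.7f;     // assumed full-lock steering angle
    return std::max(-1.0f, std::min(1.0f, angle / kFullLockAngle));
}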
Setting the throttle. Most of the time a CCP wants to go as fast as possible. There are two
exceptions to this rule: traversing sharp turns and reaching the end of a race. Sharp turns are
defined as those in which the angle between two road subsegments is greater than 45 degrees,
and can occur anywhere along the road or when traversing an intersection. Since the route
through a sharp turn is circular, it is easy to calculate the maximum velocity through the turn
by the formula

V = sqrt(u * g * R)

where V is equal to the velocity, u is the coefficient of friction for the road surface, g is the
value of gravity, and R is the radius of our turn. Once the velocity is known, all that the CCP
has to do is slow down to the correct speed before entering the turn.
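In code, the speed cap is a one-liner (the function name is mine):

#include <cmath>

// Maximum speed through a circular turn: V = sqrt(u * g * R).
float MaxTurnSpeed(float frictionU, float gravityG, float radiusR)
{
    return std::sqrt(frictionU * gravityG * radiusR);
}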
Getting stuck. Unfortunately, even the best CCP can occasionally get stuck, just like the player
does. When a CCP gets stuck, it throws its car into reverse, realigns with the road target, and
then goes back into drive and resumes the race.
In the wake of the original Midtown Madness, we wanted open city racing to give players much
more than the ability to drive on any street and across any open area. In order for a city to feel
and play in the most immersive and fun way possible, many interactive entities of real cities
need to be simulated convincingly. These entities include racing opponents, tenacious cops,
ambient traffic, and pedestrians, all of which require powerful and adaptive AI to bring them to
life. Midtown Madness 2 and Midnight Club expand on the capabilities of these entities, which in
turn raises the bar of players' expectations even further.
The future of open city racing is wide open -- literally. Angel Studios and I are planning even
more enhancements to the AI in any future games of this type that we do. Some ideas I'm
planning to investigate in the future include enhancing the opponent navigation skills of all AI
entities, and creating AI opponents that learn from the players. Additionally, I'd like to create
more player interaction with the city pedestrians, and have more interaction between AI
entities. Anyone wanna race?
Joe Adzima has been an AI programmer at Angel Studios for three years. During that time, he architected and implemented
the entire AI system for Midtown Madness 1 and 2 for PC and Midnight Club for PlayStation 2. Joe thanks Robert Bacon, Angel
Studios' technical writer, for the exceptional editorial efforts Robert has applied to this article.
http://www.gamasutra.com/features/20010124/adzima_01.htm
Game programming, because of its history, has an image that no longer fits with the reality of
the job. The popular view of game programmers, at least in nerdy circles, is of a bunch of code
cowboys coding six impossible things before breakfast, spending most of their time worrying
about how to squeeze those last few cycles out of that inner loop. Micro-optimization and
glorious hacks are still very important in some areas of the job, but, as can be witnessed by the
ill fate of some recent high-profile projects, the biggest challenge is actually getting the damn
game shipped. The successful companies are those that have adapted, and have strategies for
getting the projects done.
There's a lot of software engineering literature already out there, and I've mentioned some of
my favorites in my list of resources at the end of this article. In the course of my work, I've run
across some practices and patterns that I've found useful. Some are quite specific to the unique
challenges of creating games, such as the fact that you almost never get what software
engineers would consider an adequate list of requirements. Others are not talked about in polite
circles because they're treating the symptoms of a bad coding process rather than the causes. I
think that these are still worth knowing, because sometimes trying to minimize the damage is
the best thing to do.
My main aim with this article is to share some of the ways I've found to keep my code usable in
an imperfect environment. The real solution is of course an overhaul of the whole development
process, but for the times that isn't possible, I hope these guidelines will be of some help.
"Are the identifiers in the code clear and fitting?" is a useful rule of thumb for determining
whether the code I'm writing is going to be maintainable. Since the compiler ignores the names
when it generates the final machine code, it's common for naming to be seen as a cosmetic
detail. However, I believe the quality of the names is a good indicator of how well thought out
the code's design was. Fundamentally, you should be able to describe to someone else precisely
what your code does, or you might as well have written it in binary machine code. If you can't
describe what it does, then neither you nor anyone else will be able to work with it in the
future, which leaves it useless. If you can describe it, then the best way to keep that description
for future need is to bake it into the code, by using clear, meaningful identifiers.
This is not about whether you want Hungarian notation, or prefer verbose or snappy variable
names, it's about whether they convey any information or just muddy the waters. Seeing a long
list of "x"s, "foo"s, or swear words -- or more commonly, very vague and ambiguous nouns --
implies that the writer of the code is confused about what he or she is writing. If I'm having
trouble with naming, I revisit my code's design, because I obviously don't have the concepts
involved clear enough in my head.
An especially important case is function and method names. A function with a name that's
unclear or misleading will waste a lot of debugging and maintenance time. ("Of course
Player.Render() polls the keyboard, why didn't I think of that earlier?")
See Listings 1.1 and 1.2 for a practical demonstration of how I'd approach a simple function.
Classes packed with members and functions packed with variables both worry me, for pretty
much the same reason; nobody ever designs one of these crawling horrors, they just happen.
Usually things were added to them a little bit at a time as short-term hacks to get something
working, often under pressure to get a feature or bug fixed for a deadline. Once in, they never
get removed, other changes to the code start to rely on them being there, and ultimately
people take a more relaxed view of hacking the code around some more, since it's a mess
already.
The example that sticks most in my mind was a camera class that ended up with responsibilities
for player input, the player's on-screen character, informative displays, and plenty of other
stuff. The code limped along, though debugging was made a nightmare by the convoluted and
nonintuitive paths that the code took through this beast of a class. The real crunch hit when
camera changes were needed. The camera class was at that point so closely tied to everywhere
else in the game that changing it cascaded bugs through the whole system.
The textbook answer to this sort of problem is to revisit the whole design and figure out how to
incorporate the changes cleanly. If, as usual, there's deadline pressure, I now prefer to put in
placeholder non-member functions that are as loosely coupled to the rest of the code as
possible if a 'home' for the functionality isn't immediately obvious. This is a far from perfect
solution, but it at least contains the hackiness rather than letting it infect a whole class, and it
makes it obvious that the code is a hack rather than hiding it away.
Listings 2.1 and 2.2 show one way I might tackle code that was threatening to grow into a
crawling horror.
When I learn about a new feature, I naturally find lots of places in the code I'm writing where it
could come in handy. The problem is, having little experience with the new feature, I'll
inevitably end up using it where it will cause problems further down the line. Trying to debug an
object whose inheritance tree looks like a cat's cradle is not fun, exceptions can be equally
obscure unless used with care, and working out how templates work on different type
parameters can be nearly impossible. Another consideration is whether the next person to work
with the code will understand the feature you're using. When you definitely need them,
inheritance et al. are lifesavers, but they come with costs too, and can be used to obscure the
code far more easily than to make it clearer. A good coding standard will help, but thinking,
"Can I use a simpler way?" rather than, "Can I use <feature> here?" is most of the battle.
I find this also applies to using new technology. There's a lot of kudos to be gained by learning
to use a new technique before your peers, but I have to control my enthusiasm before I decide
how to approach a problem that could be helped with a new technique. It's vital to be critical
and look at the costs and benefits of using the technology as opposed to going with a less sexy
but better-understood technique. A good example is Tom Forsyth's presentation looking at
subdivision surfaces and asking what they'll bring to a game. With most graphical techniques,
the main cost is not implementing them in the engine; it's giving creative control to the artists
by providing the tools they need to create wonderful effects.
Over-general Designs
In game programming, you almost never get handed a complete set of requirements. One way
of dealing with this is to attempt to write a very general system that will cope with a very wide
range of needs. Though lecturers are very fond of such top-down, abstract systems as
examples, I've found that the design of the system has to be aware of the specifics of how it
will be used, or the code that relies on it ends up working around the system rather than being
helped by it. The heart of the problem is that there's always a choice of which abstractions best
model the problem, and until you know the problem well, picking the right abstractions is
impossible.
With input, picking an abstraction that encompasses joypads, mice, joysticks, and keyboards --
and isn't awkward to use -- is not something I've ever succeeded at. Even though they belong
to a family of devices with similar purposes, finding a useful subset of properties that they have
in common is very hard.
The way I've found around such problems is to tackle the problem from the other end and work
through some examples of how I think the system is likely to be used. This focuses my mind on
what the common cases will be; since they're the most important, I can make sure the design
concentrates on those. It's a lot easier to spot potential problems at an early stage, rather than
realizing them after it's all been implemented, causing you to have to corrupt the abstract
design by hacking in workarounds.
Take the case of input. I might see two main needs, one for pretty undemanding tasks such as
menu control, the other for things such as control of the main character where all the
information you can get from the control device is essential. In the first case, a simple abstract
system that posted events to interested parties when a major input change happened would
probably do everything that was needed. For the latter, I'd expose the full interface to the
device with minimal wrapping, and let the calling code pull out the information it wanted. Trying
to cope with the radically different needs of the client code within the same system would mean
both would be served poorly.
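To make this concrete, here is a rough sketch of the two-tier idea; every name in it is made up for illustration:

#include <functional>
#include <vector>

// Tier 1: event posting for undemanding clients such as menus.
enum class InputEvent { Up, Down, Select, Cancel };

class InputEventQueue
{
public:
    void Subscribe(std::function<void(InputEvent)> listener)
    {
        listeners_.push_back(listener);
    }
    void Post(InputEvent e)
    {
        for (auto& l : listeners_) l(e);
    }
private:
    std::vector<std::function<void(InputEvent)>> listeners_;
};

// Tier 2: the full device state with minimal wrapping, for demanding
// clients such as main-character control.
struct PadState { float stickX, stickY; unsigned buttons; };

const PadState& PollPad(int /*padIndex*/)
{
    static PadState state = { 0.0f, 0.0f, 0u };
    return state;  // a real implementation would read the hardware here
}

Menu code subscribes to the queue and never touches the hardware; the character controller calls PollPad directly every frame.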
Listings 3.1 and 3.2 take a look at how to design the interface to a graphics module, and how
tricky it can be to try to make it general.
The real competition in games these days is no longer in the graphical technology, but in the
content. To give the artists and designers as much time as possible to get all that great content
in, we need to get a version of the game they can use very early on in the project. This kind of
rapid development isn't common in games; in the past the priority has been on writing
extremely efficient code, with the time taken to write it not as important. Now the emphasis is
increasingly on getting something working at all, and then worrying about efficiency where it's
important. This doesn't mean you can be careless writing code, of course, just that the varying
times that different ways of coding something would take are now much more important than in
the past. It's hard to adjust to writing very naïve code where needed when every instinct is to
always have efficiency as your highest priority.
Lying Comments
When I first started writing commercial game code, my code was liberally littered with
comments, and I couldn't imagine any drawbacks to this. As time passed, I noticed something
odd: the code and the comments grew increasingly out of sync, and I found that wrong
comments cost me more time than correct comments saved me. The cutting and pasting, and
late night alterations that happen when the pressure's on all meant that the code changed while
the comments didn't.
I find code in which code and comments seem to clash very difficult to sort out, largely because
the comments have to be chased up just in case they're true, even if they seem obviously
wrong. The problem is that there's no natural reason for the comments and code to stay
connected, and in the heat of development the time needed to maintain the comments is
seldom available. To minimize the problem, my style is now to have a bare minimum of
comments but to have my code very verbose, with all steps of the algorithms laid out in simple
steps and with clear names for everything throughout. I reserve comments for areas where I'm
doing something that's unusual or obscure, in the hope that their scarcity will make them stand
out, and thus be less likely to be forgotten when the code changes. Obviously there are
exceptions to this, such as in assembler where the language doesn't give you the chance to be
verbose, and so comments are essential; you just have to bite the bullet and spend time
maintaining them.
Conclusion
I've tried to apply the optimization mantra to my coding process and look at what problems
consume most of my coding time. They are pretty mundane, even obvious, but if I can save
even a little time by focusing on them, it'll mean we can get the game out faster. I know this is
a bit of a pipe dream, but have you ever thought how nice it would be to actually implement
something from the "if we have time" wish list?
As long as I'm bugged by how much cooler the games I've worked on could have been if we'd
only had a little more time, I'll keep trying to figure out how to improve my coding.
Resources
Web Sites
http://www.refactoring.com/ has lots more ideas on how to rescue code that's in trouble.
Books
Gamma, Erich, and others. Design Patterns. Reading, Mass.: Addison-Wesley, 1995 (ISBN
0201633612).
Lakos, John S. Large-scale C++ Software Design. Reading, Mass.: Addison-Wesley, 1996 (ISBN
0201633620).
McConnell, Steve C. Software Project Survival Guide. Redmond, Wash.: Microsoft Press, 1997
(ISBN 1572316217).
McConnell, Steve C. After the Gold Rush: Creating a True Profession of Software Engineering.
Redmond, Wash.: Microsoft Press, 1997 (ISBN 0735608776).
Meyers, Scott. More Effective C++: 35 New Ways to Improve Your Programs and Designs.
Reading, Mass.: Addison-Wesley, 1995 (ISBN 020163371X).
Sutter, Herb. Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Solutions.
Reading, Mass.: Addison-Wesley, 1999 (ISBN 0201615622).
// Example 1.1
// Original
int IsFree(int Num, int Mode)
{
    int RangeX, RangeY;
    for(int i = 1; i <= g_MaxUnit; i++)
    {
        if(i != Num)
        {
            switch(Mode)
            {
            case NX_NEAR_ALL:
                RangeX = g_Units[Num].Width + g_Units[i].Width;
                RangeY = g_Units[Num].Height + g_Units[i].Height;
                break;
            case NX_NEAR_ENEMY:
            case NX_NEAR_ENEMYDANGER:
                RangeX = g_Units[i].Range + g_Units[i].Width + g_Units[Num].Width;
                RangeY = g_Units[i].Range + g_Units[i].Height + g_Units[Num].Height;
                break;
            default:
                assert(false);
                break;
            }
            if((abs(g_Units[i].X - g_Units[Num].X) < RangeX) &&
               (abs(g_Units[i].Y - g_Units[Num].Y) < RangeY))
            {
                if(Mode == NX_NEAR_ALL) return i;
                if((Mode == NX_NEAR_ENEMY) &&
                   (g_Units[i].Civilization != g_Units[Num].Civilization))
                    return i;
                if((Mode == NX_NEAR_ENEMYDANGER) &&
                   (g_Units[i].Civilization != g_Units[Num].Civilization) &&
                   (g_Units[i].Force > 0)) return i;
            }
        }
    }
    return 0;
}
// Example 1.2
// Version with meaningful names added
int GetNearUnit(int MyUnitIndex, int AcceptableType)
{
    int RangeX, RangeY;
    const SUnit& MyUnit=g_Units[MyUnitIndex];
    for(int OtherUnitIndex = 1; OtherUnitIndex <= g_CurrentUnitCount;
        OtherUnitIndex++)
    {
        const SUnit& OtherUnit=g_Units[OtherUnitIndex];
        if(MyUnitIndex != OtherUnitIndex)
        {
            switch(AcceptableType)
            {
            case ANY_UNIT:
                RangeX = MyUnit.Width + OtherUnit.Width;
                RangeY = MyUnit.Height + OtherUnit.Height;
                break;
            case ENEMY_UNIT:
            case DANGEROUS_ENEMY_UNIT:
                RangeX = OtherUnit.Range + OtherUnit.Width + MyUnit.Width;
                RangeY = OtherUnit.Range + OtherUnit.Height + MyUnit.Height;
                break;
            default:
                assert(false);
                break;
            }
            if((abs(OtherUnit.X - MyUnit.X) < RangeX) &&
               (abs(OtherUnit.Y - MyUnit.Y) < RangeY))
            {
                if(AcceptableType == ANY_UNIT) return OtherUnitIndex;
                if((AcceptableType == ENEMY_UNIT) &&
                   (OtherUnit.Civilization != MyUnit.Civilization))
                    return OtherUnitIndex;
                if((AcceptableType == DANGEROUS_ENEMY_UNIT) &&
                   (OtherUnit.Civilization != MyUnit.Civilization) &&
                   (OtherUnit.Force > 0)) return OtherUnitIndex;
            }
        }
    }
    return 0;
}
// Example 2.1
// Bad
void
ProcessEverything(void) {
    g_Camera.Process();
    // lots more code
}

void
CCamera::Process(void) {
    g_Player.rotY+=SomeMemberVariableThatGetsTheEffectWeWant;
    // lots more code
}

// Example 2.2
// Still bad, but a lot more visible!
void
ProcessEverything(void) {
    g_Camera.Process();
    g_Camera.HACK_AlterPlayersYRotation();
    // lots more code
}

void
CCamera::HACK_AlterPlayersYRotation(void) {
    g_Player.rotY+=SomeMemberVariableThatGetsTheEffectWeWant;
}

void
CCamera::Process(void) {
    // lots more code
}
// Example 3.1
// Classic example of problems
class C3DTriangle {
    // some data
    void Draw(void);
};

// Example 3.2
// Better, but still probably unacceptable for games
class C3DTriangle {
    // some data
};

class C3DTriangleRenderer {
    void DrawTriangleList(C3DTriangle* const pList, int nTriangleCount);
};

// This interface is a lot better, but is still a bad fit for a lot of
// rendering architectures. The idea of a design like this is to hide
// platform specifics, but once you start altering the interface to fit the
// platform's characteristics, as you'd have to here, you've lost any
// platform independence. All you end up with is an interface that apes a
// particular API, and so doesn't work well on any other platforms, and
// just adds an extra layer of complexity to the program to no good effect.
// Wrapping up an API doesn't give you platform independence!
Making a new build of a game in development is a very stressful task. It requires great care and
concentration for several hours, on work which is essentially tedious. Particularly towards the
end of a project, the slightest mistake could be disastrous. Because of the demands of
marketing, testing, management and the publisher, it is often carried out under high pressure.
Computers are particularly good at automating repetitive tasks, and carry them out more
quickly and more reliably than humans can. Unfortunately, because this is often hard to do in
Windows, people find it easier to carry on performing tasks manually using the GUI.
Welcome to the relaxing new world of automated build processes. You can make the process of
turning your game assets into an actual CD image or Internet download as easy as compiling a
new executable during debugging.
During development of Creatures 3, we used Unix scripting tools under Windows to write an
automatic build process. It gathered all the work of the team, compiled, processed and tested it
to produce a final CD image. This was very successful. We are using the same system to similar
advantage in current projects.
Key Advantages
There are several ways that having a build system will help your project.
- More reliable. The final builds were of a higher quality than manually made builds. This
is because the build script never forgets to do something. In addition, automatic testing
during the build process ensures the most obvious errors are caught before the build is
finished.
- Saves developer time. Instead of having one engineer working full time to make a build,
you only need someone to start the build going, and organize fixing of any errors that
show up. That one engineer can then fix bugs, or do something else useful.
- Reduces risk. If the developer who normally makes builds is ill, goes on holiday, or
somewhere more disreputable, it is easier for other people to make builds. Even if there
are problems, there is a clear script describing every stage of the process, so it is possible
to work out what went wrong. The knowledge of the build process is captured in script
code, rather than in one developer's head.
- Faster. If things went smoothly, we made a new build in half an hour. This isn't hugely
faster than a smooth manual build with an attentive developer. However, the time taken
is much more consistent. And the build can be made overnight, or while at lunch.
- Instant availability for testing. The build process empowers your QA department by
emailing them to let them know as soon as a new build is available. They can start testing
it straight away, or have it waiting for them fresh off the cooker when they arrive earlier
than you in the morning.
Despite all these improvements, making builds can still be stressful! The person responsible for
builds still needs to check that everyone in the team is ready for the new build to be made.
Then they set the build going, check for any errors and fix them or arrange for them to be
fixed. They also need to maintain the build scripts.
Figure 1 presents a brief overview of the stages that you need in your build process. You can
use this as an overall guide to writing your build script. The rest of the article gives lots of
details for technical implementation.
Figure 1
1. Compiling executables. The first thing for a script to do is to grab any C/C++ source
code from the version control system, and use a compiler to automatically make a release
mode version of your game engine. In the process it can upgrade the version number and
tag it in source control for future reference. Then you can always get the exact code back
again for future debugging.
2. Gathering files. Next it is time to muster all the other assets. This means graphics,
sounds, music, scripting code, level designs and video. These can be retrieved from your
version control system or from a fixed place on the network. We do both, keeping many
assets in a "build template" folder which is a skeletal version of the installed image.
3. Localization. If your game is localized then you can customize the build, for example by
copying the correct files depending on the language. You can also add or remove bundled
adverts (e.g. an AOL installer) and change logos according to your publisher and market.
4. File processing. This is a good moment to carry out processing on the files. Graphics
can be converted and optimized, and level files compiled. In Creatures 3 the build script
ran our own tools to splice Norn genomes together, making the lives of our genetic
engineers that bit easier.
5. Automated checks. As you get more used to build scripts, you will find it useful to add
code to check for common errors. For example, scanning scripts to detect missing
graphics files, or for common coding errors.
6. Full game testing. The next stage of the script is particularly crucial for increasing build
quality. The build machine copies the game image as if it had been installed, and then
launches it into a rolling demo or self-play mode. In Creatures 3 we hatched a few Norns
and let them play in the world. The build script detects any error messages and if it finds
them it abandons the build.
7. Installation compiling. Finally the build script calls the installation program (e.g.
InstallShield) to create the installer from the game image, and copies the completed CD
image to the network.
8. Email notification. The team members and QA are emailed to let them know it is ready.
As a final flourish, you can even get the script to cut a physical CD and eject it from the
drive!
Implementing It Yourself
Creatures 3's build process uses Unix tools running under Windows. These are free tools,
licensed under the GPL, and ported to Windows by Cygnus.
The main tool is Bash (the Bourne Again SHell), which is a popular Unix command line shell. It has
a powerful scripting language, which provides all the logic conditions and loops that you need
and much more. It calls the standard Unix commands for file copying and moving.
The second important tool is Perl, the famous "glue" language. You can easily do complex text
file manipulation with Perl.
We used Unix tools for three key reasons. First, they've got a very rich feature set, like DOS on
steroids: once you know how, you can do anything with them. Secondly, the skill is portable.
We've used the same skills to administer Unix-based servers running game web sites, and they
may help us to make best use of the PS2 development kits. The third reason? They're free and
open. We could play with them straight away without a purchase order.
There aren't any very strong alternatives to the Unix tools for scripting. The Windows command
line is impotent, and commercial alternatives like 4DOS/TakeCommand/MKS gain you nothing
for their price.
More recently, the Windows Scripting Host has got a lot better, and you might want to try it
out. Find out more at http://msdn.microsoft.com/scripting/. You can do file copying using the
FileSystemObject.
The machine that you run the build on is quite important. The Creatures 3 build process ran on
a dedicated computer, which was of high specification at the time. This stops it interfering with
the developer's main machine, and is also useful for soak tests, and other tests running in the
debugger. We use Windows NT or 2000 for build machines. They are much more stable than
Windows 95/98, particularly for running command line programs.
Most teams store at least some assets in a version control system, so it is essential for the build
script to be able to retrieve data from them. At Creature Labs we used Microsoft's SourceSafe
for development of Creatures 3. Since then we have moved on to the excellent CVS. This was
because we've found it to be more reliable, it's portable (I've transferred game web site code
between Windows running IIS and Linux running Apache) and it's free.
If you know how to use CVS from the command line then you will find it very easy to call from
your build process. SourceSafe is a little trickier. You have to set up some environment
variables which aren't very well documented.
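The variables in question look roughly like this (the database path is illustrative):

SSDIR="//oracle/vss"   # folder containing srcsafe.ini
SSUSER="cbuilder"      # the build script's SourceSafe user name
SSPWD=""               # password, blank in this sketch
export SSDIR SSUSER SSPWD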
SSUSER is the user name in SourceSafe. We have a special user called "cbuilder" which is the
build script. There's also a Windows network user with the same name. This way anyone can be
taught to log into the build machine as "cbuilder", and network drive permissions won't cause
trouble for the script.
With your environment set up, it is relatively easy to call SourceSafe. Have a look at the
documentation, and these example commands.
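# Reconstructed examples; the project path and label are invented.
ss Label $/C3 -LBuild98 -I- || error "ss Label"
ss Get $/C3 -R -VLBuild98 -I- || error "ss Get"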
The exclusive checkout nature of SourceSafe does cause a few problems. You need to make
sure the script undoes any checkouts if there is an error. Alternatively, the script can try to undo
checkouts as the first thing it does. That way it recovers from previous aborted runs.
You should make your script automatically tag your code and assets with the build or version
number. When you do this make sure you are tagging the actual version that the script is using,
even if somebody else changes files while the script is running.
For example, with SourceSafe you need to tag a version first. Then do "ss Get" with that label,
as in the example line of code above. This guarantees you are compiling exactly the labelled
version.
To use CVS from the command line, you don't need to do anything special. It comes included
with the latest version of Cygwin. Partly as a code example, here is a function which ensures
the user has logged in (using pserver protocol) on the build machine.
function ensure_cvs_login()
{
    # Create the password file if it doesn't exist yet. (The function
    # name and the surrounding lines are reconstructed; only the grep
    # test survived intact.)
    if [ ! -f "$HOME/.cvspass" ]; then
        touch "$HOME/.cvspass"
    fi
    # grep returns 1 if our repository isn't in the password file,
    # and 2 if something went wrong.
    cat "$HOME/.cvspass" | grep "$CVSROOT" > /dev/null
    GREP_STATUS=$?
    if [ $GREP_STATUS == 1 ]; then
        cvs login || error "cvs login"
    elif [ $GREP_STATUS == 2 ]; then
        error grep
    fi
}
Tagging a version in CVS is much easier. If you check out a particular version and tag it, then
the tags apply to the version that you checked out. This is in contrast to SourceSafe where the
tags apply to the latest version in the repository.
It's very handy to be able to automatically make a new version of your game engine. It's often
useful to do this between full builds, so level designers or artists can get working with new
features.
Before compiling you need to fetch the code from source control. It's useful at this point to
automatically increase the version number. The Creatures 3 game engine has a hard-coded
version number. The script updated this version by editing a header file, and checking it back
into source control.
Depending on your compiler, there will be some way of building your project from the command
line. If you are using Visual C++, don't follow the obvious documentation and go messing about
with exported make files and nmake. You can just call the IDE in command line mode.
msdev "engine.dsp" /MAKE "engine - Win32 Release" /OUT "build.log" || error msdev
Before you call it you need to set up some environment variables - mainly PATH, LIB and
INCLUDE. You can find these in the file VCVARS32.BAT which is in the Visual C++ bin directory.
Under Windows NT/2000 the Visual C++ installer can set them up for you in the global
environment.
When the build has completed the script can copy the file to the network, with an appropriate
name to mark its version number.
So that everyone knew what had changed in the new executable, Creatures 3 had a ChangeLog
file in version control. Whenever someone made a change, they added a comment to this file
saying what they had done.
The engine build process checked this file out, and posted its contents to an internal newsgroup
announcing the new engine. It then cleared out the file for future use, and checked it back in.
At last, agent engineers and level designers know exactly what is new!
File Copying
To transfer assets from and to the network you need the full suite of file copying commands.
They can seem quite slow under Cygwin, but they are fairly intuitive to use.
mv engineSDL.exe engine.exe
mv -f *.gen old/
mv My\ Agents/* "Resource Files Directory/"
cp -R "CD Image" "//oracle/c3/Completed Builds/new build being copied"
mkdir Source\ Data/Catalogue
rm -fR Source\ Data/Bootstrap
cd $BUILD_DIR
Notice the backslash to escape spaces, as in My\ Agents. Alternatively, you can use quotes to
similar effect, as in "Resource Files Directory/". If you do that you need to put any wildcard *
outside the quotes, or else it will be protected as well.
The -R stands for recursive, and confirms that you want to copy or delete a whole tree. The -f
in the rm (remove) command is used to suppress errors if the file isn't there. This is useful in a
build script. You can use network paths (UNC) directly, as in "//oracle/c3/".
Text Processing
The Unix command line is excellent at manipulating text files. They are its bread and butter.
You can use text processing to update version numbers, to fiddle with InstallShield scripts and
to check for errors in your game script files. It's a bit of a black art, but well worth learning.
Quite a lot can be done within Bash. Have a read of the manual, particularly the section on
"Parameter expansion". It can be more fun, and clearer, to use standard Unix command line
tools.
Grep (or egrep, which is the same with a slightly more standard syntax) can be used to search
for lines within a file in a powerful way. The following example finds all the lines in a file which
contain the text "sndc" or "sndl" or "sndq" or "snde".
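# Reconstructed example; the file name is invented.
egrep sndc\|sndl\|sndq\|snde my_agent_script.cos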
Those are all sound commands in our scripting language. The egrep line is part of a function to
verify that all sound files are present in the build. All the funny characters form regular
expressions. This is a powerful way of pattern matching. It can do whatever you want, when
you work out how to do it. Type "man grep" to find out the basics.
Notice that there is some more escaping with \ to make sure the | (or) characters get through
to grep, rather than being treated as a pipe by the shell.
This example replaces the text "My Creatures" with the Dutch "Mijn Creatures" in an
InstallShield script file.
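# Reconstructed example; the file name is invented.
perl -pi -e 's/My Creatures/Mijn Creatures/g' setup.rul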
Using Perl has the advantage that it can edit a file in place. Sed (short for Stream EDitor) is
similar to this restricted use of Perl, but can only be used with redirections. You have to make a
new file. It's useful if you're in the middle of a long pipe of other commands though.
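# Reconstructed along the lines described below; the file names are invented.
perl -pi -e 's/^(\s*)(dbg:\s*out[sv])/$1* $2/' My\ Agents/*.cos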
That slightly crazy command comments out debugging lines in our internal scripting language.
Any lines which begin "dbg: outs" or "dbg: outv" get replaced with "* dbg: outs" and "* dbg:
outv". The asterisk is the comment character. Notice the complex regular expressions - they
make more sense when you write them than when you read them.
These are very powerful tools that can be strung together to do lots of things. Since learning
them I've found them very useful for analysing log files and writing one-off scripts for many
purposes.
By this point your script should have gathered everything from source control and the network,
carried out any processing, and have a basic game "footprint" image for testing out.
A good test to do is to copy the footprint into the Program Files directory and launch the game.
You probably want to doctor the footprint a bit to put the game in a rolling demo or self test
mode. In Creatures 3 we added script files to automatically load a world and hatch some Norns.
You can use Bash's job control to kill the game after 5 or 10 minutes when the test has
completed.
echo "Launching Creatures 3..."
cd //c/program\ files/Creatures\ 3/
# Note the ampersand on the next line, which launches
# the game engine in the background.
./engine.exe&
ENGINE_PID=$!
echo -n "Waiting 5 minutes to see if any errors are made..."
for TIME in 5 4 3 2 1; do
    echo -n "$TIME.."
    sleep 1m
    if [ -f "$ERROR_FILE" ]; then
        error "Full game test failed"
    fi
done
echo "..done"
# This uses the stored engine Process ID to terminate it
kill -9 $ENGINE_PID
The Creatures 3 game engine always writes errors to a log file. This is to make it easier for
testers and customers to report bugs or problems. It also makes it easy for the script to detect
any problems. With the ecology in the game running, and the Norns playing about for ten
minutes, we could be sure that the build worked. This guaranteed that our testers got
something reasonably good to play with -- their time not wasted on overt errors.
At this point, Creatures 3 also used the game engine to generate a special asset for the build.
The game featured a pre-generated Norn "starter family". They had to be constantly kept up to
date with changes in genome and engine archive file format. As ever, the scripts took the
strain.
The two parent Norns were automatically created during the self test. They were aged, taught
all vocabulary, and allowed to walk round and play for a while. They were also encouraged to be
friendly with each other so they would breed more easily on the players' machines. After this
quick introduction to life, they were exported. The build script then copied them from the
Program Files directory back to the build image.
The script does some extra checks at this stage. For example, it ensures that the female starter
Norn doesn't actually get pregnant!
Numbering each build is a good idea. You can keep a file on the network with the last build
number, and then increment it each build.
The number can be displayed in game, and used to generate the network directory name where
you copy the finished build.
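A minimal sketch (the network path is invented):

# Read, increment, and write back the shared build number.
BUILD_NUMBER=$(cat "//oracle/c3/last_build_number")
BUILD_NUMBER=$((BUILD_NUMBER + 1))
echo $BUILD_NUMBER > "//oracle/c3/last_build_number"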
Creatures 3 used InstallShield for installation. This section of the build script doctors some of
the configuration files, compiles the InstallShield scripts, and builds an installation. It then
copies that and makes a CD Image on the build machine.
For the Internet-based sequel to Creatures 3, called Docking Station, we used our own custom
installer. The build ran our own special compression scripts at this stage. It then uploaded the
files to the internal web server for testing. With one command we can make a new build live to
the general public.
It's a bit fiddly calling InstallShield from the command line. You need to both Compile the scripts,
and Isbuild the final image. All our file groups are set in dynamic mode in the IDE, so any new
files are automatically pulled in.
echo "Compiling..."
# (The Compile invocation is a reconstruction; the script path is illustrative.)
Compile "$BUILD_PATH_WINDOWS\InstallShield\Script Files\Setup.rul" || error Compile
echo "Building..."
rm -Rf "Temp"
isbuild -p"$BUILD_PATH_WINDOWS\InstallShield" -m"Default" -b"$BUILD_PATH_WINDOWS\Temp" || error isbuild
Notice the lines that move InstallShield's completed files from /Disk1/ to the appropriate place
in our CD Image. This is a great saving -- before build scripts it was a real pain to have to
manually move the files.
InstallShield's compression features aren't used, so the files are uncompressed on the CD. This
makes it easy for people to check what was in a particular build, and also for users to retrieve
corrupt files without a complete reinstallation. Customer support loves it too -- they can
recommend a manual installation if a customer is having unforeseen trouble with InstallShield.
Unfortunately you can't quickly patch a build by changing these uncompressed files. If the size
of any file changes then you have to recompile from within InstallShield.
Network Copying
When the CD Image is complete, it can be manually tested on the build machine or installed
directly from there to other machines.
The final stage is to copy the CD Image to the network. If the network was busy we found it
could take twenty minutes to finish the copy. People often tried to install the image while it was
incomplete.
To get round this problem of over-eager testers, Creatures 3 copies the build to a folder called
"new build still being copied". When it has finished it renames it to the appropriate name,
number and language. For example, "build 98 spanish".
The builds were all put in a standard "completed builds" directory on the network. This makes it
very easy for people to find them.
In addition to copying the CD Image, it might be worth taking a backup of source code and
scripts here as well. The completed build folder would then be a toolkit for remaking exactly the
same build. This is very useful for remasters, where someone wants an identical build but with
minor modifications.
People like to know what is happening, and to feel in control. When the Creatures 3 build script
finished its job it sent an email to a special mailing list on our mail server. This announced the
new build, with a link to its location on the network.
To do this you will need a command line email program. There isn't one built into Windows, but
you can find a list of some on WinFiles.com.
If there was an error in the build then a complete log of the script's output was sent to a
developer on the team. He would then chase up the problem, fix it, and set a new build going.
Error Checking
It would be hard to overemphasise how important error handling is. The build needs to abort as
soon as there is a problem. The default behaviour of Bash is to carry on blindly if you don't put
in any error checking.
Every command line program returns an error code. This is 0 for success, or a number to
indicate different types of failure. For example, diff returns 0 if no differences were found, 1 if
there were differences, and 2 for an error (such as file not found).
You can automatically make the script stop if any simple command returns an error by using
set.
set -e
If you're using pipes a lot you need to be slightly careful with this, as an error in the pipe isn't
detected. The variable $? will be set with the return code of the last command in the pipe.
Logging error messages for emailing to a developer is a little tricky. The easiest way is probably
to call your whole script from another script.
The outer script can pipe all output (stdout and stderr) to both the screen and a file using "tee".
When it has finished it can check the error level. If it is greater than zero then the output log
can be emailed to the appropriate person.
This example outer script from Creatures 3 takes a list of languages as parameters. For each
one a function "go_build" is called with the language as a parameter. The function then runs the
main build script "c3build" with the appropriate build stages, including the language.
function build_error()
{
    # ...
    # code to email /tmp/build.err to developer
    # ...
}

function go_build()
{
    rm -f /tmp/build.err
    # Run the main build script for one language, sending all output
    # (stdout and stderr) to the screen and to a log file via tee.
    # (The body is a reconstruction; the argument handling is illustrative.)
    ./c3build $BUILD_STAGES "$1" 2>&1 | tee /tmp/build.err
    if [ ${PIPESTATUS[0]} -ne 0 ]; then
        build_error
    fi
}
Whenever there is an error, the contents of the log file are immediately emailed to the
developer.
Potential Problems
There were some problems or frustrations with the Creatures 3 build process and the tools
used.
The Cygnus tools expect a Unix format file system. This has everything mounted under one root
node /, rather than having drives c:, d:. All of the tools can happily access files using paths like
"//c/program files/creatures 3/engine.exe", or network paths "//cyberbackup/clabs
xtra/creatures 3/". The problem comes when calling external Windows programs with absolute
paths as parameters. You then need to store the path name twice.
BUILD_PATH_CYGNUS="//d/C3Build"
BUILD_PATH_WINDOWS="d:/C3Build"
The first form is to pass to a Unix tool, and the second to pass to a Windows designed program.
Because of spaces in filenames, the syntax of cp and mv and other commands can be
counterintuitive. This took up more time than expected. One solution is to learn this aspect of
Bash syntax carefully. On some later projects we took the simpler solution of disallowing spaces
in all filenames.
Like all good tools, it does take some time to write build scripts and maintain them. It's time
well spent, but it needs to be budgeted.
Future Ideas
There are lots of ways in which build scripts could be developed and improved in the future. I'm
sure you'll invent lots of things as you integrate automation into your development process.
Here are some things we learnt from the Creatures 3 build scripts, and some general ideas to
get you thinking:
Optimising network copying. It's easiest when writing the script not to worry about network
traffic, but copying large assets about the place is very slow and clogs up your network.
Perhaps ideally the build script should run on the project file and source control server.
Alternatively, you can just pay attention when writing the script, and not needlessly copy large
movies off the file server and then back onto it again.
What's the best way to segment the build into sections? The Creatures 3 scripts were
broken into eight sections: engine, prepare, language, check, test, compile, commit, wolf.
These sections can be tested and run independently. It would be useful to further break things
down, as often there are independent subsections within each section. The more modular your
script is, the more useful you will find running parts of it by themselves.
Automatic restarting. It would be useful if the build could automatically restart in the middle. The Creatures 3 build process
doesn't have any sense of where it is up to. Sometimes you can manually restart from halfway
through, but you have to know what you are doing. It would be useful if the script tracked this.
That way, when an error is found, the build can be resumed as quickly and reliably as possible,
without repeating complete stages.
Web interface available to the whole team. It might be useful to have a user interface
displaying progress of the current build, and allowing anyone with authority to start a new
build. This would make the build process a tool owned by the whole team, rather than
something one individual has to control. This empowers people more, and lets the team make a
build whenever they need to. It might not be worth doing, as it is often helpful to have one
person co-ordinating things.
Automatic error reporting. It would be possible to categorise errors, and report them
automatically to the appropriate person, by email. This works well with the web interface, and
has similar advantages and disadvantages.
Ability to run multiple builds at once, or easily sequence builds. This would make it
easier to prepare different localised versions.
Modular and general purpose. It might be possible to make your build scripts more general
purpose, so they can be used on several projects. It's not clear that this is practical, as every
project will be quite different in its requirements. Scavenging code from previous projects is
often easier. One thing to do would be to separate key bits into separate scripts - for example,
"compiling with InstallShield", and "getting a project from source control and compiling it with
msdev". This library of scripts could then be shared between projects.
Conclusion
Using an automated build process was an invaluable tool during the development of Creatures
3. We cut costs and increased quality. Partly because of this, Creatures 3 was completed both
on budget and on time.
You can easily integrate a build system with your existing development environment. You will
learn new skills, and enjoy life more. Let no developer repeat tedious tasks by hand on a
computer ever again.
Getting and Using the Tools
Since RedHat took over Cygnus, it has been quite hard to ferret out a version of the Cygwin tools
from their web site. Things look easier now; just go to http://sources.redhat.com/cygwin/ and
follow the instructions.
Under Windows, the easiest way to install Perl is using ActiveState's distribution. Go to
http://www.activestate.com/. This also installs a useful version of the Perl documentation which
you can access from the Start menu.
The definitive sources of documentation are the somewhat dry online manuals. You can read
some of them with the man command, or find them online.
Part One of this article investigated the use of sphere mapping to simulate curved-surface
reflections. Part Two looks at how refractive texture mapping can be implemented to simulate
refractions.
The steps to implement refractive texture mapping are similar to those used in Part One, but
instead of figuring out the reflected ray, the refracted ray is computed. These rays are then
used to "hit" the texture generating the UV coordinates. The texture doesn't need to be
spherical; a simple planar map produces the best results. Finally, the polygons are sent to the
hardware to be transformed and rendered. As usual, several details need to be considered for
implementation.
The refracted ray (Figure 10) is obtained from Snell's law, which states:

n1 * sin(theta1) = n2 * sin(theta2)

where n1 and n2 are the indices of refraction of the two media, and theta1 and theta2 are the
angles of the incident and refracted rays measured from the normal.
I need to share an observation here, because I spent a couple of weeks thinking about the
problem of the refracted ray. When I realized that I had to use angles to compute the refracted
ray, two things entered my mind: first, that it was going to be slow, and second, that the math
was going to get complicated. Computing the refracted ray in 3D space turned out to be no
exception. As is usual for me, every time I see angles in computer graphics algorithms I
don't like it. Among other problems, angles are ambiguous, and normally difficult to work with
because the algebraic manipulations require special attention to quadrants, conventions, and
so on. In spite of my frustration, I finally had an idea: could I approximate the refracted ray by
using another type of calculation rather than Snell's law? Luckily, the answer was yes. There
might be several different ways, but here I'll present the one that I came up with that gave me
satisfactory results.
It is possible to approximate the refracted ray by adding the inverted normal, scaled by a
factor, to the incident ray. In this article that factor is called the "refraction factor." The
refraction factor has nothing to do with the index of refraction (Snell's law); indeed, the
refraction factor is a made-up term. What the refraction factor does is weight the normal's
contribution to the final expression. For instance, if the refraction factor is zero, the refracted
ray is equal to the incident ray. On the other hand, if the refraction factor is a big value, the
incident ray's contribution will be so small that the final refracted ray is the inverted normal. By
looking at Figure 11, the final expression is:

Rf = I + (-N * k) (Equation 8)

where Rf is the refracted ray, I is the incident ray, N is the surface normal, and k is the
refraction factor.
Figure 11. Approximating the refracted ray
By using Equation 8 to compute the refracted ray, we get rid of angular formulas. Also, because
of its nice format, the implementation is straightforward. Figure 12 shows the refracted ray
computed by Equation 8 displayed in debug mode.
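Sketched in code, with the article's POINT3D type and an invented function name:

#include <math.h>

typedef struct { float x, y, z; } POINT3D;

/* Equation 8: the incident ray plus the inverted normal scaled by the
   refraction factor, then normalized. */
POINT3D approximate_refracted_ray(POINT3D incident, POINT3D normal,
                                  float refraction_factor)
{
    POINT3D r;
    float len;

    r.x = incident.x - normal.x * refraction_factor;
    r.y = incident.y - normal.y * refraction_factor;
    r.z = incident.z - normal.z * refraction_factor;

    len = (float)sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    r.x /= len;
    r.y /= len;
    r.z /= len;
    return r;
}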
Once the refracted ray has been computed, two more steps are required to get the UV
coordinates for rendering. First, the ray needs to be extended to see where it hits the texture
map. Second, linear interpolation is used to transform the intersection point to UV coordinates.
Note that one more variable is necessary, the depth -- that is, how deep the ray needs to travel
until it hits the map. This can be easily done by using the parametric line equation in 3D space.
By looking at the line equation we have:
x = x0 + Rfx*t (Equation 9)
y = y0 + Rfy*t
z = z0 + Rfz*t
where Rfx, Rfy, and Rfz are the components of the refracted ray and t is the line parameter.
In Equation 9, x0, y0, and z0 are the coordinates of the vertex being processed. Now the
program needs to figure out where the new XZ coordinates would be. This can be done by
computing the t parameter using the y part of the line equation and the y component of the
refracted ray, so that:
y - y0 = Rfy*t,
t = (y - y0) / Rfy (Equation 10)
What is y - y0 exactly? Here is where the variable depth comes into play. The term y - y0 tells
how far down the ray travels, so y - y0 can be replaced by the depth. As y0 is moving up and
down throughout the mesh because of the perturbations, the final depth needs to be computed
correctly to make sure all the rays stop at the same depth. This is done by adding the initial
depth (a given variable) to the displacement between the mesh's initial y and the current
vertex. Equation 11 shows how the final t parameter is computed, and Figure 13 shows a side
view of the mesh.
p = pi + (yi - y0)
t = p / Rfy (Equation 11)
where:
p = final depth
pi = initial depth
yi = initial y mesh (mesh without any perturbations)
y0 = current vertex's y component
Once the t parameter has been computed, the z and x part of the line equation is used to
compute the ZX intersection. Then, simple linear interpolation is used to transform the new ZX
values (intersected point) into UV coordinates. The x-max, x-min, z-max and z-min from the
mesh are used for the interpolation. Equation 2 [link to first part of article] is used, but this
time the x-min, x-intersected, and x-max values are used to compute the U coordinate. Equally,
the z-min, z-intersected, and z-max values are used to compute the V coordinate. The final
equations are:
u = (x-intersected - xi) / (xe - xi)
v = (z-intersected - zi) / (ze - zi)
where:
u, v = texture coordinates
x-intersected, z-intersected = coordinates of the intersection point from Equation 9
xi = mesh's x minimum value
xe = mesh's x maximum value
zi = mesh's z minimum value
ze = mesh's z maximum value
Getting the UV coordinates with these procedures raises a problem. Depending on the depth
and the vertices' positions (especially the ones close to the mesh borders), the refracted ray
may not hit the map. What that means is that the intersection region won't be valid, generating
either negative UV coordinates or UV coordinates greater than 1.0. With some hardware, these
UV coordinates will make the textures wrap, creating undesirable artifacts.
A few things can be done to correct this problem. The simplest way is just to clamp the UV
values if they are invalid. Some "ifs" can be added to the inner loop checking for invalid UV
coordinates, for example, if they are negative they are clamped to 0 and if they go over 1 they
are clamped to 1. This procedure can still generate undesirable artifacts, but depending on the
camera position and the texture, it may not be too noticeable. Another way is to figure out the
maximum and minimum X and Z values the refracted ray can generate at the corners of the
mesh (with the maximum angle). Then you can change the interpolation limits to take this into
account. Now, instead of interpolating the UV from 0 to 1, they would go from, say, 0.1 to 0.9.
By doing this, even when the X and Z values fall outside the limits, the UV coordinates are still
in range. My sample program uses the first approach, and Listing 4 shows the implementation
of refractive texture mapping.
long t0;
POINT3D camera_ray;
POINT3D vertex_normal;
VERTEX_TEXTURE_LIGHT *vertex_current;
POINT3D refracted_ray;
float interpolation_factor_x;
float interpolation_factor_z;
float final_depth;
float ze;
float xi;
float t;
float map_x;
float map_z;
float new_u;
float new_v;
static TRANSFORMD3D_CAMERA camera;

DEBUG_FATAL_ERROR (!water);

//update camera
camera = TransformD3DCamera;

//for each vertex: fetch its normal...
vertex_normal.x = vertex_current->nx;
vertex_normal.y = vertex_current->ny;
vertex_normal.z = vertex_current->nz;

//...build the camera-to-vertex (incident) ray, approximate the
//refracted ray with Equation 8, and normalize it
math2_normalize_vector (&refracted_ray);

//extend the ray down to the map with Equations 9-11 to get map_x and
//map_z, then interpolate into UV space
new_u = (map_x-xi)*interpolation_factor_x;

//clamp if overflow
if (new_u < water->refraction_uv_min) new_u = water->refraction_uv_min;
if (new_u > water->refraction_uv_max) new_u = water->refraction_uv_max;

//assign
vertex_current->u = new_u;
vertex_current->v = new_v;
Optimizations
Some optimizations can be done to the techniques described in this article to speed up the
calculations. Nevertheless, I have to admit that water simulation done by these techniques can
be intense if the mesh is composed of a lot of polygons, especially because the vertex data
changes every frame. Your best bet is not to use too many polygons, but I'll go over some of
the bottlenecks here.
In the mapping techniques, most of the calculations rely on vector normalization. I did not
address this problem too much because most console platforms perform vector normalization in
hardware, and on PCs the new Pentiums can handle square roots reasonably well.
Perhaps where a lot of experimentation can be done is in the perturbations. As I mentioned
earlier, instead of using the ripple equation, you can try replacing this part with the array
algorithm (see References) or a spring model. However, it's important to know that these
algorithms do not use sine waves, and therefore the ripples may lose their round shape (in side
view). I haven't tried this yet, but it will definitely speed up the code. If you plan to use the
ripple equation anyway, another thing that can be done is to create a big square-root lookup
table. That is, if the ripples are always generated at certain positions, the radius
can be precomputed and stored in an array. Even though the values under the square root are
floats, the mesh is composed of discrete points. Using this fact, you can use the vertices'
indices to index the square-root array, as sketched below.
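A minimal sketch of that lookup-table idea, assuming a regular grid mesh whose ripple centers
always fall on vertices (the names, grid size, and spacing parameter here are illustrative, not
from the sample program):
#include <cmath>
#include <cstdlib>

const int GRID_W = 64;
const int GRID_H = 64;
static float radius_table[GRID_W][GRID_H];

// Precompute once: the distance between any vertex and any ripple center
// depends only on the integer index deltas, so no sqrt runs per frame.
void init_radius_table(float spacing)   // spacing = distance between vertices
{
    for (int dx = 0; dx < GRID_W; ++dx)
        for (int dz = 0; dz < GRID_H; ++dz)
            radius_table[dx][dz] = std::sqrt(float(dx * dx + dz * dz)) * spacing;
}

// Per-frame lookup: radius from vertex (vx, vz) to ripple center (cx, cz).
inline float radius(int vx, int vz, int cx, int cz)
{
    return radius_table[std::abs(vx - cx)][std::abs(vz - cz)];
}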
Another big bottleneck in the code is the normal calculation. Here more than anywhere else I
welcome reader suggestions, because I could not find a very efficient way of computing the
vertex normals for a mesh that's constantly being morphed. The way I've implemented it is to
create lookup tables of neighboring triangles. When I loop through the mesh vertices, I can
quickly look up each vertex's neighboring faces and average their face normals. However, this
is still slow, and even console hardware won't be able to help me much, except for the vector
normalization.
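To make the neighbor-table approach concrete, here is a minimal sketch (the data layout and
names are my own illustration, not the editor source):
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Built once at load time: for each vertex, the indices of the faces that
// touch it.
static std::vector<std::vector<int> > neighbor_faces;

// Recomputed every frame after the mesh is morphed.
void average_vertex_normals(const std::vector<Vec3>& face_normals,
                            std::vector<Vec3>& vertex_normals)
{
    for (std::size_t v = 0; v < neighbor_faces.size(); ++v) {
        Vec3 n = { 0.0f, 0.0f, 0.0f };
        for (std::size_t i = 0; i < neighbor_faces[v].size(); ++i) {
            const Vec3& fn = face_normals[neighbor_faces[v][i]];
            n.x += fn.x; n.y += fn.y; n.z += fn.z;
        }
        // Normalization (the remaining sqrt) is left to the hardware or to
        // math2_normalize_vector, as discussed above.
        vertex_normals[v] = n;
    }
}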
Sample Program
When you look at my implementation, you'll notice that the code is slightly different from the
code presented in this article. There is a lot of debugging code and editor interface variables,
as well as some experiments I was doing; I excluded these from the article for the sake of
clarity. Also, you won't be able to download the complete application source to compile on your
computer. I have only included the water code for you to look at as a reference. The main
reason is that the water implementation is part of an editor I've been working on for the past
two years or so. The whole editor has thousands of lines of code and implements several
different algorithms and experiments, so I decided not to make it public. If you
run the editor, you can control several parameters of the mesh, the mapping techniques, and
the perturbations, and play around with them to see how they affect the final
simulation. You can also fly the camera using the arrow keys and A, Z, Q, and W; holding Shift
or Control with these keys makes the camera move slower or strafe.
Final Words
I hope this information has helped you to understand and implement some interesting
techniques for simple water simulation using refractive texture mapping. Please e-mail me at
[email protected] or [email protected] with suggestions, questions, or
comments.
To download the code and run the sample application, go to my web site at
http://www.guitarv.com/, and you should find everything you need under "Downloads."
References
Möller, Tomas, and Eric Haines. Real-Time Rendering. A.K. Peters, 1999. pp. 131-133.
Serway, R. Physics for Scientists and Engineers with Modern Physics, 4th ed. HBJ, 1996. pp.
1023-1036.
Ts'o, Pauline, and Brian Barsky. "Modeling and Rendering Waves: Wave-Tracing Using Beta-
Splines and Reflective and Refractive Texture Mapping." SIGGRAPH 1987. pp. 191-214.
Watt, Alan, and Fabio Policarpo. The Computer Image. Addison-Wesley, 1997. pp. 447-448.
Watt, Alan, and Mark Watt. Advanced Animation and Rendering Techniques: Theory and
Practice. Addison-Wesley, 1992. pp. 189-190.
Harris, John W., and Horst Stocker. Handbook of Mathematics and Computational Science.
Springer Verlag, 1998.
PlayStation Technical Reference Release 2.2, August 1998 (Sony Computer Entertainment).
Lander, Jeff. "A Clean Start: Washing Away the New Millennium." Game Developer (December
1999). pp. 23-28.
Haberman, Richard. Elementary Applied Partial Differential Equations, 3rd ed. Prentice Hall,
1997. pp. 130-150.
Graff, Karl F. Wave Motion in Elastic Solids. Dover Publications, 1991. pp. 213-272.
You've developed an object viewer, a Quake-level viewer, a radiosity renderer and a patch
tessellator. You know 3D math, OpenGL and the 3DS file format by heart. Now you want to go
further -- you want to use this knowledge to develop an entire engine.
Does this seem like a large task for you? Well let me tell you, it is. Anyone who has developed a
complete game will agree with me. Many people give up or start over along the way. But there
are a number of things you can do that will increase your chances of reaching your goal.
I will not try to tell you how to make a good, playable game. Nor will I tell you how to program
any specific parts of an engine. In fact there won't even be a single line of code in this article.
Rather, I want to encourage you to think beyond shadow volumes and dotproduct3. Think
about what it will all be used for, and work with this goal in mind.
Even though this article talks about the development of a 3D engine, the same techniques
should be applicable to, for example, 2D or playing-card game engines. The main goal is to
enable you to quickly and (relatively) painlessly develop an engine and a game that uses it.
What to Consider
There are three steps that I undertake before beginning work on an engine:
1. Set the goals for the engine. What will it actually be used for? Who is going to do all the
hard work? And does it seem feasible at all?
2. Define the feature list. When you know the goals for the engine, and the possibilities of
the team, you can define what the engine must be able to do.
3. Define the overall structure and the subsystems required. Knowing what an engine is
required to do, you define how it should do it. This is how you determine which
subsystems to create, how they should interact, and the order in which to program them.
These three steps might require a few days at the beginning of the project (when you're eager
to get busy with your compiler), but I guarantee you that the time will be well spent. It is
entirely possible to develop a game without any planning at all, but the further you get in
development, the more you will wish you had devoted some time to planning in the beginning.
There are a number of reasons why you should clearly define your goals before you begin.
The specs of the engine should be fitted to the team making it, and to the game for which it is
being made. Programmers often aim too high, and start building features that are not
completely necessary. Or they may not have the time, the manpower, or the skills to finish off a
highly polished game.
Setting the goals helps you clarify your focus and gives you a direction to go in. With a clear
purpose, you will likely keep on the right track through the end of the project. Remember that
one small completed project, even with only a few levels, is better than ten great projects
abandoned when they are 90% finished.
The third reason for establishing clear goals is that you will remember that developing the
engine is not an end in itself; it is just a tool to make the game. Gameplay often takes a back
seat in game development, both in marketing and among developers, but remember that a
neat engine never made a good game. Gamers have known this for years, and we developers
should keep this in mind.
Note that the goals are quite broad, and should help everyone to get a quick overview of the
project. More specific details are considered later on in the process.
The Team
The goals should be suitable and feasible. Therefore they should be based on the team, the
deadlines, and the project. A number of useful questions to ask yourself are:
How many people are on the team, and will more join along the way?
If you work alone it is easier to keep an overview, but with more people it is
easier to keep up the spirit. Knowing that more people will join as the project goes along can
also help to push you over the worst hurdles.
If you are a single programmer and have a full-time job besides programming, writing the next
Unreal entirely from scratch does not sound that feasible.
What are the skills and the experiences of the team members?
If you have few artists and many programmers, writing a game with procedural graphics would
make more sense than trying to make a new Final Fantasy. And if none of the team members
have ever written any large programs, it might be better to aim for a small project.
The key thing is that you have to be realistic or, even better, pessimistic about your team's
skills and available time. It is definitely better to make a simple game quickly than trying to do
an epic tale with too few or too inexperienced people. Even highly professional development
houses have sometimes managed to screw up here, causing years of delays and ending up
releasing boring, uninteresting games.
Deadlines
The deadlines include the final version and the feature-complete alpha as well as demos or
possible milestone requirements. Even if you don't work for a company, you should still set
deadlines. With no clear deadlines it is easy to keep working on nitty-gritty details that have
only a marginal impact on the project, but if you know when to be finished with each part, you
can always come back to do fine polishing later.
Oh, by the way - learn to keep the deadlines! That would make a lot of people happy - including
yourself.
The Project
If you do a project just to learn, there is no point in making 100 levels, when it is just the same
tasks you will do over and over. And if you have an innovative gameplay idea that you want to
try out, you should not have to worry about writing menu systems or cinematic routines.
A simple way to avoid problems when working across multiple platforms is to design for the
smallest target and simply scale up and add more detail on the more powerful machines.
However, different machines have different weaknesses, and it is important to know these
(some machines, for example, excel at polygon-pushing power, others have more memory,
etc.).
What style of game will the engine be used for? How complex is the game? Is it very
innovative, or a clone of an existing game?
Do you want to allow the mod community to add new stuff to it? In that case, you cannot rely
on tools from a commercial package like 3D Studio Max, and you had better have easy-to-use,
bug-tested editors to go with the game. It is important to remember that the mod community
consists of game players, not professional game designers (yet). Also try to make sure that the
file formats are simple and documented, in case people want to write their own tools. And if you
have a script system that the mod community has access to, make sure that it is very stable
and works properly, even if it gets weird user input.
The more levels, and the more complex they are, the more important it is to have a good set of
tools to use. A 30-level Quake game will require more advanced tools than an outdoor game
with three levels based on procedural geometry.
Still, do not underestimate the amount of time that is wasted due to inadequate tools. In my
first game, Banshee (a 1942 clone), we used a terrible in-house level editor on the Commodore
Amiga for the enemies. Since it was not possible to set any properties, the levels had
to be converted to source code, copied to the PC, annotated with data by hand within the
source code, and then compiled and tested. We estimated that we would probably have saved
a month or two (out of 11 months in total) if I had written my own level editor.
Should the engine be used for only one game, multiple games in the same genre, or completely
different types of games?
If you want to target your engine for use in multiple games, there should be as little gameplay-
specific code in the engine as possible. If it will be used for just a single game, it might be
possible to do some hard-wired functions. As an example, in Banshee the player-characters
used hard-coded functions, while my script language controlled the enemies. Still, I cannot
recommend this practice, as it gets too messy.
An Example
What I've explained thus far should help you determine where your project is going. The next
phase is to share this vision with all of the people in the project.
As an example, let's imagine a team consisting of four people: an artist/3D modeler, a level
designer, an engine programmer, and a gameplay/editor programmer. They have limited
experience, and they are all working in their spare time. Their goal is to make a simple game
that can give them a job at a game company, and to release the game as freeware. Therefore
there are no requirements for demo versions or milestones (except internally). The target
machine is a 450 MHz PC with a TNT2 graphics card.
One realistic goal for our little team would be to create a simple single-player Terminal Velocity-
style shoot-'em-up, with about 3-5 levels. The engine should not be developed with reuse in
mind, and the levels should not be editable by players.
Feature List
Now the goal is defined, and the features required by the engine can be determined. Again, it is
very important to keep the skills of the team and the time constraints of the project in mind.
It is important to distinguish between what you need and what you would like. So I suggest
making two lists: a feature list and a wish list. Even though it is often more fun to work on
items in the wish list -- usually the eye candy -- wish list items should wait until the required
features are implemented, or when you're waiting for other people to finish their parts.
Things that fundamentally change how the game is made, such as implementing skeletal
animation or developing a scene editor, should not be on the wish list. These things change how
the animators create game content, and if added late, their previous work would be wasted. However, a lot of
the eye and ear candy, such as lens flares, 3D sound, rigid-body dynamics, and advanced
particle systems can go into the wish list, as the game can still work fine without these items.
Eye candy like the lens flare in this shot
goes into the wish list, because the game
will work fine without it.
Network play is often put on the wish list. However, implementation of network playing will
influence a large amount of the code, and greatly change the direction of the game. This might
introduce a ton of new bugs and might require reworking the levels. In my opinion it's a bad
idea to leave this question open for later discussion. Either networking is built-in from the start
and all structures and routines are prepared for its implementation, or it is not there at all.
Content Path
It is a good exercise to consider how the game content is created and gets into the game.
Preferably, the artists, modelers, level designers, and gameplay programmers should have as
much freedom as possible to create the game, with as little work for you as possible.
A typical dilemma is whether it is worth it to make a level editor, or if a tool such as 3D Studio
Max should be used. It is much faster to write an exporter or a converter than to write a full
editor, but in the editor it can be possible to see the level as it will end up in the game, and it
will be easier to edit object properties, apply scripts and create new tests. Whether to build or
buy again depends on the project, the team and the deadline at hand.
It's common for developers to have to wait for compilation, conversion, or rendering tasks, and
frequently they have to perform tedious manual steps which can introduce errors if not done
correctly. If possible, try to eliminate these tasks, or at least optimize them. For example, if
your engine requires BSP-tree compilation, it should have an option to run an un-compiled
version as well.
Another problem I have encountered is when artists depend on programmers to test their
content in the game. This is annoying for both the artists and the programmers, so try to design
a system that lets artists test content in the game themselves as soon as possible. During the
development of my first 3D game, Amok, productivity increased by a factor of 3 to 4 when
we got an extra development kit for testing the graphics and the levels.
Let's return to our imaginary development team. Our little team has decided what the engine
should do, and it has kept things simple to keep the deadline short and the goal easily
reachable.
Their engine will require a landscape with subdivision. The landscape is based on a height
image from a paint package, and the engine will select the textures procedurally. It will also
draw 3D objects, modeled in Milkshape. All objects and textures are
loaded into memory at the beginning, and no texture swapping will be needed.
The elements should be able to move and rotate in full 3D (six degrees of freedom). Collision
will be sphere-sphere for checks against other elements, and sphere-polygon for checks against
the landscape.
The game elements will be placed on the map in a simple top-down 2D editor. In this editor it
will also be possible to set properties on each type of element, such as hit-points, sound effect
and particle effects. The engine will be able to be started from the editor, to speed up testing of
gameplay.
All gamecode will be written in C++, and the code for each element will be derived from a
virtual class that is used to get information from the editor. The main debug functions will be a
log-file and an overlaying text window for run-time information such as framerates and status
flags.
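A rough sketch of what such a base class might look like (all class and member names here are
hypothetical, invented for illustration):
class GameElement {
public:
    virtual ~GameElement() {}
    virtual void Update(float dt) = 0;   // per-frame game logic
    virtual void Render() = 0;           // submit geometry to the renderer
    // Editor hooks: properties such as hit points, sound effects, and
    // particle effects are set in the 2D editor and read through here.
    void SetHitPoints(int hp) { hit_points = hp; }
    int  HitPoints() const    { return hit_points; }
protected:
    int hit_points = 0;
};

class EnemyFighter : public GameElement {
public:
    void Update(float dt) { /* steering, firing, collision response */ }
    void Render()         { /* draw the Milkshape model */ }
};
The editor only needs the common interface; each new element type is just another derived
class.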
The game will contain a particle system for explosions and other effects, and it will use MP3 for
music and .WAV files played in simple stereo for sound effects. It will use a joystick for input.
On the wish list are lens coronas, multiple sky layers, shadows on the landscape from the
elements and a moving sun.
By now you know what your final goal is, and what your engine should do in order to reach this
goal. Now it is time to get into the technical details, and define which subsystems and classes
are needed, how they should work, and in what order they need to be written. Not everything
needs to be described in full detail, but the more integral a subsystem is to the engine, the
more important it is to lock down its functionality, or at least the interface.
I see three good reasons to spend some time defining the system in details before beginning:
First, by scheduling the different tasks properly, it is possible to avoid people having to wait for
each other to finish their tasks before being able to complete their own tasks. Bottlenecks can
be discovered faster and circumvented. And it is possible to see if the deadlines are achievable.
Second, a good schedule covering all tasks will also help the team to focus on what is actually
important - the game - and not become fixated on working on the engine.
Third, there is a psychological factor: It seems much more manageable if there are a lot of
small tasks rather than a few large ones, and it is easier to see progress occur.
How?
What I prefer to do is write down in detail which subsystems will be required and what their
functionality is. Designing a directory structure and defining file types and their content is also
useful.
Drawing charts showing how the different subsystems interact, and how the different classes
and files are related can greatly help in giving an overview.
Recycle or Reinvent
One question to ask yourself is how much you want to use other people's work and how much
you want to write from scratch. A lot of programmers take pride in doing as much as possible
themselves, but again I think that it is more important to see what your goal is. If you are trying
to impress a potential employer, I am sure he would be more impressed by a
programmer who gets the job done by any means possible than by one who can do
everything himself but takes much longer to do it. Sure, you learn a lot by doing
everything yourself, but being able to work with other people's code is also an extremely
valuable skill for today's developers.
For example, doing your own image loader will not help you to write a game faster, or improve
the quality of the game. And unless you have some revolutionary ideas that no one has ever
thought about before, a publicly available script system will perform almost as well as your
hand-written one. True, you might get a little more performance and a little more user-
friendliness by writing your own, but often the time is better spent elsewhere in the project.
Of course, there are problems with using other people's libraries. Even if you can get hold of a
preview version, the authors might want money if you use it commercially. This should not be a
problem in our example case, as the team primarily wants to show its skills to potential
employers.
Another problem is that the more tightly a library is tied in with your engine, the more
important it is to make sure that the library fits your exact requirements, and the more
important it is for you to tailor your engine to fit the library. An MP3 player has very loose ties
and will easily fit into any engine, no matter how it works, but a physics library like Math
Engine should be planned for from the beginning of the project. Once I spent days integrating
a commercial library into my engine, just to find out that the way it did one certain thing was
completely incompatible with my system.
Debugging might also be a problem. Even though most of these libraries are well documented
and thoroughly debugged, you might have crashes caused either by your own code or by the
library. But often the library source code is not released, so debugging gets practically
impossible; in that case it might be easier to implement your own system.
The question about what to do yourself and what to borrow from other people, like many other
questions, depends on the team, the project and the deadlines. Again there is no clear answer.
The Schedule
Schedule important things to be done first. It sounds logical, but that is not always how it
happens during a project, because different people have different ideas about what is
important. The criterion that I like to use is whether people's productivity will go up when a
subsystem is being worked on. I always have much of the ear- and eye candy at the very
bottom of the list.
A good example is the rendering system. It is very important to get the objects on the screen
quickly, both for the programmers to see if all of the code is working, but also for the artists to
test the graphics and level designs. But on the other hand a feature like texture mapping is not
very important at all. The gameplay programmer, the level designer and the modeller can still
work fine without textures. Sure, it helps, but the productivity gains are marginal.
Another example is the collision. Sure, collision is necessary, but getting the scripting system
up and running is much more important, in order for the gameplay programmer to start
working on player control, weapons, enemies and levels. This is especially important if it is a
new, innovative type of gameplay, where lots of tweaking is expected.
Your overall goal should be to get as much performance out of other people as possible, and to
spot bottlenecks before they occur. For example, if it looks like it will take a while before your
indoor routines will be ready, consider letting your artists start to create the indoor sections in a
Quake editor, and test in Quake, while you work on the code.
Modularity
If you are not used to writing large programs, you might not know how you should split up the
different subsections, as they naturally all work together in some way or other. But it is a good
exercise to do it, even for simple projects, and it helps you a lot as the project grows.
To give an example from my current engine, I have an input class, which contains joystick,
keyboard and mouse, but which also contains my random generator and frame rate measuring.
My reason for doing this is that this is practically all I need to have a complete record and
playback functionality, and I do not need to access many subsystems to save a demo file.
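A bare-bones sketch of that kind of grouping (my illustration of the idea, not the author's
actual class):
#include <cstdlib>

class InputSystem {
public:
    // Everything a demo file needs flows through this one subsystem.
    void  Poll()            { /* sample joystick, keyboard, and mouse */ }
    float FrameTime() const { return dt; }                      // frame-rate measuring
    int   Random(int range) { return std::rand() % range; }     // game-logic RNG
    void  StartRecording(const char* demoFile) { /* log inputs, dt, seed */ }
    void  StartPlayback(const char* demoFile)  { /* replay the log */ }
private:
    float dt = 0.0f;
};
Because the inputs, the frame times, and the random numbers all pass through one place,
recording just this subsystem's outputs is enough to replay a game deterministically.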
Of course, what you decide to have in each subsystem is up to you. But if you are aiming to
port to other platforms or APIs later, you should not shape the subsystems after the
features or structure of the primary API or platform.
A simple reminder: the less code you write, the fewer bugs there will be. Before you add a
feature, make sure that it will really be used. Writing lots of highly specific code usually
means more bugs than a small amount of more general code that requires some parameters to
be set in the level editor. And since the general code is run more often, and with different
input, it will be much more thoroughly tested than the specific code.
Once again, this means less work for yourself, but more work (and more freedom) for your
creative people.
My experience is that it is much easier to fix bugs as soon as they occur. If many bugs are left,
you never know if it is a known issue or something new that causes a crash. And never work
around a bug that you can't find and hope that it will disappear over time. This kind of bug
tends to settle down deeply nested in your code, marry and have lots of little baby bugs all over
the place.
Keeping your code more bug-free is also a blessing when it comes to milestones and demos.
The bugs always resurface five minutes before the fat man with the wallet enters the door.
Conclusion
If I had to sum up what I have been trying to say so far in four words, they would be: Think
before you code!
Spend one or two days before you start to write a lot of code, and you will not get stuck as
easily later on. And make sure that you get everything down on paper instead of keeping it in
your head. This will help you to stay focused and goal-oriented. Always consider everything
you do before you do it.
Be realistic! Know your own and your team's strengths and weaknesses. And do not aim too
high.
Stick to the plan! Don't give up! Focus! Keep on! Be Persistent! Quitting is such an easy
decision to make, but quitters are not winners. When Duke Nukem 3D was completed its
technology was inferior to Quake, but the Duke Nukem crew still managed to make a better
game because they kept working on and on. I know that you learn a lot and find better
methods as you go through a project, but learning to accept slightly inferior code is a very
important skill that will make you much more productive.
Don't just follow John Carmack and Tim Sweeney's technology plans. Learn from others, but
don't copy blindly. They work 80 hours a week, with twice the efficiency of the rest of us, and
have a few extra programmers to assist them. If the rest of us had to follow them we would
never finish anything. Try to do your own things instead - seek out a project that is right for
you.
Søren Hannibal got a Commodore 64 in 1985, at age 10, and started programming a
couple of minutes later. From 1993 and on he worked for Core Design, Scavenger
(RIP) and Shiny Entertainment, and he has a game with his name on it from each
company. After a three-year break to study he is now looking at getting back into the
business. When he's not busy fighting people, walking on burning coals or climbing
mountains, he's debugging and rebugging his 3D engine. He's hoping to get a lot of
feedback on this article at his website and at [email protected].
Everyone who loves Quake 3 is impressed by the high-quality graphics, light maps, and
character animations. Although its creators have done an excellent job in painting the textural
details, most of the characters consist of only several hundred triangles, which cannot capture
highly detailed geometry. In recent years, subdivision surfaces have received a lot of attention
from both academics and industry professionals; people in the movie industry even apply
subdivision techniques to create complex characters and produce highly detailed, smooth
animation. This article examines how to convert triangular meshes (one of the most popular
data representations) into subdivision surfaces.
The idea of subdivision surfaces was first introduced by Catmull & Clark and Doo & Sabin in
1978. Unlike traditional spline surfaces, subdivision surfaces are defined algorithmically.
Recently there has been a lot of activity in the computer graphics research community and
significant advances have been made in rendering, texture mapping, animation and
compression of subdivision surfaces. They were also used in the production of Geri's Game and
A Bug's Life. Geri's hands, head, jacket, and pants were each modeled using a single
subdivision surface. The faces and the hands of Flick and Hopper were also modeled with
subdivision surfaces. Momentum is building in the computer-assisted geometric design (CAGD)
community to make subdivision surfaces one of the modeling primitives.
Subdivision surfaces lie somewhere in between polygon meshes and patch surfaces, and offer
some of the best attributes of each. The well-defined surface normal allows them to be
rendered smoothly without the faceted look of low-polygon-count geometry, and they can
represent smooth surfaces with arbitrary topology (with holes or boundaries) without the
restriction of patches, where the numbers of columns and rows have to match before two
patches can be merged. Secondly, subdivision surfaces are constructed easily through recursive
splitting and averaging: splitting creates four new faces from one old face, while
averaging computes each new vertex as a weighted average of its neighboring vertices.
Because the basic operations are so simple, they are very easy to implement and efficient to
execute. Also, because of the recursive nature, subdivision naturally accommodates level-of-
detail control through adaptive subdivision. This allows triangle budgets to be spent in regions
where more detail is needed by subdividing further.
The simple splitting process usually starts from a coarse control mesh; iterating it
several times produces so-called semi-regular meshes. A vertex is regular if it has six
neighbors (in a triangle mesh) or four neighbors (in a quadrilateral mesh). Vertices which are
not regular are called extraordinary. Meshes coming from standard modeling packages or
3D scanning devices usually do not have a regular structure, hence there is a need to convert
these irregular meshes into semi-regular meshes, a process known as remeshing. This article
presents an algorithm that maps the original mesh onto a simple control mesh (the base
domain), thus deriving a parameterization for the original mesh. Having this mapping
information (the parameterization) allows us to perform the remeshing efficiently by
subdividing the base domain and perturbing the new vertices to approximate the original
geometry. From a signal-processing point of view, we can treat it as a geometry resampling
process. The beauty of this algorithm is that it is very simple and easy to implement.
Before explaining the algorithms, let's take a look at the input. The input can be any arbitrary
triangulated manifold mesh. By arbitrary we mean the mesh can have holes or boundaries;
triangulated means the surface is approximated/discretized by a list of triangles. Manifold
means it does not contain topological inconsistencies, such as more
than two triangles sharing an edge or more than two corners meeting at one vertex. Make sure
you have good, clean geometry.
Overview of Algorithm
The conversion algorithm contains two major steps. The first step is called mesh simplification.
Mesh simplification is well known to game developers for creating levels of detail or
catering to different bandwidth and computational requirements. Most of the time, there is a
trade-off between model complexity and interactivity - i.e., a higher frame-rate requirement
usually translates into a simpler/coarser model. In this algorithm, mesh simplification is used
only as a tool to derive a base domain so that resampling/remeshing can be accomplished.
Developers are free to choose their favorite mesh simplification algorithm; I recommend
Michael Garland's error quadrics simplification algorithm because it is open source and simple
to understand. The second step is called remeshing; the goal is to create a subdivision-
connectivity (semi-regular) mesh with geometry sampled from the original mesh. As was
mentioned earlier, subdividing the control mesh does not add any extra information to the
mesh but only increases the complexity of the model. Hence there is a need to use the mapping
information from the first step so that we know how to perturb these vertices to approximate
the original mesh (Figure 1).
The 2D Case
Before diving into the 3D mapping algorithm, a look at the 2D curve case might be in order.
The notion of subdivision connectivity does not make much sense here, but the conversion
process can be treated as a regular sampling at the dyadic points. Simplification (removing
alternating vertices in each step, as shown in Figure 2) is used to derive a simple line segment
(the base domain, in red). Midpoints are then inserted in the middle of each line segment by
performing a linear interpolation between the two closest points (equivalent to the geometry
resampling process) to complete the second phase (as shown in Figure 3).
Insert a midpoint (yellow circle - a dyadic point) on the red line segment (in Figure 3) and find
the two closest points (green circles) that were mapped from the original curve onto the red
line segment. Then compute the ratio of the yellow circle's position between the two green
circles. Based on the original geometry of the green circles and this ratio, we can linearly
interpolate the coordinates and obtain the resampled geometry of this yellow circle on the
original curve.
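As a concrete illustration of the 2D case, here is a minimal sketch of the resampling step (the
data layout and names are assumed for illustration):
#include <cstddef>
#include <vector>

struct Point2 { float x, y; };

// params[i] is the position in [0,1] of original curve point i mapped onto
// the base segment, strictly ascending with params.front() == 0 and
// params.back() == 1; points[i] is that point's original 2D geometry.
Point2 resample(const std::vector<float>& params,
                const std::vector<Point2>& points, float t)
{
    // Find the span [lo, hi] of mapped points bracketing the dyadic point t
    // (the two "green circles").
    std::size_t hi = 1;
    while (hi + 1 < params.size() && params[hi] < t) ++hi;
    std::size_t lo = hi - 1;
    // Ratio of t within the span, then linear interpolation of the geometry.
    float r = (t - params[lo]) / (params[hi] - params[lo]);
    Point2 p = { points[lo].x + r * (points[hi].x - points[lo].x),
                 points[lo].y + r * (points[hi].y - points[lo].y) };
    return p;
}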
3D Mesh Simplification
To illustrate the algorithm, I'm using Mike Garland's excellent simplification software.
It can be downloaded from his site at
http://graphics.cs.uiuc.edu/~garland/software/qslim.html. The nice thing about this
simplification is that it is very fast and easy. It can simplify highly detailed polygonal surface
models automatically into faithful approximations containing far fewer polygons. The core of
the algorithm employs an error metric called the quadric error metric to provide a
characterization of local surface shape. This metric is fast to evaluate and does not take up
much memory. The details of the algorithm can be found at
http://graphics.cs.uiuc.edu/~garland/research/thesis.html.
Figure 4. The contraction of edge v1v2 removes
vertex v1 and its adjacent triangles (the old triangles),
creating a hole, which is then retriangulated
(new triangles are formed).
Each simplification step collapses an edge, removing one vertex, deleting the triangles that
surround the vertex, and retriangulating the hole. In Figure 4, edge v1v2 is collapsed and v1 is
removed. In order to preserve some form of mapping between adjacent levels, we are going to
compute a 4-tuple (α, β, γ, T) that describes which new triangle T vertex v1 is going to be
associated with. The (α, β, γ) tuple is the barycentric coordinates of v1 within the triangle T. To
perform this association for each simplification step, flatten the 1-ring (as shown in Figure 5)
of v1 onto a 2D plane. This planar flattening is achieved by computing the z^a map: the angle
sum subtended at v1 is scaled to 2π (or π in the boundary case), and the length of each edge
incident to v1 is raised to the power a.
Code Snippets
RemoveVertex( Vertex V ) {
PlanarFlatten( V );
Valid = TryRetriangulate( V );
if ( Valid ) {
AssignParameter( V );
Reparameterize( V );
DoTriangulate( V );
};
};
The basic skeleton of the vertex removal is very simple. First perform a planar flattening to
flatten the 3D 1-ring neighbors of vertex V (the umbrella) onto a 2D plane as described in
Figure 5, i.e., scaling the edge lengths and the angles subtended at vertex V. After flattening
the umbrella, use a retriangulation routine (e.g., a Delaunay triangulation) to remove the old
triangles and try inserting new triangles. Make sure that the new triangles do not overlap
the existing triangles in the mesh; otherwise this will create topological inconsistencies and the
mesh will no longer be a manifold (i.e., an edge would be shared by more than two triangles).
If the new configuration is valid, proceed to compute the 4-tuple parameter for vertex V. The
computation consists of two steps. The first step is to assign a new 4-tuple parameter to vertex
V, which involves calculating the barycentric coordinates and the associated triangle, as
indicated in Figure 6. The second step is to update the parameter values of those vertices
which are currently associated with the old triangles. The function CalcNewParameters( Vi )
performs the corresponding update according to Figure 7.
AssignParameter( Vertex V) {
Tuple = CalcBaryCoord( V );
InsertTuple( V, Tuple );
};
Reparameterize( Vertex V ) {
ForEachFaceAroundVertex( V , Fi ) {
ForEachAssociatedVertex( Fi , Vi ) {
NewTuple = CalcNewParameters( Vi );
UpdateTuple( Vi , NewTuple );
};
};
};
If there are vertices which were associated with the old triangles, we also need to update their
parameters due to the retriangulation (the old triangles will be destroyed). The update can be
computed in a way similar to the previous step.
At the end of the simplification there will be a simple control mesh, called the base domain.
Each vertex that was removed in the simplification hierarchy will have the 4-tuple indicating
which base domain triangle it is associated with and its barycentric coordinates. This completes
the first phase of the algorithm.
The following images show the results of performing the first phase of the algorithm on a
three-holed torus. Although this model is a bit simple, it does show the general ability of the
algorithm to handle a genus-3 object (containing three handles).
The first image shows the original triangular mesh, while the second image shows the base
domain arrived at from the simplification routine. The third image demonstrates the
visualization of mapping each vertex onto the base domain - thus shrink-wrapping the original
mesh onto the base domain with each vertex having a 4-tuple association. The fourth image
shows the result of subdividing the base domain and perturbing the new vertices to closely
approximate the original mesh.
Geometry Resampling
At this point, the mapping computation is finished. One way to look at the mapping is to
consider that each vertex has a 4-tuple parameter which tells which base domain triangle it is
associated with and its location (given by the barycentric coordinates) within this base domain
triangle. Another way to visualize the mapping is to imagine collapsing/wrapping the original
mesh triangulation (treated as a graph) on top of the base domain.
To create a mesh with subdivision connectivity, simply subdivide the control mesh a few times
by splitting each old triangle into four new triangles. The new edge vertices are called dyadic
points. Notice that this 1-4 split produces vertices with six neighbors: all of the new vertices
introduced are regular. The most common subdivision schemes are Loop subdivision and
Catmull-Clark subdivision. Loop subdivision is a scheme for triangular meshes, while Catmull-
Clark subdivision is for quadrilateral meshes. This article demonstrates the Loop subdivision
method, as it's a natural choice for triangle meshes. The limit surface of the subdivision will be
a smooth surface with C2 continuity everywhere except at the extraordinary vertices. The next
step is to perturb the vertices in such a way that the subdivision mesh approximates the
original mesh. This is what is called geometry resampling.
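For reference, here is a minimal sketch of the 1-4 split itself (the mesh representation is
assumed for illustration; new vertices are placed at edge midpoints here, before any
perturbation):
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

struct Vec3 { float x, y, z; };
struct Tri  { int a, b, c; };

// Return (creating on first use) the midpoint vertex of edge (i, j).
static int midpoint(std::map<std::pair<int,int>, int>& cache,
                    std::vector<Vec3>& verts, int i, int j)
{
    std::pair<int,int> key(std::min(i, j), std::max(i, j));
    std::map<std::pair<int,int>, int>::iterator it = cache.find(key);
    if (it != cache.end()) return it->second;
    Vec3 m = { 0.5f * (verts[i].x + verts[j].x),
               0.5f * (verts[i].y + verts[j].y),
               0.5f * (verts[i].z + verts[j].z) };
    verts.push_back(m);
    cache[key] = (int)verts.size() - 1;
    return (int)verts.size() - 1;
}

// One subdivision step: every old triangle becomes four new ones, and every
// newly inserted (dyadic) vertex ends up with six neighbors.
void split_1_to_4(std::vector<Vec3>& verts, std::vector<Tri>& tris)
{
    std::map<std::pair<int,int>, int> cache;  // shared edge midpoints
    std::vector<Tri> out;
    out.reserve(tris.size() * 4);
    for (std::size_t t = 0; t < tris.size(); ++t) {
        int ab = midpoint(cache, verts, tris[t].a, tris[t].b);
        int bc = midpoint(cache, verts, tris[t].b, tris[t].c);
        int ca = midpoint(cache, verts, tris[t].c, tris[t].a);
        Tri t0 = { tris[t].a, ab, ca };  out.push_back(t0);
        Tri t1 = { ab, tris[t].b, bc };  out.push_back(t1);
        Tri t2 = { ca, bc, tris[t].c };  out.push_back(t2);
        Tri t3 = { ab, bc, ca };         out.push_back(t3);
    }
    tris.swap(out);
}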
Before examining how to move these subdivision vertices to approximate our original mesh
geometry, we need to fix some notation and introduce the edge terminology so that
the algorithms can be explained. Basically, an edge in the mesh is represented as a directional
edge containing pointers to its origin vertex (e.Org), destination vertex (e.Dest), previous edge
from its destination vertex (e.Dprev), and next edge from its origin vertex (e.Onext), as shown
in Figure 8.
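In code, such a directed-edge record might look like the following sketch (the field names are
taken from Figure 8; the struct layout itself is an assumption):
struct Vertex;  // position, parameter 4-tuple, etc.

struct Edge {
    Vertex* Org;    // origin vertex
    Vertex* Dest;   // destination vertex
    Edge*   Onext;  // next edge out of Org
    Edge*   Dprev;  // previous edge into Dest
};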
Once we define the edge algebra we can proceed to the next step: perturbing the vertex
coordinates.
Code Snippets
To compute the vertex coordinates for the subdivision vertices, first find the triangle in the
flattened mesh on the base domain which contains the dyadic point (similar to the 2D case in
Figure 3, where we locate the two green circles). The triangle location problem is reduced to a
point location problem. M denotes the flattened original mesh on the base domain, and V is the
red dyadic vertex. The white triangle containing V can be located using the routine in Listing 1.
Once the triangle in the collapsed original mesh containing the dyadic point is found, a linear
interpolation can be used to find the resample location:
P = αA + βB + γC
where P is the dyadic point and triangle ABC is the collapsed original triangle with its original
geometry. (α, β, γ) are the barycentric coordinates of P within triangle ABC in the collapsed
graph. Hence P will be a resample point on the piecewise-linear original mesh geometry.
Repeating this process for all new vertices allows us to perturb the subdivision-connectivity
mesh to approximate the original geometry. The following examples show the results.
Aaron Lee just graduated from Princeton University with a Ph.D. in Computer
Graphics & Animation. He has interned at various places including Microsoft Research
Lab, NASA Ames Research Center and Bell Labs, Lucent. Currently he works at a
startup company doing streaming synthetic media content delivery. If you are a
2D/3D game developer looking for exciting opportunities to define next-generation
interactive content, please don't hesitate to contact him at [email protected].
Last month we discussed some of the performance issues facing Age Of Empires II: The Age Of
Kings (AoK). I described some of the tools that we used at Ensemble to collect that data,
including Intel's VTune, NuMega's TrueTime, and our own profiling code. In this concluding
article, I'll describe how to improve performance by effectively using performance tools and
instrumentation. We'll also look at general types of problems we encountered as we optimized
AoK which can affect any game. Then we'll wrap things up by taking a look at the last bastion of
getting a game to run on the minimum platform when all else fails: scalable features.
All the performance problems AoK encountered fell into one or more of seven general
categories. These problems ranged from executing dead code to inefficient code and they can
affect any game. Let's take a look at these categories.
1. Executing dead or superfluous code. Over the course of a long development cycle, a lot
of code-based functionality is created, changed, and/or discarded. Sometimes discarded or
superseded functionality is not removed from the game and continues to be executed. While it's
a waste of effort to optimize code that should be removed in the first place, it can be difficult to
determine whether a few lines of code, a function, or an entire subsystem is going unused.
One feature we had envisioned for AoK was renewable resources, so natural resources such as
trees would increase over time if they weren't depleted. After play-testing the game, we found
that this feature would often cause a game to last indefinitely, so we eliminated it. Later, when
profiling game performance, we discovered that not all of the code had been removed -- the
code that controlled tree regrowth appeared at the top of our profiler's function list, and we
quickly removed it.
Unfortunately, superfluous code is not always so easily found, and often it's only when the code
gets executed enough that you spot it on a profiling list. Such was the case with another
problem also related to the trees in our game.
In our derived unit hierarchy of classes (described in last month's article), we easily added new
units to the game by deriving new classes in the hierarchy. This hierarchy also is powerful in
that functionality can be added or changed in a single place in the code to affect many different
game units. One such change inadvertently added line-of-sight checking for trees, which is
unnecessary since trees are not player-controlled. This was not an obvious performance
problem and it was found only through logging data and stepping through code while trying to
make the line-of-sight code faster.
2. Executing code too much. Trees, wall segments, and houses were often indicators of
general performance issues in AoK, given the large number of them on maps -- some AoK maps
contain more than 15,000 trees. In order to process these units quickly, we created shortcuts in
various derived functions within the unit hierarchy to avoid unnecessary unit processing. This
became very complicated in some circumstances, since the computer player uses walls and
houses as part of its line of sight. If it weren't for the differences between the way computer
and human players used these units, the wall and house special processing would have been
simpler. But the computer player's ability to use the buildings to scan for enemies made our AI
processing simpler and more effective.
Pathing was another system that we spent a lot of time optimizing so that it wouldn't execute
for too long. To do this, we capped the number of times the pathfinding system could be
executed to a fixed number of iterations per unit per game update. When trying to optimize a
pathing system by capping its execution, you have to balance the desire to limit CPU usage with
the desire to not make players think the units exhibit dumb behavior when instructed to move
or attack. This forced us to tweak the game a great deal to achieve the right balance between
playability and speed, but that's often the trade-off you face when optimizing a game.
We tried a variety of caps to optimize the pathfinding system, and it was determined that at
five or more pathing attempts, units attempting to retarget were the most responsive to the
player. Five attempts were too many for the minimum platform, and we decided that two
pathing attempts were too few based on the results of play-testing. We ultimately decided to
cap the number of pathing attempts at three, once again based on our desire to balance
playability with usability.
We also placed execution caps on other systems to improve performance. These included the
number of pathing attempts made by a player's units, the amount of time the computer player
could spend thinking during each game update, and the number of targets a unit could look for
when retargeting.
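The shape of such an execution cap is simple; here is a minimal sketch (the names and
structure are illustrative, not from the AoK source):
#include <cstddef>
#include <vector>

const int MAX_PATH_ATTEMPTS = 3;  // the per-unit, per-update cap AoK chose

struct Unit {
    int pathAttemptsThisUpdate;
    // ... position, goal, and so on
};

// Called at the top of each game update so every unit gets a fresh budget.
void BeginGameUpdate(std::vector<Unit>& units)
{
    for (std::size_t i = 0; i < units.size(); ++i)
        units[i].pathAttemptsThisUpdate = 0;
}

// Returns false when the unit has used up its pathing budget; the request
// is simply retried on a later game update.
bool TryPath(Unit& u /*, goal, map, ... */)
{
    if (u.pathAttemptsThisUpdate >= MAX_PATH_ATTEMPTS)
        return false;
    ++u.pathAttemptsThisUpdate;
    // return RunPathfinder(u, ...);  // the actual search is elided here
    return true;
}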
3. Using inappropriate algorithms. While the pathing system in Age of Empires was a good
general purpose system, it broke down in some specific circumstances (as discussed last
month). Also, there were new performance issues raised by AoK, including a larger number of
units and larger maps to path across.
We could have continued to attempt to optimize the single-pathing system, but it was obvious
from the work performed on AoE that enough requirements had changed so that the algorithm
could no longer stand on its own. What had been a good algorithm for AoE had become an
inappropriate algorithm for AoK due to new and changing pathing requirements.
The AoE pathing system was used to path units from one general area to another over short
distances in AoK. New pathing systems were added to path units quickly across the game map
and to path units accurately within short distances. Also, as part of the new pathing system, a
new unit obstruction manager (see Pottinger in the For More Information section) was added
for detecting unit collisions during pathing.
4. Encountering exceptional data. Built for efficiency from the start, the unit obstruction
manager surprised us when it was identified by our performance profilers as one of the top
problems. After reviewing the code to look for obvious (or not obvious) problems, we added
instrumentation code that catalogued how units and their locations were stored within the
quadtree.
With this logging code in place, we quickly saw that the majority of units placed in the quadtree
ended up being not in the leaf nodes, but higher up in the quadtree branches. We also
discovered that units touching the edge of a tile were interpreted as spanning two tiles, which
caused performance problems. By bumping units back onto the proper tiles, we immediately
saw a 300 percent boost in obstruction manager performance.
This code, as is most code, was written based on assumptions about the data. Programmers
assume that the data processed by a function is of a certain type and will fall within certain
limits or into certain sets. When data fell outside these expectations, our algorithm -- which
would otherwise have performed well -- was identified as a performance problem.
Some sections of the game were instrumented from the very outset of development to help
diagnose data processing problems that arose frequently in those sections of code. The unit AI,
for instance, contained conditional #define statements to log approximately 50 different sets of
performance information. These performance monitors could be used alone or in various
combinations to help resolve performance issues related to data processing.
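The mechanics of such compile-time switches are straightforward; a minimal sketch (the macro
and variable names are invented for illustration):
#include <cstdio>

static FILE* gLogFile = stderr;  // wherever the log should go

#define LOG_UNIT_AI_TARGETING 1  // one of the ~50 switches; 0 disables it

#if LOG_UNIT_AI_TARGETING
#define AI_LOG(...) std::fprintf(gLogFile, __VA_ARGS__)
#else
#define AI_LOG(...) ((void)0)   /* compiles away entirely when disabled */
#endif

// Example use inside the unit AI:
//   AI_LOG("retarget: unit %d scanned %d candidates\n", unitId, count);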
5. Inefficient memory usage. Poor performance can be caused by data structures that are
not cache-line aligned, random access to main memory, using too much memory, allocating
memory, and data dependencies. In AoK, memory problems could be especially severe since
multiplayer games can last six hours or more, during which time tens of thousands of units can
be created and destroyed.
To analyze and improve the memory usage of AoK, we used a number of different tools. The
first tool that was a tremendous help was the set of Windows NT performance counters, which
we used to examine memory statistics quickly. The NT performance counters provided a wide
array of data about an application, including processor, process, memory, and network
statistics. In the case of AoK, the most important memory statistic was Private Bytes, the
amount of nonshared memory allocated for the AoK process.
By sampling the memory footprint at specific intervals, we created a general picture of the
game memory footprint (Figures 1a and 1b). Since the game's memory requirements are
effectively the same across Windows NT and Windows 98, the NT performance counters helped
us examine how memory was used during a four-player game on the minimum specified
player's system. This was key to helping us determine if AoK would fit within the minimum
target memory size of 32MB.
Given the minimum system game requirements (Figure 2), we estimated that a game should
typically last about 45 to 60 minutes. In the four-player game example shown in Figure 1a,
about 21MB of memory was allocated by the game upon start up. Thirty minutes into the game,
memory usage rises to around 23MB.
In contrast, look at the memory footprint of the eight-player game shown in Figure 1b. The
addition of more players to the game requires more memory for their data at startup, as well as
more memory to support the larger game map. The amount of memory consumed continues to
grow during the game as more units and buildings are created until a plateau is reached. After
reaching that plateau (not shown), the memory footprint starts tapering back down. The
receding memory footprint occurs as players and units are defeated.
While these high-level memory statistics from the NT performance counters are quick and
useful, often it's necessary to drill down to see which specific functions are allocating memory.
To get that information, we created a simple memory instrumentation system to track memory
allocations (see Listing 1). The memory allocation code tracked allocations and de-allocations
by memory address, number of bytes requested, and file name and line number of the actual
function call. It also provided a running count of the number of allocations and de-allocations,
and the bytes of memory allocated in each game update loop.
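AoK's Listing 1 is not reproduced here, but the classic shape of such a tracker looks something
like this sketch (hypothetical names; deallocation tracking and the running counters are
omitted for brevity):
#include <cstddef>
#include <cstdio>
#include <new>

static void* TrackedAlloc(std::size_t bytes, const char* file, int line)
{
    void* p = ::operator new(bytes);
    std::fprintf(stderr, "alloc %lu bytes at %p (%s:%d)\n",
                 (unsigned long)bytes, p, file, line);
    return p;
}

// Placement-new through the tracker records file and line for each call site.
#define TRACKED_NEW(Type) new (TrackedAlloc(sizeof(Type), __FILE__, __LINE__)) Type

// Usage (Widget is a stand-in for any game type):
//   Widget* w = TRACKED_NEW(Widget);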
The sheer number of memory allocation schemes used in AoK complicated our memory
analysis. AoK uses the C++ new and delete operators; C library malloc, free, and calloc
functions; and Win32 GlobalAlloc, GlobalFree, LocalAlloc, and LocalFree functions. In the future,
we will be actively restricting ourselves to a subset of these functions.
To reduce memory fragmentation and eliminate overhead caused by allocating and de-
allocating memory, memory pooling was used in many subsystems. While this significantly
increased performance, it did create problems when trying to fix bugs where code referred to
recycled data.
After profiling performance and memory usage, it turned out that the most performance-
limiting factor in AoK could be the Windows 95/98 virtual memory system. Unlike Windows
NT/2000, Windows 95/98 doesn't require or configure a fixed-size swap file for virtual memory.
To make matters worse, the swap file can grow and shrink as a program runs. An expert user
can create a swap file of fixed size, but it's not something the vast majority of users can do or
should have to worry about.
AoK relies on the virtual memory system to handle the growing footprint of game data over
time within the game. It also uses multiple read-only memory-mapped files to access game
graphics and sounds residing in large aggregated resource files. These memory-mapped files
ranged in size from 28MB to 70MB. Since the amount of virtual memory available can vary so
widely on a user's Windows 95/98 system, this ended up being the number one AoK
performance issue beyond our control. It should be noted that this virtual memory problem
didn't affect every minimally configured system. Virtual memory problems in Windows 95/98
seemed to occur only on certain systems, even when identically configured systems performed
with little or no problem.
6. Inefficient code. Rewriting inefficient code is likely the most well known performance
optimization, but it was typically the last resort to fix our performance problems. In many
cases, the performance problem was resolved by identifying and fixing one of the previously
mentioned deadly sins.
The easiest place to attempt to improve inefficient code is with the compiler optimization
settings. Due to the size of AoK, we chose to compile release builds with the default "maximize
speed" setting for all modules. This may cause some code bloat (since speed is favored over
size), but in general it's a good choice. We chose not to use "full optimization" since we've seen
few programs that could run after using it.
Since shipping AoK we've been looking at the performance benefits of compiling with "minimize
size" and then using #pragma (or module settings) to optimize specific hotspots for speed. This
seems to be a better trade-off than just using the single speed optimization setting for
everything.
In AoK we chose to use the "only __inline" option in Visual C++, instead of inlining "any suitable"
function. This let us choose which functions to inline based on their appearance in the profile
list. Inlining any suitable function would most certainly increase the code size and lead to slower
performance.
Using an alternate compiler, such as Intel's C/C++ compiler, to optimize one or more
performance-intensive modules is also another way to realize some additional performance
gains. We decided against this for AoK, however, because of the risk associated with changing
compilers (or even compiler versions) near the ship date.
7. Other programs. One of the greatest strengths of Microsoft Windows is its ability to
preemptively run multiple programs at the same time. However, it can be a huge drawback
when programs that the user is unaware of take CPU time away from a game or cause the
game to lock up. For instance, during the play-testing phase of AoK's development, we received
reports of problems that we couldn't reproduce on our own systems. Sometimes these issues
were caused when the game entered an unstable state, but often other programs running in the
background on the tester's computer caused the reported problems.
Virus scanners and other programs spontaneously running in the background while a tester was
playing AoK were the most widespread cause of externally induced performance problems.
Unfortunately, there's no way to easily and adequately interrogate a player's computer and
warn them about potential problems that other programs can cause.
The most severe issue related to other programs involved the Windows Critical Update
Notification. Play-testers sometimes reported input lock ups during game play for no apparent
reason. We accidentally discovered that when AoK was in full-screen mode, the Critical Update
Notification could pop up a dialog box behind AoK. This would take the focus off AoK and make
it appear to players as if the game had stopped accepting input. Changing AoK to handle
situations like this was relatively easy once the problem was identified. Other applications likely
cause similar behavior to occur, but it's only by trial and error that these problems are
identified.
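AoK's actual fix isn't shown here; for a full-screen game of that era, a common pattern (sketched below as an assumption, not as AoK's code) is to pause and minimize when another application takes the focus, so any dialog hiding behind the game becomes visible:

#include <windows.h>

static void PauseGame(BOOL bPause);  // hypothetical helper that halts the simulation

LRESULT CALLBACK GameWndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch (uMsg)
    {
    case WM_ACTIVATEAPP:
        if (wParam == FALSE)               // another program stole the focus
        {
            PauseGame(TRUE);
            ShowWindow(hWnd, SW_MINIMIZE); // reveal whatever popped up behind us
        }
        else
        {
            PauseGame(FALSE);              // focus regained: resume play
        }
        return 0;
    }
    return DefWindowProc(hWnd, uMsg, wParam, lParam);
}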
When we had finally squeezed as much performance as we could out of AoK for the minimum
platform, we were still left with a performance deficit in the area of terrain drawing. We couldn't
make the feature optional since players need to see the terrain, yet we couldn't make it any
faster, either. The only alternative was to provide different implementations of terrain drawing
or different levels of terrain detail.
We decided to offer three different terrain detail settings: a fast algorithm with low detail for
the minimum platform, a medium detail (but slower) one for mid-range platforms, and a high
detail (slower still) for high-end platforms. This allowed AoK to run on a lower minimum
platform, but still give the user on the high end additional visual quality to look forward to. This
was a tactic used for a number of features in AoK.
Scalable features least likely to confuse the player are those that fit directly into the context of
the game, such as the number of players or the game map size. In total, there were six
scalable features, four of which fit the game context. In the game, these are not called
"scalable features," but "game options" (Figure 3). These options are as follows:
Number of players. The simplest scalable feature within AoK is the number of computer-
controlled or human opponents or allies that a person chooses to play with - up to eight
players per game. The more players, the higher the performance required by each player's
system. Up to four human or computer-controlled players can play on a minimum-specified
system.
Map size. Related to the number of players is the size of the map. The map size is expressed in
terms of numbers of players (for example, two-player map, four-player map, and so on). The
number of players supported by a specific map size was determined by the distance that our
game designers felt should be between player starting positions. But map size is independent of
the number of players, so you can have a two-player game on a big eight-player map. This
gives players the choice to accept our recommendations, space themselves out further, or
squeeze in tighter. Based on our choice of four players for the minimum platform, the default
map size is for four players.
Population limit. The unit population cap sets the maximum number of units the player can
build during a game. By default, this value is 75, but it can range between 25 and 200 units.
Again, the user does not see this as scalability, but as a tweakable option for creating the
perfect game. We chose to make 75 units the default population cap because game
performance on the minimum platform degrades too much at the next higher population limit.
Different artwork. In addition to these scalable game options is also an implied, but generally
unrealized, scalability in AoK's art assets. Each of the 13 civilizations in AoK is assigned to one
of four sets of building art. Each building art set represents the area of the world where the
civilization is from. For example, the Japanese use the Asian building art set and the Britons use
the Western European building art set. There are also Eastern European and Middle Eastern art
sets. Not only does each art set have its own overall style, each also has a different style of building for
each of the four evolutionary ages in AoK: Dark Age, Feudal Age, Castle Age, and Imperial Age.
In other words, an Asian Dark Age house looks different from an Asian Feudal Age house, or an
Asian Castle Age house.
This "upgrade" of buildings within each art set as the ages progress creates an interesting
memory allocation curve. In the beginning of the game, all the players use the Dark Age
version of their particular art set. As the game progresses, players advance through the ages at
different rates. Since the advancement to the next Age causes the building style to change for
the player, new art must be loaded and displayed. This increase in memory allocation continues
until all players again reach the same age.
Assuming all players start in the Dark Age and survive to the Imperial Age, the memory
allocation exhibits bell curve behavior. The worst case is when there is a player in each of the
four Ages at the same time, which sometimes happens in the middle of a game. If all the
players in an eight-player game select civilizations from different art sets, they use at least
twice as much memory as if they had chosen civilizations from the same art set.
Display resolution. This, along with the terrain detail (which is explained in a moment), is one
of two scalability options within AoK that fall outside the scope of the game's design concept. The
default display resolution is 800x600, and it can scale up to 1280x1024. Again, the lowest
resolution was chosen for the minimum system.
Terrain detail. A terrain detail setting was introduced to reduce the amount of processor time
required to draw 2D isometric terrain on slower computers by reducing the visual quality. Three
levels of terrain detail are provided. The highest-detail setting uses multiple rendering
passes, anisotropic filtering, and RGB color to bring out the best detail. The medium-detail
setting replaces anisotropic filtering with a lower quality but faster filter; the low-detail setting
uses flat shading and an eight-bit color lookup table similar to the terrain in the original AoE. No
matter which terrain detail setting is used, the final display output only uses 256 colors.
The choice of which level of terrain detail to display is made automatically by AoK the first time
it runs. Since all rendering is performed on the CPU, this decision is made quickly by using a
test that gauges the CPU speed. Players can change the setting later using an in-game menu.
Unfortunately, scalable features that fall outside of the game design (such as the last two
options above) are less likely to be understood by players. This lack of understanding can lead
players to change settings that negatively affect their game experience without realizing
what they did, leaving them unable to restore the original settings.
Key Lessons
After analyzing and improving the performance of AoK, our team learned some essential
lessons that we hope to use to improve the quality and performance of our future products.
We hope it will improve your future games, as well. These lessons are:
//================================================
// memory.h header
//================================================
extern "C"
{
void *mymalloc( size_t size, const char *pfilename,
const long dwline);
void myfree( void *memblock, const char *pfilename,
const long dwline);
};
//================================================
#ifdef _INSTRUMENTMEMORY
#define malloc DEBUG_MALLOC
#define free DEBUG_FREE
#endif
#define DEBUG_MALLOC(size) mymalloc(size, __FILE__, __LINE__)
#define DEBUG_FREE(block) myfree(block, __FILE__, __LINE__)
//================================================
#ifdef _INSTRUMENTMEMORY
void MemoryInit(void);
int MemorySave(void);
void MemoryUpdate(void);
#else
#define MemoryInit
#define MemorySave
#define MemoryUpdate
#endif
//================================================
// eof: memory.h
//================================================
//================================================
// memory.cpp
//================================================
#include <windows.h>
#include <stdio.h>
#include <io.h>
// !!! DO NOT include memory.h header file here !!!
//================================================
static FILE *pmemfile, *pupdatefile;   // per-allocation log, per-frame summary log
static bool binitialized = false;
//================================================
static DWORD gdwAllocCount;    // allocations made this frame
static DWORD gdwByteCount;     // bytes requested this frame
static DWORD gdwDeletions;     // frees made this frame
static DWORD gdwFrameCount;    // frames logged so far
//================================================
void MemoryInit(void);
//================================================
void MemoryUpdate(void)
{
if (pupdatefile)
{
fprintf(pupdatefile, "%lu\t%lu\t%lu\t%lu\n",
gdwFrameCount, gdwAllocCount, gdwDeletions, gdwByteCount);
gdwDeletions = 0;
gdwAllocCount = 0;
gdwByteCount = 0;
gdwFrameCount++;
}
} // MemoryUpdate
//================================================
extern "C" void *mymalloc( size_t size, const char *pfilename, const long dwline)
{
RGEMemoryEntry entry;
gdwAllocCount++;
gdwByteCount += size;
void *p = malloc(size);
if (!binitialized)
MemoryInit();
if (pmemfile)
fprintf(pmemfile, "malloc\t0x%X\t%ld\t%s\t%ld\n", p, size, pfilename, dwline);
return p;
} // mymalloc
//================================================
extern "C" void myfree( void *memblock, const char *pfilename, const long dwline)
{
RGEMemoryEntry entry;
gdwDeletions++;
if (!binitialized)
MemoryInit();
if (pmemfile)
fprintf(pmemfile, "free\t0x%x\t\t%s\t%ld\n", memblock,
pfilename, dwline);
free(memblock);
} // myfree
//================================================
void MemoryInit(void)
{
if (binitialized)
return;
pmemfile = fopen("c:\\memory-alloc.txt", "wb");
pupdatefile = fopen("c:\\memory-update.txt", "wb");
if (pmemfile)
fputs("type\tptr\tbytes\tfilename\tline\n", pmemfile);
if (pupdatefile)
fputs("frame\tallocations\tdeletions\ttotal bytes\n", pupdatefile);
binitialized = true;
} // MemoryInit
//================================================
int MemorySave(void)
{
if (pmemfile)
fclose(pmemfile);
if (pupdatefile)
fclose(pupdatefile);
pmemfile = 0;
pupdatefile = 0;
return 0;
} // MemorySave
//================================================
// eof: memory.cpp
//================================================
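A minimal usage sketch (mine, not from AoK): compile with _INSTRUMENTMEMORY defined, include memory.h in every module except memory.cpp, and pump MemoryUpdate once per game update:

#include <stdlib.h>
#include "memory.h"   // redefines malloc/free when _INSTRUMENTMEMORY is defined

int main(void)
{
    for (int frame = 0; frame < 100; frame++)
    {
        void *p = malloc(256);   // logged with __FILE__/__LINE__ via mymalloc
        free(p);                 // logged as a matching "free" entry
        MemoryUpdate();          // appends one summary row for this frame
    }
    return MemorySave();         // closes both log files
}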
Herb Marselas currently works at Ensemble Studios. He helped out on Age of Empires
II: The Age of Kings. Shhhh! Please don't tell anyone he's working on a secret 3D-
engine project called [deleted]. Previously, he worked at the Intel Platform
Architecture Lab where he created the IPEAK Graphics Performance Toolkit. You can
reach him at [email protected].
For more information and acknowledgements, please refer to Profiling, Data Analysis,
Scalability, and Magic Numbers: Meeting the Minimum Requirements for Age of
Empires II: The Age of Kings
Because of the problems with creating the texture maps and the computational costs during
runtime, real-time spherical environment mapping is not often used in games. As a result, when
the technique is used, the spherical maps are usually pre-calculated and therefore don't
reflect changes in a scene as they happen. Fortunately, some DirectX 7-capable video cards
support cubic environment maps, which don't exhibit any of the problems associated with
spherical maps, and thus they're suitable for reflecting dynamic scenes. Despite their limitations
though, spherical environment maps are still useful. Using sphere maps, you can create very
cheap, high-performance static reflections that in most cases are good enough for game
reflections. Another very useful application is creating realistic specular highlights from an
infinite light source.
This article will show a hardware T&L-accelerated method of using sphere maps; it is assumed
that your game will have some level of geometry hardware acceleration in addition to Direct3D
support. If geometry acceleration is not present, applying these techniques may actually slow
down a game (especially if the standard Direct3D software pipeline is used).
To begin our look at spherical mapping let's look at the Spheremap demo, one of the samples
that comes with DirectX 7 (to find this demo, search the DirectX 7 CD-ROM for
SPHEREMAP.EXE). This application displays a spherically environment-mapped teapot. Figure 1
shows a screenshot from this application.
Figure 1. A screenshot from SPHEREMAP.EXE
The Spheremap demo implements what I call "normal" spherical mapping, where the normal
vector at a vertex is used in place of the eye-to-vertex reflection vector. The code that performs
the mapping within this demo is shown in listing 1 (in the DirectX SDK source file this can be
found in a function named ApplySphereMapToObject() ).
Unfortunately, the Spheremap demo has many shortcomings and doesn't implement spherical
reflection mapping like OpenGL (when the automatic texture address generation is set to
GL_SPHERE_MAP). In fact, Direct3D has no sphere map support at all - you have to calculate
the texture coordinates yourself. To do so, you could create a system that cycles through all of
the vertices and has the CPU calculate the texture coordinates. This is what the DirectX 7
Spheremap demo does, but it is far from efficient.
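The SDK listing isn't reproduced here, but the idea can be sketched as follows (my reconstruction, not the SDK's code; the vertex layout and matrix layout are assumptions, and the sign of v depends on the map's orientation):

struct Vertex { float x, y, z, nx, ny, nz, u, v; };

// m is the local-to-camera (local*world) matrix, in row-major [row][col] order.
void ApplySphereMap(Vertex *verts, int count, const float m[4][4])
{
    for (int i = 0; i < count; i++)
    {
        Vertex &v = verts[i];
        // camera-space x and y of the normal: dot products with the matrix columns
        float cx = v.nx * m[0][0] + v.ny * m[1][0] + v.nz * m[2][0];
        float cy = v.nx * m[0][1] + v.ny * m[1][1] + v.nz * m[2][1];
        v.u = 0.5f * cx + 0.5f;    // scale and bias into [0,1]
        v.v = -0.5f * cy + 0.5f;
    }
}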
A closer look
The DirectX 7 documentation and various pieces of literature from the graphics vendors stress
over and over that correctly using and managing vertex buffers is the key to getting high
performance - especially with hardware T&L-based cards. Static vertex buffers are the most
efficient, as they can be kept in local video memory and never updated (i.e. optimized), but
that means that all geometry processing has to be performed with the standard hardware
pipeline, limiting the effects that you can create. Even so, it is surprising what can be done
when all the resources of the pipeline are used.
If you must have dynamic geometry, a carefully managed CPU-modifiable vertex buffer is still
better than no vertex buffer (as in the SPHEREMAP.EXE example). However, the Spheremap
sample code is one of those pathological cases where vertex buffers are actually slower - if you
converted that code to use video memory vertex buffers, it would most certainly slow down
since the normal is read back from the vertex buffer (which is taboo, as both video memory and
AGP memory are uncached). If the vertex buffer happens to be in local video memory, then it's
being fetched back over the AGP bus, which is painfully slow. In this case, keeping a second
copy of the normal vectors in system memory would be best.
Also, note that there's a glaring mistake in the DirectX algorithm, which I am compelled to point
out. It is the line commented, "Check the z-component, to skip any vertices that face
backwards". Vertices do not face backwards, polygons do; it is perfectly legal for a polygon to
have a vertex normal that points away from the viewer while still having a face normal pointing
towards the viewer:
The results of the erroneous z-component check can be seen in the DirectX 7 example when the
bottom of the teapot comes into view. For a few frames, a number of triangles are not textured
properly. This check is not only an error, it also makes the loop run slower (it certainly
doesn't speed it up). Without the check, there would be 2N dot products (where N is the
number of vertices). With the check in place, and assuming half of the vertices face away from
the viewer, there are N + 2N/2 = N + N = 2N dot products, so the same amount of work is done.
The difference is that now there is a jump in the middle of the loop, which the CPU has to
predict (and will sometimes mispredict). On a Pentium II or III, a mispredicted jump is far
more expensive than a couple of dot products.
When you have removed the z-component check, all that's left to do in the main loop is
generate texture coordinates. The vector [m11, m21, m31] is the local space +X direction in
camera space and the vector [m12, m22, m32] is local space +Y direction in camera space.
Recall that all normal vectors are points on a unit sphere, so the code generating the texture
coordinates is effectively calculating the longitude and latitude coordinates of the normal
vector's position on that sphere (or the cosines of them) by taking the dot product of the unit
normal with the unit axes (see Figure 2a & 2b). The output of that calculation is scaled and
biased so that the center of the sphere map is the origin:
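Reconstructed from that description (the figure with the math is absent from this copy, and the sign of v is an assumption that depends on the sphere map's orientation), the calculation is:

u = 0.5 * (N · lsx) + 0.5
v = -0.5 * (N · lsy) + 0.5

where N is the unit vertex normal and lsx, lsy are the camera-space axes defined in the note below.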
If we consider that the sphere map UV coordinate calculation requires two dot products, and a
matrix*vector multiply performs four dot products, we should be able to perform the same
calculation using a texture matrix. Direct3D supports 4x4 texture matrices at every texture
stage, so all we have to do is build a texture matrix that performs the same dot products as
discussed above. By carefully constructing the texture matrix, the scale and bias are performed
automatically, so the origin lands in the center of the texture map. The required texture matrix
looks like the following:
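(Reconstructed from the description, since the matrix figure is absent here.) Using Direct3D's row-vector convention, with input texture coordinate [nx ny nz 1]:

            | 0.5*lsx.x   -0.5*lsy.x   0   0 |
tex_mat =   | 0.5*lsx.y   -0.5*lsy.y   0   0 |
            | 0.5*lsx.z   -0.5*lsy.z   0   0 |
            | 0.5          0.5         0   1 |

which produces u = 0.5*(N · lsx) + 0.5 and v = -0.5*(N · lsy) + 0.5 in a single transform.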
NOTE: In the above math, the vectors lsx and lsy are used in place of [m11, m21, m31] and
[m12, m22, m32] to represent the local space x and y axis in camera space - in other words,
the local space [1,0,0] and [0,1,0] vectors respectively, transformed by the local*world
matrices.
Next, specify the vertex normal as the first three elements of the input texture coordinate
vector; the fourth element will automatically be set to its default of 1. The specified texture
matrix will be applied to the texture coordinates (the normal vector), and the resulting texture
coordinate vector is identical to that in the DirectX example.
Note: DirectX has no specific naming convention for the elements of a 4D texture coordinate so
I will use the convention [r, s, t, q]. When performing standard 2D texture mapping, the 'r'
component is equivalent to 'u', the 's' component to 'v', and the elements 't' and 'q' are
unused.
The following code sets the above texture matrix at stage 0. This operation needs to be done
any time either the world or local matrices change, as LocalToEyeMat = Local*World:
p3DDevice->SetTransform(D3DTRANSFORMSTATE_TEXTURE0, &tex_mat);
There is one additional render state that needs to be set. You must tell Direct3D to apply the
texture matrix and to use just the first two elements of the result:
p3DDevice->SetTextureStageState( 0, D3DTSS_TEXTURETRANSFORMFLAGS,
D3DTTFF_COUNT2 );
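Putting the pieces together, a hedged sketch of building and applying the matrix (variable names are assumed; LocalToEyeMat is the local*world matrix mentioned above, and the v-row sign follows the reconstruction given earlier):

D3DMATRIX tex_mat;
ZeroMemory(&tex_mat, sizeof(tex_mat));

// columns of LocalToEyeMat supply lsx and lsy (local x and y axes in camera space)
tex_mat._11 =  0.5f * LocalToEyeMat._11;  tex_mat._12 = -0.5f * LocalToEyeMat._12;
tex_mat._21 =  0.5f * LocalToEyeMat._21;  tex_mat._22 = -0.5f * LocalToEyeMat._22;
tex_mat._31 =  0.5f * LocalToEyeMat._31;  tex_mat._32 = -0.5f * LocalToEyeMat._32;
tex_mat._41 =  0.5f;                      tex_mat._42 =  0.5f;
tex_mat._44 =  1.0f;

p3DDevice->SetTransform(D3DTRANSFORMSTATE_TEXTURE0, &tex_mat);
p3DDevice->SetTextureStageState(0, D3DTSS_TEXTURETRANSFORMFLAGS, D3DTTFF_COUNT2);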
Direct3D has no way of specifying that the untransformed normal should be used as input into
the texture matrix. The quick fix is to create a flexible vertex format that has a position, a
normal, and a three-element texture coordinate; when the buffer is filled, you copy each normal
vector into the texture coordinate. Unfortunately, this also increases the size of each vertex by
12 bytes and consumes more bandwidth when processing the buffer. (In a basic vertex case,
these extra 12 bytes increases the vertex size by 50%.) But the cost is worth it: you can
perform the "normal" spherical environment mapping (as used in the Direct3D sample) with a
static vertex buffer, using nothing more than the standard Direct3D pipeline. This is a big win
with hardware, since cards like nVidia's GeForce and GeForce2 process the texture matrix in
hardware without CPU intervention, allowing the vertex buffer to be stored in local video
memory.
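Under DirectX 7 the enlarged vertex might be declared like this (a sketch; the struct name is mine, and D3DFVF_TEXCOORDSIZE3 marks texture coordinate set 0 as three-element):

struct SphereMapVertex
{
    float x, y, z;        // position
    float nx, ny, nz;     // normal, used by lighting and the geometry pipeline
    float tnx, tny, tnz;  // copy of the normal, consumed by the texture matrix
};

const DWORD SPHEREMAP_FVF =
    D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1 | D3DFVF_TEXCOORDSIZE3(0);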
Note that both the Direct3D and texture matrix examples expect a unit scale in the local-to-
camera space transform (local*world). If this isn't the case, the texture matrix must be scaled
by the inverse of the scale factor. Additionally, the normal-vector texture coordinates are
expected to be of unit length. If this technique is applied to dynamic geometry, then every time
a normal is modified, the associated texture coordinate needs to be updated. Another
shortcoming of the method discussed above is that only the original input normal vectors are
considered when calculating the reflection. For most meshes this is fine, but when mesh
skinning is applied there is a problem. When skinning a mesh in hardware, each vertex (position
and normal) is multiplied by a pair of world transforms, and the final position and normal are
calculated from a weighting applied to the results of these transforms. The skinned normal and
position are not available outside of the graphics pipeline, yet to obtain a correct reflection we
need to know what the skinned normal vector was. One solution would be to use the CPU to
re-skin the mesh, but this is expensive.
Fortunately, there is a better solution: Direct3D can be told to use the camera-space normal
directly as a texture coordinate by setting the texture coordinate index for the required texture
stage to include the D3DTSS_TCI_CAMERASPACENORMAL flag. With this "tex-gen" mode set,
any texture coordinates within the vertex buffer that are bound to the specified texture stage
are ignored; instead, the camera-space normal [n'x, n'y, n'z, 1] is used. This automatically
generated texture coordinate includes any skinning operations that may have been performed.
Contrary to popular belief, this flag is not just for use in cubic environment mapping; it can be
used any time you want the camera-space normal to be used as a texture coordinate.
Referring back to the texture matrix, it is easy to create a texture matrix to take advantage of
this new flag, as shown below:
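(Again reconstructed, since the figure is absent.) Because the pipeline now supplies the camera-space normal directly, only the constant scale-and-bias part remains:

            | 0.5    0     0   0 |
tex_mat =   | 0     -0.5   0   0 |
            | 0      0     0   0 |
            | 0.5    0.5   0   1 |

giving [u v] = [0.5*n'x + 0.5, -0.5*n'y + 0.5],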
where [n'x n'y n'z 1] is the transformed and skinned normal vector.
As you can see, all that needs to be done is to multiply the transformed automatic normal-vector
texture coordinate by a constant matrix. The first advantage of this method is that you do not
have to update the texture matrix when the local-to-camera matrix changes (since this is
already taken care of with the transformed normal). Another advantage is that you do not need
to keep a second copy of the normal vectors within the mesh - the normal seen by the texture
matrix is the same one seen by the geometry. That means that automatic normal normalization
and skinning can be utilized. When automatic normalization is enabled, you can even set non-
uniform scale factors in the world matrix and everything works fine.
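The corresponding setup is then just (same assumed names as before):

// Feed the camera-space normal into stage 0 and apply the constant matrix above.
p3DDevice->SetTextureStageState(0, D3DTSS_TEXCOORDINDEX,
                                D3DTSS_TCI_CAMERASPACENORMAL | 0);
p3DDevice->SetTextureStageState(0, D3DTSS_TEXTURETRANSFORMFLAGS, D3DTTFF_COUNT2);
p3DDevice->SetTransform(D3DTRANSFORMSTATE_TEXTURE0, &tex_mat);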
The other big gain is that it is possible to perform reflective spherical environment mapping,
since Direct3D can pass the camera-space reflection vector in place of the normal by setting the
texture coordinate index state to include D3DTSS_TCI_CAMERASPACEREFLECTIONVECTOR. This provides
environment mapping similar to that produced by GL_SPHERE_MAP, which looks significantly
better than normal-based spherical environment mapping. (Unfortunately, the reflection
method is more sensitive to distortion due to bad normal vectors, non-unit normal vectors and
low tessellation.)
There are two methods that Direct3D can use to calculate the eye-to-vertex reflection vector.
The method used depends on the local-viewer render state
(D3DRENDERSTATE_LOCALVIEWER). When this render state is set to TRUE, the per-vertex
reflection is calculated using this formula:
R = 2(E · N)N - E
(Where N is the vertex normal and E is the vertex-to-camera unit vector.)
When the local-viewer render state is set to FALSE, the reflection vector is computed
orthographically from an infinite viewpoint using this simpler formula:
R = 2NzN - I
(Where Nz is the world-space z component of the vertex normal and I is the vector [0,0,1].)
Better reflections are obtained with the local viewer model, but it is a more complex calculation
and performance can be affected by the level of hardware acceleration; try using both reflection
models and see which one works the best for you. By default within Direct3D, local viewer
reflections (and specular calculations) are enabled, and they should be disabled if the
orthogonal non-local version is required. The screen shot below shows the test application
associated with this article performing spherical reflection mapping on a torus.
Finally, depending on the quality of your sphere maps, you might want to pull the boundary in a
little (but not too much, as it creates other distortions). By not using the last few pixels around
the edge of the sphere map, you can significantly change the look of an object. Another way to
look at the texture matrix is as follows:
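(Reconstructed; the original matrix figure is absent.) In scale/offset form the mapping becomes:

u = Scale.u * n'x + Offset.u
v = -Scale.v * n'y + Offset.v

with Scale = (0.5, 0.5) and Offset = (0.5, 0.5) reproducing the full-texture mapping above,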
where Scale and Offset are both in u,v coordinates, in the range 0.0 to 1.0. By carefully
adjusting this matrix you can not only trim off the outer edge of a sphere map, you can also
pack more than one sphere map into a larger texture, or even use elliptical sphere maps.
The code associated with this article is available as a Visual C++ 6.0 project called TGReflect;
included in the archive is a ready-to-use, pre-built release-mode executable. This project was
built using the Shrike 7.00 DirectX framework. The framework is only required if you wish to
rebuild the project and is available from this link; see Shrike7.doc within the archive for more
information.
Listing 1. The sphere-mapping code from the Spheremap demo (ApplySphereMapToObject).
Rob Wyatt has been involved in games and graphics for more than a decade and was
one of the architects of the Xbox game console. He recently left Microsoft and headed
to Southern California, where he can be found flying around the skies of Los Angeles in
his plane. He is currently looking at various technologies for the Internet.
URL: http://www.gamasutra.com/features/20000809/marselas_01.htm
Age of Empires II: The Age of Kings (AoK), a tile-based, 2D isometric, real-time strategy game,
was built on the code base used in the original Age of Empires (AoE) and extended in its Rise of
Rome expansion pack. In these games, players guide one of many civilizations from the humble
beginning of a few villagers to an empire of tens or hundreds of military and non-military units,
while competing against other human or computer-controlled opponents in single or multiplayer
modes.
This is the first of a two-part article that describes the tips, tricks, tools, and pitfalls that went
into raising the performance profile of Age of Empires II: The Age of Kings. All of the techniques
and tools used to measure and improve AoK are fully capable of improving the performance of
other games.
Our team had some ideas as to which parts of the code were taking a long time to execute, and
we used Intel's VTune, NuMega's TrueTime, and our own profiling code to verify these hunches
and see exactly where time was being spent during program execution. Often these
performance results alone were enough to determine solutions but sometimes it wasn't clear
why the AoK code was underperforming, and in these cases we analyzed the data and data flow
to determine the nature of the problem.
Once a performance problem is identified, several options are available to fix it. The most
straightforward and widely recognized solution is direct code optimization: improving the
existing C code, translating it to hand-coded x86 Assembly, rearranging data layouts, and/or
implementing an alternative algorithm.
Sometimes we found that an algorithm, though optimal for the situation, was executing too
often. In one case, unit pathing had been highly optimized, but it was being called too often by
other subsystems. In these cases, we fixed the problem by capping the number of times the
code could be called by other systems or by capping the amount of time the code could spend
executing. Alternately, we might change the algorithm so its processing could occur over
multiple game updates instead of all at once.
We also found that some functionality, no matter how much we optimized it, still executed too
slowly. For example, supporting eight players in a game required too much processor time on
the minimum system, so we specified that the minimum system could support only four
players. We presented scalability features such as this as facets of game play or as options that
players could adjust to their liking. These scalable features ultimately allowed AoK to run well
on its stated minimum system, providing incentives or rewards to users who have better
computers.
And then there were AoK's approximately 30 single-player scenarios. We evaluated the
performance of these scenarios slightly differently from other game functionality. Instead of
trying to optimize offending code, we first examined the scenario for performance problems that
had been inadvertently introduced by the scenario designers in their construction of the
scenario and its elements. In many cases, performance improved significantly with slight
changes to the scenario, for example reducing the number of player units, shrinking the game
map, or making sections of the maps inaccessible to players. Above all, we made sure that we
did not change the designer's vision of the scenario as we optimized it.
One of the goals of AoK was to keep the system requirements as low as possible. This was
necessary in order to reach the broadest audience possible and to stay on the same incremental
processor performance ramp set by the original Age of Empires and its Rise of Rome expansion
pack. Our overriding concern was to meet these minimum system requirements yet still provide
an enjoyable game experience.
The original Age of Empires was released in September 1997 and required a 90MHz Pentium
processor with 16MB RAM and a 2D graphics card capable of handling 8-bit palettized color. The
Rise of Rome expansion pack shipped a year later and raised the minimum system processor to
a 120MHz Pentium. Based on this information, the AoK minimum processor was pegged as a
133MHz Pentium with 32MB of physical RAM (Figure 1). The additional RAM was required due
mainly to the increased size and number of graphics and sound files used by AoK. There was
also a greater amount of game data and an executable that grew from approximately 1.5MB for
AoE to approximately 2.4MB for AoK.
To make sure AoK worked on the minimum system, we had to shop for old hardware. We
purchased systems matching the minimum system specification from a local system reseller -
we no longer used systems that slow. When the "new" computers arrived, we decided not to
wipe the hard drives, nor did we reinstall software and hardware with the latest driver versions.
We did this because we expected that most AoK users wouldn't optimize their computer's
configuration or settings, either. Optimizing these systems would have undoubtedly improved
our performance numbers, but it would not have translated into true performance gains on
other minimally-configured computers. On the other hand, for normal in-house play-testing we
used computers that were significantly more powerful than the minimum system configuration,
which made up for performance issues caused by unoptimized code and enabled logging
functions during play-testing (Figure 1).
A precedent set by the original Age of Empires was the use of options and settings playable on
the minimum system (Figure 2). A list of the specific options supported by the minimum system
was needed due to the large number of them available in AoK (Figure 3). These were also the
default options for the single-player and multiplayer games, and were used to guide the
creation of approximately 30 single-player scenarios.
One of the first tasks of this dedicated performance phase was to determine the largest
performance problems, the improvements that we could hope to make, and the likelihood that
AoK would meet the minimum system specification in terms of processor and physical memory.
This initial profiling process led us to increase the minimum required processor speed from 133
to 166MHz. We also felt that meeting the 32MB memory size could be difficult, but we were
fairly certain that the memory footprint could be reduced enough to meet that goal.
No matter how good or bad a program looks when viewed through the lens of profiling
statistics, the only true test of satisfactory performance is how players feel about their game
experience. To help correlate player responses with game performance in AoK, we used several
on-screen counters that displayed the average and peak performance. Of these counters, the
ones that calculated the average frame rate and lowest frame rate over the last several
hundred frames were used most to determine performance problems. Additional statistics
included average and peak game simulation time (in milliseconds) over the last several hundred
game updates.
Identifying the symptoms of performance problems during play-testing and making saved games
of these problem situations was very useful. We replayed the saved games in the profiler, and
routines that took too long could be identified quickly. Unfortunately, some problems were difficult to track
down, such as memory leaks and other programs running on the play-tester's computer.
We also created scenarios that stressed specific situations. For instance, we stressed the terrain
engine's hill-drawing by using a special scenario consisting of a large game map covered mostly
with hills. Other special scenarios were created that included many buildings, walls, or attempts
to path units long distances between difficult obstacles. These scenarios were easy to build and
it was obvious the first time the scenario was run whether a given issue needed to be targeted
for optimization.
The final set of data came in the form of recorded AoK games. AoK has a feature that allows
human or computer player commands to be recorded to a file. This data can then be played
back later as if the original player were issuing the commands. These recorded games helped
diagnose pathfinding problems when it was unclear how a unit had arrived at a particular
destination.
Since AoK was able to load scenarios, saved games, and recorded games from the command
line, the game could be run automatically by a profiler. This simplified the profiling process by
allowing the profiler to run AoK and have it jump directly into the problem. This command-line
process bypassed the startup and pregame option screens. (Some profilers slowed the game
down so much that manually loading a saved game from the profiler would have been
impossible.) And since performance profiling and logging significantly slowed game play,
analyzing recorded games was a much better solution from the tester's perspective. Multiplayer
games could be recorded and then played back command-for-command under the profiler
overnight to investigate performance issues.
Some performance issues from AoE needed to be resolved while we were working on AoK, the
biggest of which was AoE's 2D graphics pipeline. The graphics for AoK are created through a
combination of software rendering and hardware composition. This pipeline had been highly
optimized for AoE by hand-coding most of the system in Assembly, so there was not much
additional need to optimize it for AoK.
But there were new features to integrate into the 2D pipeline. For one thing, AoK had more
detailed terrain. Also, units that were visually obscured behind buildings and other obstructions
appeared as outlines so players could see where they were. Both of these systems were
implemented as a mixture of C/C++ and hand-coded Assembly.
The biggest challenge in keeping the performance up for the graphics system was making sure
that the sprites used for graphics in the game were properly tagged as belonging in system
memory or video memory. If a sprite was in the wrong memory type a significant performance
hit or even an error could occur, but it was usually hard to identify these graphics memory
location problems. They were usually marked by a drawing problem on-screen, such as a
shadow drawing on top of a unit instead of under it.
Sprites used by the software rendering engine needed to be in system memory so that they
could be read and processed. If they resided in video memory instead, the limited throughput
from video memory caused a significant performance hit to the game. Conversely, sprites bltted
by the hardware that accidentally ended up in system memory would render slowly and could
fail to render at all if the hardware bltter didn't support blts from system memory.
Pathfinding problems from AoE also had to be fixed. In AoE, there was a single unit-pathing
system, which was known as "tile pathing" because it broke the game map down into individual
tiles and classified them as "passable" or "nonpassable." This tile-pathing system was fairly
good at moving units short distances, but it often took too long to find paths (if it could find one
at all), so we created two additional pathing systems for AoK.
The first of these two systems, "MIP-map pathing," quickly approximated distant paths across
the map. The basis for MIP-map pathing was the construction of compact bit vectors that
described the passability of each tile on the map. This system allowed the game to determine
quickly whether it was even possible for a unit to get from its current location to the general
target area. The only way to determine whether the general area could be reached was through
the resolution of the bit vectors.
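The article doesn't include the bit-vector code, so the following is only a guess at its shape (one bit per tile, 32 tiles per word), but it shows why the coarse test is cheap:

struct PassabilityMap
{
    unsigned long *bits;  // wordsPerRow words per map row, one bit per tile
    int wordsPerRow;
};

// True if the tile at (x, y) is passable.
inline bool IsPassable(const PassabilityMap &m, int x, int y)
{
    return ((m.bits[y * m.wordsPerRow + (x >> 5)] >> (x & 31)) & 1) != 0;
}

// Coarse rejection: a 32-tile span with no passable bits is dismissed with a
// single compare, which is what makes the long-distance approximation fast.
inline bool SpanHasPassableTile(const PassabilityMap &m, int y,
                                int firstWord, int lastWord)
{
    for (int w = firstWord; w <= lastWord; w++)
        if (m.bits[y * m.wordsPerRow + w] != 0)
            return true;
    return false;
}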
Once a unit was within a short distance of its target area, another new pathing system called
"low-level pathing" was used. Low-level pathing allowed very accurate pathing over short
distances. When low-level pathing failed, the pathing system fell back and used the original tile
pathing from AoE.
Changing the pathing system from a single, general-purpose system to three special-purpose
systems improved the performance of AoK and also significantly improved game play since it
virtually eliminated the problem of stuck and stopped units caused by pathing failures.
While we were able to improve the pathing system for AoK, enhancing the unit-class hierarchy
system was a much more onerous task. The unit-class hierarchy system from AoE couldn't be
changed easily since so many game systems and so much functionality relied on the old
implementation. At its heart, the game's unit-class system is a hierarchy of derived classes and
each derived class is more specialized than its parent. The special functions of each derived
class are supported by virtual functions exposed by the classes in the hierarchy. A simplified
version of the hierarchy is shown in Figure 4.
From a programming standpoint, calling a virtual function adds only a little overhead (an
indirection through the vtable) compared to calling a regular class function.
If each class could implement only its own version of the virtual functions, then this hierarchy
wouldn't cause any function overhead problems. However, since each level of the hierarchy
implements its own special code, it must also call its parent's version of the derived function to
perform its work. In a hierarchy four classes deep, that means calling three additional
functions. This may not sound like much, but it can add up when code is executed hundreds of
thousands or millions of times.
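A sketch of the call chain being described (the class names are illustrative, not AoK's actual hierarchy):

class Object
{
public:
    virtual void Update() { /* base bookkeeping */ }
};

class ActionObject : public Object
{
public:
    virtual void Update()
    {
        Object::Update();        // each level must first run its parent's work...
        /* action processing */
    }
};

class CombatObject : public ActionObject
{
public:
    virtual void Update()
    {
        ActionObject::Update();  // ...so one virtual call fans out into a chain
        /* combat processing */
    }
};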
Some performance improvement could have been gained by circumventing the hierarchy using
"special case" units. For example, walls are a type of building unit that do not attack other units
and only need to scan their vicinity for enemy units every few game updates unless they are
under attack. To handle this special case, we could specifically check whether the current unit
being processed is a wall, and if so, skip the code that is only executed for other buildings.
Unfortunately, coding in too many special cases can also lead to performance losses, because
you end up checking to see whether a unit is one of your many special cases. In the end, we
left the unit-class hierarchy in place and made specific changes to short-circuit functionality
that didn't apply to specific units.
Commercial Profiling Tools: The Good, the Bad, and the Ugly
Performance analysis extends beyond evaluating the execution speed of program functions and
subsystems. It also includes measuring memory usage and evaluating the way the program
interacts with other programs and the operating system. In order to determine where the
performance problems were in AoK, four separate tools were used: Intel's VTune, NuMega's
TrueTime, the Windows NT performance counters, and our own profiling and memory
instrumentation code.
Although we used Microsoft Visual C++, we did not use the bundled Microsoft Profiler. There
were two reasons for this: we found it difficult to get the Microsoft product to work correctly (or
at all) and the data format from their profiler was either inadequate or needed post-processing
in a spreadsheet to be minimally useful. Using VTune, TrueTime, and the NT performance
counters we were able to collect, analyze, and present data in a reasonable fashion.
VTune is a sampling profiler, which means it has a component that wakes up every few
milliseconds (or whatever amount of time you specify) and looks at what processes are
executing on the CPU(s). When you decide enough time has elapsed, you can stop VTune and
look at the statistics it produces for each process executed during that time. If you've compiled
your program with debug information, VTune can display which lines of code were called and
what percentage of the elapsed time was consumed by the executing code.
VTune is great because you don't need to compile a special version of your program, it doesn't
slow your program down while it runs, and it lets you see the amount of time the CPU spent
executing processes besides your own. The only major drawback is that you can end up with
spurious data due to this sampling. This can be caused by other processes that are running in
the system, or by running VTune for too long a period. To improve VTune's accuracy on your
own program, it comes with an API to turn VTune on and off programmatically. This is a very
useful feature, especially when drilling down into the performance of specific subsystems and
smaller sections of code.
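For instance, a subsystem drill-down might bracket the code of interest with the pause/resume calls. The entry-point names below (VTPause/VTResume) are my recollection of the VTune API and should be verified against its documentation; the subsystem function is hypothetical:

extern "C" void VTPause(void);    // assumed VTune API entry points
extern "C" void VTResume(void);
extern void UpdateAllPaths(void); // hypothetical subsystem being measured

void ProfilePathing(void)
{
    VTResume();       // begin collecting samples
    UpdateAllPaths(); // only this code contributes to the profile
    VTPause();        // stop collecting before returning to untimed code
}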
We found that VTune's call-graph functionality couldn't be used with a program that linked
either explicitly or implicitly with DirectDraw. Also, some applications (including AoK) were too
large in terms of code and debug information for VTune to resolve its data back correctly to a
line of code. It seems that some of these problems have been fixed in VTune 4.5,
however.
Another commercial product that we used was NuMega's TrueTime, which is an instrumenting
profiler. To use this product, you have to make a special TrueTime compilation of your program
that inserts timing code into each module. This can sometimes be a slow build process, but it's
worth it. As the TrueTime build of your program runs, TrueTime logs which functions are
entered, when they are entered, and when they are exited. This process can be significantly
slower than VTune's effectively real-time performance but it's a useful second opinion
nonetheless. The only big drawback (and it can be very severe) is that TrueTime can slow down
your program so much that it's impossible to use it for profiling network code. This problem can
also skew profiling statistics for time-based game actions such as AI or updates that are
scheduled to occur at a certain interval of time.
This performance hit from TrueTime also made it impractical to use it to analyze the
performance of the graphics subsystem. When system performance relies on two independent
processors (such as the main CPU and the graphics card), efficient cooperation between both
processors is critical so that they run concurrently and perform operations in parallel. When
TrueTime slowed the CPU (and consequently the AoK rendering load which the CPU governed),
it made the graphics card appear to give better performance than it actually did.
There were four drawbacks to both programs. First, neither program can be run in batch mode,
so the programmer has to baby-sit the programs while they run through each performance test
case. Even though we worked on performance test cases one at a time, it would have been
convenient to run each program in batch mode overnight to gather results from other test
cases. VTune has since added a batch interface in version 4.5 but support is still lacking in
TrueTime.
Second, performance numbers gathered during the execution of a program need to be taken
with a grain of salt. Due to the multi-threaded nature of the Windows operating system, other
programs (including the performance tool itself) are effectively running at the same time, and
that can skew performance. Fortunately, multiple performance runs with the same tool or with
different tools can help identify specific problem areas by correlating all of the results, and
analyzing performance over smaller sections of code can improve accuracy and reduce the time
required by some performance tools.
The third drawback to these profilers is that it's difficult to use both TrueTime and VTune
together when using Visual C++ as your development environment. TrueTime cannot
instrument code from Visual C++ with VTune installed because VTune renames certain
underlying compile and link programs.
Finally, although both tools display call graphs, we found it difficult at times to ascribe
performance timings to specific subsystems. For instance, pathing was sometimes called from
both movement and retargeting code, but we were not able to determine which subsystem was
using more of the pathing code. TrueTime was generally accurate about this, but in some cases,
the numbers it reported just didn't seem to add up. In this type of case, we had to place our
own timing code directly into AoK to get complete results.
Regardless of how good today's profiling tools are, they have no understanding of or insight into
the underlying program they profile; profiling and analysis tools would be significantly more
useful if they had some understanding of what the application was attempting to accomplish.
With that kind of additional functionality, these tools could provide performance statistics that
would greatly enhance the programmer's ability to improve the application performance. Until
that day arrives, you'll have to add profiling and analysis code to your application for even the
most mundane performance information beyond simple timings and call graphs.
Since performance statistics can change based on the platform on which the application is
running, it was especially critical to get computer systems that matched the minimum system
specification. To demonstrate this performance differential and the scalability of AoK, two test
cases were run on the minimum system configuration and one was run on a regular
development workstation (Figure 5). To contrast the data as much as possible in this example,
the first test case uses the maximum AoK settings for players (eight) and map size (giant). The
second test case conforms to the game settings for the minimum system configuration: four
players on a four-player-sized game map.
Using VTune, the percentage of CPU clock cycles spent in each process during an AoK game
was calculated for a 60-second period at 30-minute intervals. This was done on the 166MHz
Pentium minimum system
(Figure 6), and on a dual 450MHz Pentium III development workstation (Figure 7).
Figure 6 (left). Four-player and eight-player game CPU
process utilization (Pentium-166).
Figure 7 (right). Four-player and eight-player game
CPU process utilization (dual Pentium III-450).
As you can see, the four-player game performs well on the 166MHz Pentium. The AoK process
starts at approximately 60 percent of the CPU and increases to about 75 percent after 30
minutes. The additional time devoted to the virtual memory manager (VMM) process at startup
is caused by AoK data as it is paged in and accessed for the first time. In contrast, the amount
of CPU time used by AoK in the eight-player game degrades over time. This is due to the
additional memory requirements to support so many players and such a large game map. The
CPU reaches the point where it's spending almost as much time managing the virtual memory
system as actually executing the game itself.
Since the development workstation (Test PC 2) is a dual-processor system and AoK is single-
threaded, the second CPU is idle as the kernel runs. This is why the NTOSKRNL is shown as
approximately 50 percent of the CPU.
As both the four- and eight-player games progress, the AoK process continues to use more and
more of the CPU. There is no downward pressure being applied from other processes as there
was for the 166MHz Pentium for eight players.
If it had not already been established that four players was the number of players to support on
the minimum system, these same statistics could have been collected for a varying number of
players. Then we could have set the maximum number of players based on how many players
could fit within the memory footprint of 32MB.
To complement and augment the results from the commercial profilers we were using, we
developed an in-house profiling tool as well. This tool logged the execution time of high-level
game systems and functions (telling us how much time was spent by each one) and told us the
absolute amount of time a section of code took to execute - a sanity check for performance
optimizations that we sorely needed. Our profiling system consisted of four simple functions
that were easily inserted and removed for profiling purposes and relied on a simple
preprocessor directive, _PROFILE, that compiled the profiling code in or out of the executable.
This let us keep our profiling calls in the code, instead of forcing us to add and remove them to
create nonprofiled builds. You can download an abbreviated example of the profiling code from
the Game Developer web site (http://www.gdmag.com/).
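The downloadable code isn't reproduced here, but the compile-in/compile-out arrangement described above typically looks like this sketch (the function names follow the article; the macro wrapper layer is my assumption):

// prof.h (sketch)
#ifdef _PROFILE
void _ProfInit(void);
void _ProfStart(int group);
void _ProfStop(int group);
void _ProfSave(void);
#define ProfInit()       _ProfInit()
#define ProfStart(g)     _ProfStart(g)
#define ProfStop(g)      _ProfStop(g)
#define ProfSave()       _ProfSave()
#else
// Nonprofiled builds: the calls vanish entirely from the executable.
#define ProfInit()
#define ProfStart(g)
#define ProfStop(g)
#define ProfSave()
#endif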
While VTune told us how much of the CPU AoK was using (Figure 6), our custom profiler told us
how much time was being spent on each of AoK's major subsystems (Figure 8). This additional
information told us interesting things about the performance of AoK and where we might be
able to improve performance. You can see in Figure 8 that the amount of time devoted to game
world simulation and unit AI increases from approximately 33 percent to approximately 57
percent of the AoK process over the course of three samples at 30-minute intervals during an
eight-player game.
Our timing code relies on the RDTSC (read time-stamp counter) Assembly instruction, but it
could also have used the Win32 QueryPerformanceCounter or another fine-grained counter or
timer. We chose RDTSC because it was simple to use, it works on all Pentium (and later)
processors (except some very early Cyrix Pentium-class parts), and these profiling functions
were based on extending existing code.
As I stated earlier, it was difficult to assign performance timings to specific subsystems based
on the results of the commercial profilers that we used. To remedy this, we built functionality
into our custom profiler to determine how much of each system's time was spent in, say,
pathing. Here's how our profiler works. During profiler initialization (_ProfInit), the static array
of profiling information (groupList) is initialized to zero, and the CPU frequency is calculated.
The size of the groupList array matches the number of profile group entries in the ProfileGroup
enum list in the prof header file. The CPU frequency is calculated with a simple, albeit slow,
function called GetFrequency. (Alternately, this could have used one of the specific CPU
identification libraries available from Intel or AMD, but this code works transparently on
Windows 95/98 and NT and across processors without problems.)
The final part of initialization seeds each groupList array entry with its parent group. Since the
groupList array entries match the ProfileGroup enums in order, the ProfileGroup enum can be
used as an index into the groupList array to set the parent group value. Using the
SetMajorSection macro significantly simplifies this code by encapsulating the array offset and
assignment. More importantly, it allows us to use the stringizing operator (#) to record the
parent group's ProfileGroup declaration as a string (const char *) for use when formatting our
output.
The second requirement for our custom profiler was that its profiling code had to be smart
enough to make sure that the profiling start (_ProfStart) and stop (_ProfStop) statements were
inserted around a function or group of functions in correct pairings. The _ProfStop function first
makes sure that profiling was started, and at that point the current time is recorded. This is
then used to calculate and store the elapsed time. The number of calls made is incremented,
and the starting time is reset to zero. We wanted to avoid the situation where profiling starts
multiple times on the same group, or where a _ProfStop appears before its corresponding
_ProfStart. To ensure the correct pairing of profiling statements, in _ProfStart a check is made
to ensure that the function has not already been called by examining the starting timing value
mqwStart. The current time is then recorded into mqwStart using the GetTimeStamp function, a
wrapper for RDTSC.
In GetTimeStamp, it should be noted that the eax and edx registers are used for returning the
current 64-bit timing value as two 32-bit values, which are subsequently shifted and combined.
In this case, there is no need to push and pop the scratch registers since the compiler is smart
enough to recognize the inline Assembly use. However, if this timing code was encapsulated in
a macro, there's the chance that the compiler might not recognize it and it would be necessary
to push and pop the registers.
Another issue we confronted with our custom profiling system was the accuracy and resolution
of timing available from a system that uses two function calls from the calling code (first to
_ProfStart and then to GetTimeStamp). Since we use this timing code to profile larger
subsystems and functions, there will be timing variations due to system factors, such as the
execution of other processes by the operating system. If we time significantly smaller portions
of code, down to a few lines, it's preferable to inline the RDTSC call or use it within a macro.
Using the RDTSC as a high-resolution timer can present another problem, too. Note that RDTSC
is not an instruction that will serialize the execution inside the CPU. In other words, RDTSC can
be rescheduled just like any other CPU instruction and may actually be executed before, during,
or after the block of code you're attempting to time. Using a fencing (serializing) instruction like
CPUID can solve this.
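Putting those two points together, a GetTimeStamp along the lines the article describes might look like this under 32-bit Visual C++ (a sketch; the CPUID fence and the ebx preservation follow the discussion above):

#include <windows.h>   // for DWORD

typedef unsigned __int64 QWORD;

static QWORD GetTimeStamp(void)
{
    DWORD dwLow, dwHigh;
    __asm
    {
        push ebx          // CPUID clobbers ebx, which the compiler expects preserved
        xor  eax, eax
        cpuid             // serialize so RDTSC can't be reordered around timed code
        rdtsc             // 64-bit time-stamp counter returned in edx:eax
        mov  dwLow, eax
        mov  dwHigh, edx
        pop  ebx
    }
    return ((QWORD)dwHigh << 32) | dwLow;
}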
At the end of the program, the _ProfSave function saves the recorded profiling information out
to a file. The name of the group, the number of calls, the elapsed time spent in the group, the
average time per call, its percentage of its parent group, and the parent group name are listed
for each profile group. This output is formatted using the complicated proftrace macro,
which once again uses the stringizing operator (#) to print out the character version of the
profile group followed by its information.
Next month we'll wrap up talking about our profiling tools by discussing the memory
instrumentation we created for AoK. Then, we'll take an in-depth look at a number of
performance issues facing AoK, including unit movement and pathing, and see how they were
addressed.
Ensemble Studios: http://www.ensemblestudios.com/
Intel VTune and C/C++ Compiler: http://developer.intel.com/vtune
MicroQuill HeapAgent and SmartHeap: http://www.microquill.com/
NuMega TrueTime: http://www.numega.com/
Performance Analysis and Tuning:
Baecker, Ron, Chris DiGiano, and Aaron Marcus. "Software Visualization for Debugging." Communications of the ACM (Vol. 40, No. 4): April 1997.
Marselas, Herb. "Don't Starve that CPU! Making the Most of Memory Bandwidth." Game Developers Conference Proceedings, 1999.
Acknowledgements: Creating and optimizing AoK was a team effort. I'd like to thank
the AoK team, and specifically the other AoK programmers, for help in getting the
details of some of that effort into this article. I'd also like to thank everyone at
Ensemble Studios for reviewing this article.
Herb Marselas currently works at Ensemble Studios. He helped out on Age of Empires
II: The Age of Kings. Shhhh! Please don't tell anyone he's working on a secret 3D-
engine project called [deleted]. Previously, he worked at the Intel Platform
Architecture Lab where he created the IPEAK Graphics Performance Toolkit. You can
reach him at [email protected].
One aspect of advanced rendering we haven't discussed yet is stenciling, a technique that can
be useful for developing commercial applications. If you want your 3D applications to stand
apart from the crowd, you'd be wise to combine stenciling with the texturing techniques you
learned about in earlier chapters. This chapter will detail how to use stenciling and show you
the different types of effects you can generate with it.
Many 3D games and simulations on the market use cinema-quality special effects to add to
their dramatic impact. You can use stencil buffers to create effects such as composites, decals,
dissolves, fades, outlines, silhouettes, swipes, and shadows. Stencil buffers determine whether
the pixels in an image are drawn. To perform this function, stencil buffers let you enable or
disable drawing to the render-target surface on a pixel-by-pixel basis. This means your
software can "mask" portions of the rendered image so that they aren't displayed.
When the stenciling feature is enabled, Microsoft Direct3D performs a stencil test for each pixel
that it plans to write to the render-target surface. The stencil test uses a stencil reference
value, a stencil mask, a comparison function, and a pixel value from the stencil buffer that
corresponds to the current pixel in the target surface. Here are the specific steps used in this
test:
1. Perform a bitwise AND operation of the stencil reference value with the stencil mask.
2. Perform a bitwise AND operation on the stencil-buffer value for the current pixel with the
stencil mask.
3. Compare the results of Step 1 and Step 2 by using the comparison function.
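In code form, the three steps boil down to a single comparison. This is an illustration of the logic only (Direct3D performs the test per pixel in the pipeline); it assumes windows.h for DWORD, and CompareFunc stands in for whatever D3DCMPFUNC is currently set:

#include <windows.h>

// Illustration only: how the per-pixel stencil test combines its inputs.
bool StencilTestPasses(DWORD ref, DWORD mask, DWORD stencilValue,
                       bool (*CompareFunc)(DWORD lhs, DWORD rhs))
{
    return CompareFunc(ref & mask, stencilValue & mask);
}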
By controlling the comparison function, the stencil mask, the stencil reference value, and the
action taken when the stencil test passes or fails, you can control how the stencil buffer works.
As long as the test succeeds, the current pixel will be written to the target. The default
comparison behavior (the value that the D3DCMPFUNC enumerated type defines for
D3DCMP_ALWAYS) is to write the pixel without considering the contents of the stencil buffer.
You can change the comparison function to any function you want by setting the value of the
D3DRENDERSTATE_STENCILFUNC render state and passing one of the members of the
D3DCMPFUNC enumerated type.
Before creating a stencil buffer, you need to determine what stenciling capabilities the target
system supports. To do this, call the IDirect3DDevice7::GetCaps method. The dwStencilCaps
flags specify the stencil-buffer operations that the device supports. The reported flags are valid
for all three stencil-buffer operation render states: D3DRENDERSTATE_STENCILFAIL,
D3DRENDERSTATE_STENCILPASS, and D3DRENDERSTATE_STENCILZFAIL. Direct3D defines
the following flags for dwStencilCaps:
- D3DSTENCILCAPS_DECR Indicates that the D3DSTENCILOP_DECR operation is supported.
- D3DSTENCILCAPS_DECRSAT Indicates that the D3DSTENCILOP_DECRSAT operation is supported.
- D3DSTENCILCAPS_INCR Indicates that the D3DSTENCILOP_INCR operation is supported.
- D3DSTENCILCAPS_INCRSAT Indicates that the D3DSTENCILOP_INCRSAT operation is supported.
- D3DSTENCILCAPS_INVERT Indicates that the D3DSTENCILOP_INVERT operation is supported.
- D3DSTENCILCAPS_KEEP Indicates that the D3DSTENCILOP_KEEP operation is supported.
- D3DSTENCILCAPS_REPLACE Indicates that the D3DSTENCILOP_REPLACE operation is supported.
- D3DSTENCILCAPS_ZERO Indicates that the D3DSTENCILOP_ZERO operation is supported.
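As a hedged sketch of what such a capability check might look like in practice (assuming an initialized IDirect3DDevice7 pointer named m_pd3dDevice, as in the book's sample framework):

D3DDEVICEDESC7 devDesc;
ZeroMemory(&devDesc, sizeof(devDesc));
if (SUCCEEDED(m_pd3dDevice->GetCaps(&devDesc)))
{
    // Prefer the saturating increment when available; otherwise fall
    // back to the wrapping increment.
    DWORD incOp = (devDesc.dwStencilCaps & D3DSTENCILCAPS_INCRSAT)
                      ? D3DSTENCILOP_INCRSAT
                      : D3DSTENCILOP_INCR;
    m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILPASS, incOp);
}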
Direct3D embeds the stencil-buffer information with the depth-buffer data. To determine what
formats of depth buffers and stencil buffers the target system's hardware supports, call the
IDirect3D7::EnumZBufferFormats method, which has the following declaration:
HRESULT IDirect3D7::EnumZBufferFormats (
REFCLSID riidDevice,
LPD3DENUMPIXELFORMATSCALLBACK lpEnumCallback,
LPVOID lpContext
);
Parameter Description
riidDevice: A reference to a globally unique identifier (GUID) for the device whose
depth-buffer formats you want enumerated.
lpEnumCallback: The address of a D3DEnumPixelFormatsCallback function you want called
for each supported depth-buffer format.
lpContext: Application-defined data that is passed to the callback function.
If the method succeeds, it returns the value D3D_OK. If it fails, the method returns one of
these four values:
- DDERR_INVALIDOBJECT
- DDERR_INVALIDPARAMS
- DDERR_NOZBUFFERHW
- DDERR_OUTOFMEMORY
The code in Listing 1 determines what stencil-buffer formats are available and what operations
are supported, and then creates a stencil buffer. As you can see, this code notes whether the
stencil buffer supports more than 1 bit -- some stenciling techniques must be handled
differently if only a 1-bit stencil buffer is available.
HRESULT IDirect3DDevice7::Clear(
DWORD dwCount,
LPD3DRECT lpRects,
DWORD dwFlags,
D3DCOLOR dwColor,
D3DVALUE dvZ,
DWORD dwStencil
);
Parameter Description
dwCount: The number of rectangles in the array at lpRects.
lpRects: An array of D3DRECT structures defining the rectangles to be cleared. You can set a
rectangle to the dimensions of the render-target surface to clear the entire surface. Each of
these rectangles uses screen coordinates that correspond to points on the render-target
surface. The coordinates are clipped to the bounds of the viewport rectangle.
dwFlags: Flags indicating which surfaces should be cleared. This parameter can be any
combination of the following flags, but at least one flag must be used:
D3DCLEAR_TARGET Clear the render-target surface to the color in the dwColor parameter.
D3DCLEAR_STENCIL Clear the stencil buffer to the value in the dwStencil parameter.
D3DCLEAR_ZBUFFER Clear the depth buffer to the value in the dvZ parameter.
dwColor: A 32-bit RGBA color value to which the render-target surface will be cleared.
dvZ: The new z value that this method stores in the depth buffer. This parameter can range
from 0.0 to 1.0, inclusive. The value of 0.0 represents the nearest distance to the viewer,
and 1.0 represents the farthest distance.
dwStencil: The integer value to store in each stencil-buffer entry. This parameter can range
from 0 to 2^n - 1, inclusive, in which n is the bit depth of the stencil buffer.
The IDirect3DDevice7::Clear method still accepts the older D3DCLEAR_TARGET flag, which
clears the render target using an RGBA color you provide in the dwColor parameter. This
method also still accepts the D3DCLEAR_ZBUFFER flag, which clears the depth buffer to a depth
you specify in dvZ (in which 0.0 is the closest distance and 1.0 is the farthest). DirectX 6
introduced the D3DCLEAR_STENCIL flag, which you can use to reset the stencil bits to the value
you specify in the dwStencil parameter. This value can be an integer ranging from 0 to 2^n - 1, in
which n is the bit depth of the stencil buffer.
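Putting the pieces together, a single call can reset all three buffers at the start of a frame. Passing dwCount as 0 with lpRects as NULL clears the entire viewport:

// Clear the render target to opaque black, the depth buffer to the
// farthest value, and every stencil entry to 0.
m_pd3dDevice->Clear(0, NULL,
                    D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER | D3DCLEAR_STENCIL,
                    0x00000000,  // dwColor: RGBA black
                    1.0f,        // dvZ: farthest depth
                    0L);         // dwStencil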
You control the various settings for the stencil buffer using the
IDirect3DDevice7::SetRenderState method. Listing 2 shows the stencil-related members of the
D3DRENDERSTATETYPE enumerated type.
The D3DSTENCILOP enumerated type describes the stencil operations for the
D3DRENDERSTATE_STENCILFAIL, D3DRENDERSTATE_STENCILZFAIL, and
D3DRENDERSTATE_STENCILPASS render states. Here's the definition of D3DSTENCILOP:
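As defined in the DirectX 7 headers (d3dtypes.h):

typedef enum _D3DSTENCILOP {
    D3DSTENCILOP_KEEP        = 1,
    D3DSTENCILOP_ZERO        = 2,
    D3DSTENCILOP_REPLACE     = 3,
    D3DSTENCILOP_INCRSAT     = 4,
    D3DSTENCILOP_DECRSAT     = 5,
    D3DSTENCILOP_INVERT      = 6,
    D3DSTENCILOP_INCR        = 7,
    D3DSTENCILOP_DECR        = 8,
    D3DSTENCILOP_FORCE_DWORD = 0x7fffffff
} D3DSTENCILOP;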
Let's walk through some code that uses the stencil buffer while rendering a scene. This code is
from a sample that shows how to draw shadows. For now, don't worry about how all this code
generates shadows -- the algorithm is described later in the chapter.
The shadow-rendering code starts out by disabling the depth buffer and enabling the stencil
buffer:
//-------------------------------------------------------------------
// Name: RenderShadow
// Desc:
//-------------------------------------------------------------------
HRESULT CMyD3DApplication::RenderShadow()
{
// Turn off depth buffer and turn on stencil buffer.
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_ZWRITEENABLE,
FALSE );
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILENABLE,
TRUE );
Next the code sets the comparison function that performs the stencil test by calling the
IDirect3DDevice7::SetRenderState method and setting the first parameter to
D3DRENDERSTATE_STENCILFUNC. The second parameter is set to a member of the
D3DCMPFUNC enumerated type. In this code, we want to update the stencil buffer everywhere
a primitive is rendered, so we use D3DCMP_ALWAYS:
//
// Set up stencil comparison function, reference value, and masks.
// Stencil test passes if ((ref & mask) cmpfn (stencil & mask))
// is true.
//
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFUNC,
D3DCMP_ALWAYS );
In this sample, we don't want the stencil buffer to change if either the stencil buffer test or the
depth buffer test fails, so we set the appropriate states to D3DSTENCILOP_KEEP:
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILZFAIL,
D3DSTENCILOP_KEEP );
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFAIL,
D3DSTENCILOP_KEEP );
The settings in listing 3 are different depending on whether a 1-bit or a multibit stencil buffer is
present. If the stencil buffer has only 1 bit, the value 1 is stored in the stencil buffer whenever
the stencil test passes. Otherwise, an increment operation (either D3DSTENCILOP_INCR or
D3DSTENCILOP_INCRSAT) is applied if the stencil test passes. At this point, the stencil state is
configured and the code is ready to render some primitives.
Creating Effects
Now that you've seen how to create stencil buffers and configure how they work, let's look at
some of the effects you can render with them. The following sections describe several ways
Microsoft recommends using stencil buffers. Each of these approaches produces impressive
results, but a few of them have drawbacks.
Composites
You can use stencil buffers for compositing 2D or 3D images onto a 3D scene. By using a mask
in the stencil buffer to occlude a portion of the render-target surface, you can write stored 2D
information (such as text or bitmaps). You can also render 3D primitives -- or for that matter a
complete scene -- to the area of the render-target surface that you specify in a stencil mask.
Developers often use this effect to composite several scenes in simulations and games. Many
driving games feature a rear view mirror that displays the scene behind the driver. You can
composite this second 3D scene with the driver's view forward by using a stencil to block the
portion to which you want the mirror image rendered. You can also use composites to create 2D
"cockpits" for vehicle simulations by combining a 2D, bitmapped image of the cockpit with the
final, rendered 3D scene.
Decals
You can use decals to control which pixels from a primitive's image are drawn to a render-target
surface. When you apply a texture to an object (for example, applying scratch marks to a
floor), you need the texture (the scratch marks) to appear immediately on top of the object
(the floor). Because the z values of the scratch marks and the floor are equal, the depth buffer
might not yield consistent results, meaning that some pixels in the back primitive might be
rendered on top of those in the front primitive. This overlap, which is commonly known as z-
fighting or flimmering, can cause the final image to shimmer as you animate from one frame to
the next.
You can prevent flimmering by using a stencil to mask the section of the back primitive on
which you want the decal to appear. You can then turn off z-buffering and render the image of
the front primitive into the masked area of the render-target surface.
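A hedged sketch of that two-pass decal approach, reusing the book's device pointer name (geometry calls are elided):

// Pass 1: draw the floor, writing 1 into the stencil buffer wherever
// the decal region is rendered.
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILENABLE, TRUE);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILREF,    0x1);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFUNC,   D3DCMP_ALWAYS);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILPASS,   D3DSTENCILOP_REPLACE);
// ... render the portion of the floor that lies under the decal ...

// Pass 2: with z-buffering off, draw the scratch marks only where
// the stencil buffer holds a 1.
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_ZENABLE,     FALSE);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFUNC, D3DCMP_EQUAL);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILPASS, D3DSTENCILOP_KEEP);
// ... render the scratch marks ...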
Dissolves
You can use dissolves to gradually replace an image by displaying a series of frames that
transition from one image to another. In Chapter 8, you saw how to use multiple-texture
blending to create this effect by gradually blending two textures together. Stencil buffers allow
you to produce similar dissolves, except that a stencil-based dissolve looks more pixelated than
a multiple-texture blending one. However, stencil buffers let you use texture-blending
capabilities for other effects while performing a dissolve. This capability enables you to
efficiently produce more complex effects than you could by using texture blending alone.
A stencil buffer can perform a dissolve by controlling which pixels you draw from two different
images to the render-target surface. You can perform a dissolve by defining a base stencil mask
for the first frame and altering it incrementally or by defining a series of stencil masks and
copying them into the stencil buffer on successive frames.
To start a dissolve, set the stencil function and stencil mask so that most of the pixels from the
starting image pass the stencil test and most of the ending image's pixels fail. For each
subsequent frame, update the stencil mask to allow fewer pixels in the starting image to pass
the test and more pixels in the ending image to pass. By controlling the stencil mask, you can
create a variety of dissolve effects.
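As one hedged example of the base-mask variant, assume the stencil buffer was pre-filled with a dither-like pattern of values 0 through 255, and a variable named threshold marches from 0 to 255 over the course of the dissolve. Remember that the test compares (ref cmpfn stencil), with the reference value on the left:

m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILENABLE, TRUE);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILREF,    threshold);

m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFUNC, D3DCMP_LESS);
// ... render the starting image: passes where threshold < stencil ...

m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFUNC, D3DCMP_GREATEREQUAL);
// ... render the ending image: the complementary set of pixels ...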
Although this approach can produce some fantastic effects, it can be a bit slow on some
systems. You should test the performance on your target systems to verify that this approach
works efficiently for your application.
Fades
You can fade in or out using a form of dissolving. To perform this effect, use any dissolve
pattern you want. To fade in, use a stencil buffer to dissolve from a black or white image to a
rendered 3D scene. To fade out, start with a rendered 3D scene and dissolve to black or white.
As with dissolves, you should check the performance of fades on the target systems to verify
that their speed and appearance is acceptable.
Outlines
You can apply a stencil mask to a primitive that's the same shape but slightly smaller than the
primitive. The resulting image will contain only the primitive's outline. You can then fill this
stencil-masked area of the primitive with a color or set of colors to produce an outline around
the image.
Silhouettes
When you set the stencil mask to the same size and shape as the primitive you're rendering,
Direct3D produces a final image containing a "black hole" where the primitive should be. By
coloring this hole, you can produce a silhouette of the primitive.
Swipes
A swipe makes an image appear as though it's sliding into the scene over another image. You
can use stencil masks to disable the writing of pixels from the starting image and enable the
writing of pixels from the ending image. To perform a swipe, you can define a series of stencil
masks that Direct3D will load into the stencil buffer in a succession of frames, or you can
change the starting stencil mask for a series of successive frames. Both methods cause the final
image to look as though it's gradually sliding on top of the starting image from right to left, left
to right, top to bottom, and so on.
To handle a swipe, remember to read the pixels from the ending image in the reverse order in
which you're performing the swipe. For example, if you're performing a swipe from left to right,
you need to read pixels from the ending image from right to left. As with dissolves, this effect
can render somewhat slowly. Therefore, you should test its performance on your target
systems.
Shadows
Shadow volumes, which allow an arbitrarily shaped object to cast a shadow onto another
arbitrarily shaped object, can produce some incredibly realistic effects. To create shadows with
stencil buffers, take an object you want to cast a shadow. Using this object and the light
source, build a set of polygonal faces (a shadow volume) to represent the shadow.
You can compute the shadow volume by projecting the vertices of the shadow-casting object
onto a plane that's perpendicular to the direction of light from the light source, finding the 2D
convex hull of the projected vertices (that is, a polygon that "wraps around" all the projected
vertices), and extruding the 2D convex hull in the light direction to form the 3D shadow
volume. The shadow volume must extend far enough so that it covers any objects that will be
shadowed. To simplify computation, you might want the shadow caster to be a convex object.
To render a shadow, you must first render the geometry and then render the shadow volume
without writing to the depth buffer or the color buffer. Use alpha blending to avoid having to
write to the color buffer. Each place that the shadow volume appears will be marked in the
stencil buffer. You can then reverse the cull and render the backfaces of the shadow volume,
unmarking all the pixels that are covered in the stencil buffer. All these pixels will have passed
the z-test, so they'll be visible behind the shadow volume. Therefore, they won't be in shadow.
The pixels that are still marked are the ones lying inside the front and back boundaries of the
shadow volume -- these pixels will be in shadow. You can blend these pixels with a large black
rectangle that covers the viewport to generate the shadow.
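A condensed, hedged sketch of those stencil passes (render states only; the depth-write, alpha-blend, and stencil setup from the earlier fragments still applies, and geometry calls are elided):

// Pass 1: visible front faces of the shadow volume mark the stencil.
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_CULLMODE,    D3DCULL_CCW);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILPASS, D3DSTENCILOP_INCR);
// ... render the shadow volume ...

// Pass 2: reverse the cull so only backfaces draw, and unmark the
// pixels visible behind the volume's back boundary.
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_CULLMODE,    D3DCULL_CW);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILPASS, D3DSTENCILOP_DECR);
// ... render the shadow volume again ...

// Finally, darken every pixel whose stencil count is still nonzero
// (ref 0 with D3DCMP_LESS passes where 0 < stencil).
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILREF,  0);
m_pd3dDevice->SetRenderState(D3DRENDERSTATE_STENCILFUNC, D3DCMP_LESS);
// ... alpha-blend a viewport-sized black rectangle ...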
The sample application provides these features in its Shadow Modes menu:
- Show Shadow Volumes: Draws the shadow volumes used to compute the shadows rather
than drawing the shadows themselves.
- Draw Shadow Volume Caps: When you turn this item off, some "extra" shadows might
become visible where the far caps of the cylindrical shadow volumes happen to be visible.
- 1-Bit Stencil Buffer Mode: Tells the code to use a different algorithm that uses only 1 bit
of stencil buffer, which won't allow overlapping shadows. If the device supports only 1-bit
stencils, you'll be forced to use this mode.
- Z-Order Shadow Vols in 1-Bit Stencil Buffer Mode: The shadow volumes must be
rendered front to back, which means that if you don't check this option, rendering might
be incorrect.
Figure 12-1, Figure 12-2, and Figure 12-3 show three views of the scene generated by the
ShadowVol2 sample application. You can see the shadows in Figures 12-1 and 12-3; Figure 12-
2 illustrates the shadow volumes.
<<"F12xi01.eps">>
Figure 12-1.
Shadow cast
<<"F12xi02.eps">>
Figure 12-2.
Shadow volumes
<<"F12xi03.eps">>
Figure 12-3. Another view
of the rendered shadows
In this chapter, we didn't add any new code to the RoadRage project. To see these effects in
action, refer to the ShadowVol and ShadowVol2 demo projects included in the DirectX samples.
Conclusion
In this chapter, you learned about stencil buffers and the exciting effects they can produce. In
today's market, making your application stand out is a requisite if you want to sell it and keep
your users coming back for more. Incorporating strategic stencil-buffer effects into the
introduction and into the body of a 3D real-time game might help you win over even the most
discriminating game players.
In Chapter 13, we'll discuss how to load and animate 3D models. Creating animated, lifelike
characters that your users can interact with is one of the most powerful capabilities you can add
to any game.
HRESULT CMyD3DApplication::CreateStencilBuffer()
{
    g_bCanOnlyDoOneBitStencil = FALSE;

    // Detach the old depth buffer from the render target.
    m_pddsRenderTarget->DeleteAttachedSurface( 0, NULL );

    // ... (enumerate depth/stencil formats and fill in ddsd with the
    // chosen format -- see EnumZBufferFormatsCallback below) ...

    if( m_pDeviceInfo->bHardware )
        ddsd.ddsCaps.dwCaps |= DDSCAPS_VIDEOMEMORY;
    else
        ddsd.ddsCaps.dwCaps |= DDSCAPS_SYSTEMMEMORY;

    assert( ddsd.ddpfPixelFormat.dwStencilBitDepth != 0 );

    // Fall back to the 1-bit algorithm if the stencil buffer is too
    // shallow to count NUM_SHADOWS overlapping shadows.
    g_bCanOnlyDoOneBitStencil = g_bCanOnlyDoOneBitStencil ||
        ( (1 << ddsd.ddpfPixelFormat.dwStencilBitDepth) < NUM_SHADOWS );

    g_dwMaxStencilValue =
        (1 << ddsd.ddpfPixelFormat.dwStencilBitDepth) - 1;

    SetMenuStates();

    // ... (create m_pddsDepthBuffer from ddsd) ...

    if( FAILED( m_pddsRenderTarget->AddAttachedSurface(
                    m_pddsDepthBuffer ) ) )
        return E_FAIL;

    return S_OK;
}
//------------------------------------------------------
// Name: EnumZBufferFormatsCallback
// Desc: Enumeration function to report valid pixel
// formats for z-buffers
//------------------------------------------------------
static HRESULT WINAPI EnumZBufferFormatsCallback(
    DDPIXELFORMAT* pddpf,
    VOID* pddpfDesired )
{
    if( NULL==pddpf || NULL==pddpfDesired )
        return D3DENUMRET_CANCEL;

    // ... (format-selection logic: the full sample keeps the first
    // format whose depth and stencil bit counts match *pddpfDesired) ...

    return D3DENUMRET_OK;
}
if(g_bUseOneBitStencil)
{
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILREF,
0x1 );
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILMASK,
0x1 );
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILWRITEMASK,
0x1 );
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILPASS,
D3DSTENCILOP_REPLACE );
}
else
{
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILREF,
0x1 );
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILMASK,
0xffffffff );
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILWRITEMASK,
0xffffffff );
pd3dDevice->SetRenderState
(D3DRENDERSTATE_STENCILPASS,
g_StencIncOp );
}
Peter Kovach has been involved in computer software and hardware development
since the mid-1970s. After 11 years in various levels of development and project
management, he was eager to begin pushing the envelope in 3D virtual world
development. He currently works at Medtronic, where he is the project lead
developing programmable, implantable medical devices that use a next-generation
graphical user interface.
The dog days of summer are here again, and that can only mean one thing: it's Siggraph time.
The biggest graphics show on Earth took place last week in sunny New Orleans, to the great
delight of 26,000 attendees. This year's event focused mainly on the new platforms for
entertainment. Overall, the show was really huge: it lasted one full week, it offered over 40
courses, and there were more than 300 exhibitors in about 150,000 square feet. Thus, I'll
outline the major hardware and software highlights of the show, and offer some insight on
trends.
In the home entertainment arena, Sony's vision improves upon the classic TV concept by
adding Internet access, broadband connectivity, and unprecedented digital image quality.
Although we saw a large flat screen showing some impressive movies, both the console and
home entertainment areas were nothing shocking for most of the attendees. What was shocking
and unexpected sat in between these two areas: a black cube about the size of a microwave
oven, with "GScube" printed on it. It was the living proof of Sony's plans to blend passive and
interactive entertainment worlds together in the future.
The GScube is a rendering device targeted at the content creation/broadband delivery markets.
It consists of 16 cascaded processing units, each of them based upon an enhanced version of a
Sony Playstation 2. Every processing unit has an Emotion Engine CPU (which was jointly
developed by Sony and Toshiba), and an improved Graphics Synthesizer equipped with a 32MB
frame buffer (eight times the memory of a regular PS2). This yields a total of 512MB of VRAM, and
it can theoretically reach a peak performance of 1.2 billion triangles per second -- a number
that sounds like it comes from science fiction books. The device must be controlled by an
external broadband server which feeds data to the GScube, and at Siggraph that device was the
brand-new SGI Origin 3400. At the Sony booth, we enjoyed a battle between characters from
the movie Antz rendered in real time, as well as interactive sequences from the upcoming Final
Fantasy movie shown at 1920x1080 pixels and a sustained rate of 60FPS.
In the Antz demo, I counted 140 ants, each comprising about 7,000 polygons, which were
rendered using a ported version of Criterion's Renderware 3. All ants were texture mapped, and
the results looked surprisingly close to the quality of the original movie. The Final Fantasy demo
was just data from the now-in-development full-length CG movie based upon the game series,
rendered in real time by the GScube. It showed a girl (with animated hair threads) in a zero-
gravity spaceship, with a user-controllable camera viewpoint. The demo rendered about
314,000 polygons per frame, and included an impressive character with 161 joints, motion-
blurring effects, and many other cinematic feats. According to Kazuyuki Hashimoto, senior vice
president and CTO of Square USA, the GScube allowed them to show real-time quality, in "close
to what is traditionally software rendered in about five hours." Sony believes that the GScube
will deliver a tenfold improvement over a regular PS2, and future iterations of the architecture
are expected to reach a 100-fold improvement.
Xbox Demos. The second surprise at Siggraph was seeing what seemed to be a fully-
operational Xbox at the Nvidia booth. The device, still in its fancy silver X-shape, was plugged
into a large screen and showed the same demos that we saw at the GDC and E3 (the girl
dancing with her robot, and butterflies flying above the pool). Quality seemed a bit lower than
the original GDC demos, but the animations still looked gorgeous. Whether the device was a
real Xbox or just a very early prototype is unknown, but with more than a year of development
time ahead, it seems premature to take what we saw as the definitive architecture. We'll have
to wait and see how the system evolves.
Improv Technologies. On the software side, Siggraph had two nice surprises in store for me.
First, there was a new company on the expo floor called Improv Technologies, which is a spin-
off from New York University's Center for Advanced Technology. The company is headed by Ken
Perlin, who is also the guy behind the well-known noise function that bears his name. He was
awarded a Technical Achievement Academy Award in 1997 for the development of this
function, as it's been widely used in many movies.
The goal of Improv Technologies is to create and deliver products based upon research that's
been going on at NYU over the last two decades -- research that is closely related to game
development. NYU research has traditionally been focused on the production of procedural
character animation, using turbulence and noise functions as control systems. Their work
tackles both the high and low levels of this problem. At the low level, it controls small character
movements (such as frowning and smiling) in a realistic and convincing way. You can
experience a very nice hands-on Java demo at http://mrl.nyu.edu/perlin/facedemo/. At the
high level, the folks at Improv have also explored layered animation (which was the goal of the
original Improv system), which allows improvisational actors to interact and behave in real time
in complex, networked environments. You can find additional information on those research
subjects in the bibliography at the end of this article.
Improv Technologies demos Catalyst.
At Siggraph, Improv premiered their first two products: Orchestrate3D and Catalyst.
Orchestrate3D is a project management tool for animation. Its core module is the Scene
Conductor, an animation compositor. Using the same paradigm found in the original Improv
system from NYU, animators can assign motion sequences to body parts, blend different
motions in different areas of the body, layer motions, sequence animations, and so on.
Catalyst, on the other hand, is a generic game engine with emphasis on the graphics subsystem.
It includes an advanced character animation engine, plus a level engine. The character
animation engine is designed to work with an animation package such as Maya or 3D Studio
Max, and supports low (e.g., facial expressions) and high (e.g., full-body layered control)
levels of detail. The level engine supports a number of advanced features like curved
geometry, real-time shadows, cartoon rendering, and collision detection. Although neither
product is available
yet, the demos and talks at Siggraph make me believe that it would be wise to keep an eye on
Improv, as their company is capable of creating some spectacular game development tools.
SGI's Marc Olano. The second software highlight doesn't come from the expo floor, but from a
technical paper presented by Marc Olano and others from SGI. The technique introduced is
based on a simple idea: to achieve Renderman-quality shading power using standard OpenGL
calls. To those of you unfamiliar with CG software, Renderman is a widely accepted rendering
standard that Pixar developed in the late 1980s, which has been used to create blockbuster
movie effects shown in hits like Toy Story, Jurassic Park, and The Abyss. Renderman's main
advantage is a very flexible, C-like shading language, which allows procedural definition of
surface attributes, texturing, lighting, and so on. This language is the Renderman Shading
Language, and its "programs" are called shaders. You can see an example Renderman shader
(the ubiquitous marble texture) below:
#include "marble.h"
surface marble()
{
varying color a;
uniform string tx;
uniform float x; x = 1/2;
tx = "noisebw.tx";
FB = texture(tx,scale(x,x,x));
repeat(3)
{
x = x*.5;
FB *= .5;
FB += texture(tx,scale(x,x,x));
}
FB = lookup(FB,tab);
a = FB;
FB = diffuse;
FB *= a; FB += environment("env");
}
Renderman shaders are compiled into byte code, similar to the way Java operates, and they are
executed during the rendering process. Although they deliver very high quality results, they are
not well suited for real-time applications like games. Some games (the most noticeable example
being Quake 3) have tried to replicate this "shader power" in real time, but results have never
been close to what's possible with Renderman. What Marc Olano showed at Siggraph is a
prototype technology called Interactive Shading Language. This shading language is roughly
equivalent to Renderman in features and power. The great thing about it is that SGI has
developed a technique to convert Renderman-style code into OpenGL calls automatically, so the
CG shader becomes usable in a real-time environment. Theoretically, one could take a shader
from movie production, plug it into SGI's shader compiler, and get pure, optimized OpenGL
code.
In a sense, the system turns shader instructions into rendering passes through OpenGL's
pipeline.
Thus, using multitexturing on today's hardware, you can achieve similar results to those
created by Renderman. Below, you can see a top-down comparison of the same scene rendered
using Pixar's Photorealistic Renderman (top) and OpenGL multi-pass rendering with a shader
compiler (bottom). The benefit of this new technique is clear: a company will be able to use the
same technology in the production/CG area and the games department. And the fact that
today's game hardware implements more and more Renderman-style functions (Nvidia's per-
pixel lighting and shading are a good example) certainly helps. So someday soon we may see
huge gains in visual quality by porting shaders to OpenGL, without incurring a significant
performance hit.
Siggraph didn't focus just on hardware and software; it also provided a wide variety of food for thought.
With vast amounts of new technology available, it is impossible not to consider the long-term
trends that will affect the games and graphics industry.
I think back to the early 1980s, when someone from the REYES team (an early CG application
used to render scenes in Star Trek II -- that product later evolved into Renderman) said that
"reality is just 80 million triangles per second." Today, it would probably be more accurate to
say "reality is 80 million triangles per frame," as the original prediction was rather conservative.
Still, there's something striking about that sentence. Way back in those pioneer days, 80
million triangles per second was a huge number, and anyone capable of doing CG imagery of
such quality would surely achieve lifelike results. Well, years later, people from Nvidia and
Sony (among others) now offer that performance level... in real time. Clearly, the terms "real-time"
and "rendered" have become almost synonymous.
The game development industry has not stood still, either. When that 80-million triangle goal
was stated, games featured four to 16 colors, and fit within 64KB. Today, big-game budgets are
into the millions of dollars, and development cycles are similar to those found in the movie
industry. The quality of the results has also increased accordingly, and today's games are more
related to movies (in terms of quality) than to old-style games. While some people think there's
no relationship between the industries (other than some resource sharing), many believe that
the different areas of electronic entertainment are converging towards a unique central vision,
and to me, Siggraph 2000 provided ample proof.
For example, look at the GScube. The device offers a nice view of what the future of
entertainment could offer. Two years from now, broadband will probably be the delivery
mechanism of choice for home entertainment. GScubes (or whatever they will be called then)
will likely offer rendering power on the order of a billion triangles per second. But, as Sony
says, the GScube will not be a "console" platform -- it will be a "content delivery" platform.
Thus, content providers (be they cable carriers, telcos, movie studios, or game companies) will
use these kinds of devices to feed us entertainment through broadband pipes; the GScube will
be the game server, and perhaps the Playstation 2 will become the game client.
How about e-cinema? With this level of rendering power, one could think of the new devices as
real-time interactive cinema renderers. Some will say "hey, movie theaters are a group
experience, so interactivity is useless." Yes, but how about interactive home cinema? Imagine
sitting at home and watching a rendered movie in which you can make decisions.
If you want a collective interactive experience, recall what happened at the Electronic Theater
at Siggraph '91 (in Las Vegas) and at Siggraph '98 (in Orlando). At these events Loren
Carpenter (another big name in the Renderman world) did a terrific experiment involving
collective reasoning. Each member of the audience (all five thousand people) was given a
wooden card (it looked like a paint stirrer) which had a red reflector on one side, and green
reflector on the other. A camera was aimed at the audience from the stage, which sensed the
balance of power between the two colors as the audience "voted." Then a giant game of Pong
was displayed on the theater's screen. As more green or red reflectors were sensed by the
camera, the paddle went up or down. Not surprisingly, this "collective brain" happened to be
quite proficient at playing games such as Pong, and even at piloting a plane in a flight
simulator (well, the plane crashed, but everyone had a blast anyway). Apart from
being quite an interesting example of flocking behavior (simple local rules that yield emergent
complex behavior), this experience is a mind-opener regarding collective entertainment. Now
come back again and try to imagine mixing that with existing technology. Do you still think
interactive, collective movies are out of reach? I don't.
In a different context, let's revisit the SGI talk. If Renderman code can be ported into OpenGL
calls easily, and that kind of approach is widely accepted, movies and games will be able to
share not only similar quality and production values, but also many core development
techniques. If a movie production company builds an array of shaders for a blockbuster film,
those resources could be instantaneously and seamlessly integrated in the gaming system,
shortening production cycles and, more importantly, making the game engine and the movie
look more similar.
The No-Shows
Siggraph 2000 was an interesting show, but there were some companies that were noticeably
absent. Whatever the reason, seeing big players "vanish" from one of the world's main
multimedia exhibits is regrettable. 3dfx, for example, had very good reasons to be there: the
Voodoo 4 and 5 product lines, especially the high-end versions, are quite impressive products.
No one doubts today that the company's market share has diminished thanks to nVidia, but
3dfx still holds the crown as many gamers' favorite brand.
Another company that was missing was Microsoft. Microsoft also "skipped" Siggraph '99, which
is unfortunate, because many people were expecting to see the folks from Redmond showcase
the multimedia abilities of Windows 2000 and the Xbox. With DirectX 8 peeking over the
horizon, Siggraph would have been the perfect place to hear about the new tools and
technologies straight from Microsoft. At least Microsoft offered us (via Nvidia's booth) a glimpse
of the Xbox.
Finally, there was the weird stuff at Siggraph. Being a multi-discipline show made up of
technical conferences and art exhibits, Siggraph is a perfect opportunity to see demos and
technologies way ahead of their time. This year, two exhibits share my prize for "weirdest stuff
of the show."
First, I must honor the guys from Sun Microsystems for their impressive booth. Apart from
displaying everything related to Java and Solaris under the sun (no pun intended), they had a
spectacular "surprise guest": an eight-foot-long animatronic dolphin swimming in a water tank.
The dolphin greeted the visitors, did funny noises, and was a very effective way of grabbing
people's attention. The level of detail made many believe that this was a real creature.
Apparently the creature has something to do with Sun's future marketing strategy, so I guess
we'll discover that later this summer.
Second, we must honor Daniel Rozin for his "Wooden Mirror" installed at the Art Gallery, which
was a work of true genius. The installation consisted of a mirror frame, and instead of a piece of
reflective glass, it used an array of 830 small, axis-aligned wooden chips individually controlled
by mechanical servos. In the middle of the array, a miniature camera sensed whatever stood
close to the mirror, and the hundreds of motors rotated the wooden pieces so that they formed
(more or less) a reflected image -- just as if it were a mirror. Because of all the movement
going on to build the reflections, the wooden mirror did exhibit some noise effects, resembling
waves on the sea. But the display was visually impressive, and I don't think a written
paragraph can do justice to it. So, check out the wooden mirror in action below.
References
Improv Technologies: http://www.improv-tech.com/
Perlin, K., and A. Goldberg. "Improv: A System for Scripting Interactive Actors in Virtual
Worlds." Computer Graphics, Vol. 29, No. 3. Available online at
http://mrl.nyu.edu/improv/sig96-paper/
Upstill, Steve. The RenderMan Companion. Reading, Mass.: Addison-Wesley, 1992. ISBN
0-201-50868-0.
I had planned to begin this article by sharing my own true experiences with online cheating as it
pertained to a particular game. But I think the long version of my story would cast an
unnecessarily negative light on the game and the company that made it. And since the
developers are good friends of ours, I'll stick to the short version that goes like this.
Last year I became hooked on a certain first-person shooter (FPS) game. After a couple of
months of addictive online gaming, I became convinced that some players were cheating, and
things suddenly changed. I was ready to walk away from the game in disgust and tell
everyone else to do the same. Instead, I decided it was time to learn what I could about the
alleged cheaters, their motivations, and most importantly their methods. In my case, I
discovered at least three distinctly different methods of cheating that could explain what I
experienced -- though as just a player I could not prove conclusively which methods, if any,
were being used against me.
The aim of this article is to bring the subject of online/multiplayer cheating out of the shadows
and talk about it in terms of real problems with real games and to help build a framework for
classifying and understanding the various details. I will cover some of the ways that players are
able to cheat at various games; at times I will go into the working details, ways to prevent
those cheats, and limitations of various game architectures as they relate to multiplayer
cheating. This is by no means a comprehensive and exhaustive tome on the issue, but it is a
start. There is a serious lack of information on this subject, and paranoia among developers
that talking about it will reveal secrets that will only make the problem significantly worse.
Several individuals at various companies declined to talk to me about cheating and their games
for this and other similar reasons. I respect that, but I think developers have everything to gain
by sharing our knowledge about cheaters and how to combat them.
Just how seriously should you as a developer take the possibility of online cheating? If your
game is single-player only, then you have nothing to worry about. But if your game is
multiplayer only, the success of your entire product is at stake. If your game does both, you're
somewhere in the middle. As more games are released with online play as an integral
component, drawing ever-larger audiences (and the corollary development of online
communities and sites based around the game), it becomes ever more important to ensure that
each online game player experiences what they believe to be a fair and honest experience. I'm
reminded of a quote from Greg Costikyan's excellent report, "The Future of Online
Gaming" (http://www.costik.com/): "An online game's success or failure is largely determined
by how the players are treated. In other words, the customer experience -- in this case, the
player experience -- is the key driver of online success." Our short version is, "Cheating
undermines success."
Consider the well-known case of Blizzard's Diablo -- deservedly a runaway best-seller and great
game that acquired a significant reputation for a horrible multiplayer experience because of
cheaters. Many people I know either refused to play it online, or would only play over a LAN
with trusted friends. Blizzard did their best to respond, patching it multiple times, but they were
fighting an uphill battle.
Cheating hit closer to home for me while I was working on the final stages of Age of Empires II:
The Age of Kings. Cheating online became a widespread problem with the original Age of
Empires. Tournaments had to be cancelled due to a lack of credibility, the number of online
players fell, and the reputation of my company took a direct hit from frustrated users. Unable to
spare the resources to fix the game properly until after Age of Kings was done, we just had to
endure our users turning their anger upon us -- probably the most personally painful thing I've
experienced as a developer.
What about your next game? This is a good time to introduce my first two rules about online
cheating:
Rule #1: If you build it, they will come -- to hack and cheat.
Rule #2: Hacking attempts increase with the success of your game.
Need more reasons to take online cheating seriously? Go onto eBay and type in the name of
your favorite massively multiplayer game. Now look at the real money changing hands for
virtual characters and items. What if those items being sold were obtained via some sort of
cheat or hack? Let's not overlook the growth of tournaments and contests for online games.
Consider the public relations nightmare that would ensue if the winner of a cash prize in a
tournament had cheated. Enough to give you a headache, eh?
The sad truth is that the Internet is full of people that love to ruin the online experiences of
others. They get off on it. A great many cheaters use hacks, trainers, bots, and whatnot in
order to win games. But while some openly try to wreak havoc, many really want to dominate
and crush opponents, trying to make other players think they are gods at the game -- not the
cheaters they are. The only thing that seems to bother them is getting caught. Beyond that, no
ethical dilemmas seem to concern them. The anonymity and artificiality of the Internet seems
to encourage a moral vacuum where otherwise nice people often behave in the worst possible
way. A big factor in this is a lack of consequences. If a player is caught, so what? Are they fined
or punished? No. Are they rejected by the people they played against? Usually, but it's so easy
to establish another identity and return to play that discovery and banishment are no barrier to
those with ill intent.
Another interesting aspect of online cheating is the rise of clans and how cheats get
propagated. If a member of a clan hacks a game or obtains a not-readily-available program for
cheating, it will often be given to other members of the clan with the understanding that it's for
clan use only and to be kept secret. The purpose being, of course, to raise the standing and
prestige of the clan. If the cheater is not a clan member, odds are he will keep the secret to
himself for a while and not advertise his advantage. The logic here is simple: If anyone goes
public with a cheat, a) he will lose his advantage, b) he will probably be identified by his
opponents as a cheater, and c) the developer can then patch the game, invalidating the cheat.
As a result of this secretive behavior we get to rule number three.
Rule #3: Cheaters actively try to keep developers from learning their cheats.
So how do they discover the hacks and create the programs to cheat at your game? Consider
rule number four:
Rule #4: Your game, along with everything on the cheater's computer, is not secure. The files
are not secure. Memory is not secure. Services and drivers are not secure.
That's right, you gave them a copy of your game when they purchased it. The hackers have
access to the same tools that you had while making the game. They have the compilers,
dissemblers, debuggers, and utilities that you have, and a few that you don't. And they are
smart people - they are probably more familiar with the Assembly output of an optimized C++
file than you are. The most popular tool among the hackers I surveyed was NuMega's excellent
debugger, SoftIce - definitely not a tool for the wimpy. On another day, you just might be
trying to hire these people. Many of them possess a true hacker ethic, doing it just to prove it
can be done, but more do it specifically to cheat. Either way we get the same result: a
compromised game and an advantage to the cheater.
Hacking games is nothing new; it's been going on as long as there have been computer games. For
single-player games, it has never been an issue, since no matter what a player does with a
game, he's only doing it to himself (and therefore must be happy about it). What's new is
bringing the results of the hacking to other players, who never wanted or asked for it.
I've lost count of the number of developers I've encountered who thought that because
something they designed was complicated and nobody else had the documentation, it was
secure from prying eyes and hands. This is not true, as I learned the hard way. If you are
skeptical, I invite you to look at the custom graphics file format used in Age of Empires. Last
year, I received a demanding e-mail from a kid who wanted the file format for a utility he was
writing. I told him to go away. Three days later he sent me the file format documentation that
he reverse-engineered, and asked if he missed anything. He hadn't. This is a perfect example
of rule number five. Yes, I've borrowed it from cryptography, but it applies equally well here.
Rule #5: Obscurity is not security.
Sometimes we do things, such as leaving debug information in the game's executable, that
make the hacker's job easier. In the end, we cannot prevent most cheating. But we can make it
tough. We don't want effective cheating to be a matter of just patching six bytes in a file.
Ideally we want hacking a game to be so much work that it approaches the level of having to
completely rewrite the game -- something that goes outside the realm of any reasonableness
on the hacker's part.
One of the biggest things we often do that makes it easier for a hacker, and thus harder on us, is
include Easter eggs and cheat codes in the single-player portion of our games. Considered to be
practically a requirement, they expose extralegal capabilities of our game engines and make it
much easier for the hackers to locate the data and code that controls that functionality.
Most online games use one of two communication models: client-server and peer-to-peer. For
our discussion, the deciding factor is where game event decisions are made. If only one player's
(or a separate) computer makes game event decisions or has the game simulation data, it is
client-server. If all players' computers make some or all of the game event decisions, or have
the full game simulation, then it's peer-to-peer. Many of the cheating methods described here
are applicable to both models. I've organized the various cheats, trainers, exploits, and hacks
that I've learned about into the categories listed in Table 1.
Table 1. Categories of online cheating: reflex augmentation, authoritative clients, information
exposure, compromised servers, bugs and design loopholes, and environmental weaknesses.
The first category, reflex augmentation, is typified by the aiming proxy: a program that routes
the cheater's packets between the game and the server. The hitch is that the proxy monitors
and attempts to decode all of the packets it is routing. The program keeps track of the
movements and locations of all the players the server is reporting to the game, building a
simple model. When the proxy sees a Fire Weapon command packet issued by the cheating
player, it checks the locations and directions of all the players it is currently tracking and picks
a target from them. It then inserts a Move/Rotate command packet into the stream going to
the server in front of (or into) the Fire Weapon command packet that points the player
straight at the selected target. And there you have it: perfect aim without all the mouse
twisting.
When aiming proxies for Quake first appeared a couple of years ago, their targeting wasn't too
sophisticated and didn't take into account things such as the player's field-of-view (FOV) or lag.
Giveaways, such as players shooting weapons out of their backs, tipped people off that
something foul was afoot. One of the first countermeasures to be developed was a server add-
on that statistically identified players whose aim was too good to be true, then kicked out and
banned the perpetrators. This naturally proved controversial, since some people really are
"railgun gods," and the issue of possibly falsely identifying a person as a cheater was raised
(and has yet to go away). And of course, the aiming proxies evolved with time. Later versions
were improved to consider only the player's current FOV and compensate for lag, and added
just enough randomness in their aim to stay below a server's "too good to be legit"
identification threshold.
This big vulnerability is summed up in rule number six:
Rule #6: Any communication over an open line is vulnerable to interception, analysis, and
modification.
Since the proxy is not running on the same computer as the game client, definitive detection
can be next to impossible. Making the development of the proxy extremely difficult then
becomes a priority.
One way to inhibit this form of cheating is to encrypt the command packets so that the proxies
can't decode them. But there are limits to the extent that encryption can be used on
communications. Most FPS games can send and receive a couple of kilobytes of data or more
per player per second, and have to allow for lost and out-of-order packets. The encryption
therefore has to be fast enough not to impact frame rate, and a given packet's encryption can
not be dependent on any other packet unless guaranteed delivery is used. And once the
encryption is cracked, the game is vulnerable until the encryption is revised, which usually
involves issuing a patch. Then the hacking starts over.
Another way to make life more difficult for the proxy creator is to make the command syntax
dynamic. Using something as simple as a seed number that's given to the game when it
connects and a custom random number function, the actual opcodes used in the communication
packets can be changed from game to game, or even more often. The seed itself doesn't have
to be transmitted; it could be derived from some aspect of the current game itself. The idea
here is that since a proxy sees all the communications, but only the communications, the
random seed is derived from something not explicitly communicated. Foolproof? No. But it's far
more difficult to hack, forcing the hackers to start from scratch.
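As a hedged sketch of this idea (the article doesn't prescribe an implementation, and all names here are hypothetical), every machine could build matching opcode translation tables from the shared seed and a deterministic shuffle:

#include <algorithm>

enum { NUM_OPCODES = 64 };
static unsigned char g_opcodeMap[NUM_OPCODES];   // logical -> wire value
static unsigned char g_opcodeUnmap[NUM_OPCODES]; // wire value -> logical

// Any custom PRNG works, as long as every machine uses the same one.
static unsigned long NextRand(unsigned long& state)
{
    state = state * 1664525UL + 1013904223UL;
    return state >> 16;
}

// Called at game start on every machine with the same derived seed.
void BuildOpcodeTables(unsigned long seed)
{
    for (int i = 0; i < NUM_OPCODES; ++i)
        g_opcodeMap[i] = (unsigned char)i;

    // Deterministic Fisher-Yates shuffle driven by the shared seed.
    for (int i = NUM_OPCODES - 1; i > 0; --i)
        std::swap(g_opcodeMap[i],
                  g_opcodeMap[NextRand(seed) % (i + 1)]);

    for (int i = 0; i < NUM_OPCODES; ++i)
        g_opcodeUnmap[g_opcodeMap[i]] = (unsigned char)i;
}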
Though reflex augmentation seems to be exclusive to FPS games, the vulnerability extends to
any game where quick reflexes can make a difference and game communications can be
sniffed.
The next major class of cheats is exploiting authoritative clients. This is when one player's
modified copy of an online game tells all the other players that a definitive game event has
occurred. Examples of the communications would be "player 1 hit player 2 with the death-look
spell for 200 points of damage," "player 2 has 10,000 hit points," and so on. The other players'
games accept these as fact without challenging them and update their copy of the game
simulation accordingly.
In this case, a hacked client can be created in many ways: The executables can be patched to
behave differently, the game data files can be modified to change the game properties on the
hacked client, or the network communication packets can be compromised. In any case, the
result is the same -- the game sends modified commands to the other players who blindly accept
them. Games are especially vulnerable to this type of exploit when they are based on a single-
player game engine that has been extended to support online multiplay in the most direct
(read: quickest to develop) manner.
Fortunately there are several steps that a game developer can take to eliminate most problems
with authoritative clients. A first step is to install a mechanism in the game that verifies that
each player is using the same program and data files. This means going out and computing a
CRC or similar identifier for all the data in question, not just relying on a value stored in the file
or the file size. A nice side benefit is that this method also detects out-of-date files during the
development process.
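A minimal sketch of such a check, hashing a file's full contents with a standard table-driven CRC-32 rather than trusting its size or an embedded version stamp (function names are hypothetical):

#include <stdio.h>

static unsigned long g_crcTable[256];

void InitCrcTable()
{
    for (unsigned long i = 0; i < 256; ++i)
    {
        unsigned long c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (0xEDB88320UL ^ (c >> 1)) : (c >> 1);
        g_crcTable[i] = c;
    }
}

unsigned long CrcOfFile(const char* path)
{
    FILE* fp = fopen(path, "rb");
    if (!fp) return 0;

    unsigned long crc = 0xFFFFFFFFUL;
    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        for (size_t i = 0; i < n; ++i)
            crc = g_crcTable[(crc ^ buf[i]) & 0xFF] ^ (crc >> 8);

    fclose(fp);
    return crc ^ 0xFFFFFFFFUL;   // final inversion, per CRC-32 convention
}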
For peer-to-peer games, cheating can be made difficult by changing from a game engine that
issues commands to one that issues command requests. It's a subtle distinction but one that
requires engineering changes throughout the game. It also requires that each player's machine
run a full copy of the game simulation, operating in lockstep with the other players.
Command processing in a single-player game typically works in the manner shown in Figure 1.
The player issues some sort of command via the game's user interface. The game then
performs a validation check on the command to see if the player has the resource, the move is
legal, and so on. The game then performs the command and updates its internal game
simulation. Figure 2 shows game engine command processing extended to support multiple
players in the most direct way possible. The process stays the same except for the addition of a
communications packet that's sent out to inform the other players of what has taken place. The
receiving players integrate the data directly into their world simulation.
Figure 1. Single-player-game command
processing steps.
With the shift to command requests, the order of events changes a bit, which is shown in Figure
3. After determining that the command is a legal one, a command request describing the
command is sent out to other players and is also placed into the player's own internal command
queue, which contains command requests from other players as well as his own requests. Then
the game engine pulls command requests from the queue and performs another validation
check, rejecting the request if it fails. The fundamental difference is that every player has a
chance to reject every action in the game based solely on the information on that player's
machine. No other machine provides the information to make the determination on what is right
and wrong. A hacked game cannot reach out and alter what's on an honest player's machine
with this approach. Note that such an architecture works equally well for a single-player game.
Figure 3. Command processing steps when using
command requests.
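Here's a hedged sketch of that flow. Every function name is hypothetical; the point is simply that validation runs twice, once on the issuer and once on every consumer, including the issuer's own machine:

struct CommandRequest
{
    int playerId;
    int opcode;
    int args[4];
};

// Hypothetical hooks into the rest of the engine.
bool IsLegalCommand(const CommandRequest& req);
void BroadcastToPeers(const CommandRequest& req);
void QueueCommand(const CommandRequest& req);
bool DequeueCommand(CommandRequest* req);
void ExecuteCommand(const CommandRequest& req);
void FlagPossibleCheater(int playerId);

void OnLocalCommand(const CommandRequest& req)
{
    if (!IsLegalCommand(req))   // first validation, on the issuer
        return;                 // a UI-level rejection, not a cheat
    BroadcastToPeers(req);      // tell everyone else
    QueueCommand(req);          // and queue it locally like any other
}

void ProcessQueuedCommands()
{
    CommandRequest req;
    while (DequeueCommand(&req))
    {
        if (!IsLegalCommand(req))  // second validation, on every machine
        {
            // The issuer was supposed to reject this before sending it,
            // so a failure here suggests a hacked client.
            FlagPossibleCheater(req.playerId);
            continue;
        }
        ExecuteCommand(req);       // update the local game simulation
    }
}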
Preventing a dishonest command from being accepted on an honest player's machine is only
half the task. The game also has to be able to determine whether someone is playing the same
game and if not, it must do something about it. For instance, when a received command
request is rejected for reasons that should have prevented it from being issued in the first place
(remember, the issuer is supposed to have checked it for validity before passing it to the other
players), all other players should assume that a cheater is in their midst, and take some sort of
action.
Often though, due to design issues (such as posting command requests to a future turn), it is
not possible to thoroughly ensure that all command requests passed to other players won't be
rejected if a player is being honest. A good way to deal with this is to add synchronization
checking to the game. At various points during the game, each player's machine creates a
status summary of the entire game simulation on that computer. The status, in the form of a
series of flags, CRCs, and checksums, is then sent to all the other players for comparison. All
the status summaries should be the same, provided the game program and data files are the
same for each machine. If it turns out that one player has a different status from all the rest,
the game can take action (like drop the player from the game). The idea is that a hacked game
should cause that player's game simulation to produce different results.
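A hypothetical sketch of building such a status summary (a real game would fold in far more state, combining flags, CRCs, and checksums as described above):

struct SimObject
{
    unsigned long positionHash;   // e.g., packed position plus unit state
};

struct GameSimulation
{
    unsigned long turnNumber;
    unsigned long randomState;    // the simulation PRNG's current state
    int           numObjects;
    SimObject*    objects;
};

unsigned long ComputeSyncStatus(const GameSimulation& sim)
{
    unsigned long sum = sim.turnNumber;
    sum ^= sim.randomState * 2654435761UL;   // multiplicative hash mix
    for (int i = 0; i < sim.numObjects; ++i)
        sum = (sum << 5) + sum + sim.objects[i].positionHash;
    return sum;
}

// Each machine broadcasts its result every N turns; a player whose
// value disagrees with everyone else's is out of synch -- hacked, or
// harboring a bug worth finding.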
Alternatively, you can make life even more difficult for the hacker by easing up on the received
command request evaluations. By allowing command requests to bypass the verification check
only on the machine that issued it, you're deliberately allowing the game to go out of synch if
the initial verification check or data has been hacked. Combine this with a synchronization
check that occurs somewhat infrequently and you've presented the hacker with something of a
mystery -- on his machine the cheat worked, but then a while later the other players booted
him out of the game.
This status synchronization has a huge benefit for the development process as well. Getting a complicated game engine to produce the same game simulation results while handling different player views, inputs, and settings is a very difficult task. It's difficult to keep the simulation-
independent code from accidentally impacting the simulation. For example, a compare against
the current player number variable in the simulation code, or randomly playing a background
sound based on an object in the player's view using the same random function used by the
simulation, could cause future executions to produce different results on different machines.
Judicious use of status synchronization allows a developer to quickly narrow down the portion of
the game that isn't executing the same for all players.
Client-server games unfortunately can't benefit as much from these techniques, as their clients lack full game information and by design must rely on the authority of the server. We will look at this in more detail a bit later.
The next major class of cheats is what I've dubbed "information exposure." The principle is
simple: On a compromised client, the player is given access or visibility to hidden information.
The fundamental difference between this and authoritative clients is that information exposure
does not alter communications with the other players. Any commands sent by the cheater are
normal game commands - the difference is that the cheater acts upon superior information.
The first-person-shooter cheats of modified maps and models arguably fall under this
classification, as they let cheating players see things that they normally wouldn't be able to (in
the case of modified maps), or see them more easily (in the case of a modified player model
that glows in the dark). Any game whose game play relies on some information being hidden
from a player has a lot to lose to these types of cheats.
The real-time strategy (RTS) genre suffers severely from this. The most obvious examples are hacks that remove the "fog of war" and "unexplored map" areas from the display. With a fully visible
map, the cheating player can watch what other players are planning and head them off at the
pass, so to speak.
There are a couple of ways the hacker accomplishes this. The hacker may go after the variables
that control the display characteristics of the map. With the help of a good debugger and single-
player cheat codes to reveal the whole map, finding the locations in memory that control the
map display is fairly simple. Then either the game .EXE file is modified to initialize those map
control values differently, or a program is made that attaches to the game's memory space and
modifies the variable values while the game is running. To combat this, the values of those
variables should be regularly reported to other players in the form of a checksum or CRC code.
Unfortunately, that only raises the stakes; the hackers then just attack the code that reads
those control values (easy enough to find quickly), inverting or NOP'ing out the instructions that
act upon them.
Additional techniques are needed to detect the hacked game view. There are a couple of ways
to take advantage of the fact that the full game simulation is run on all clients. One way is to
borrow a technique from the "authoritative client" section and check each command request for
the side effects of a hacked map on one of the players. We specifically ask the game simulation,
which is separate from the screen display, the question, "Can that player see the object he just
clicked on?" In doing this we are assuming ahead of time that such hacks will be attempted,
making sure we consider the side effects by which they might be detected. Once again, easing
up on checks of the player's own machine is very useful. The next time the game performs a
synchronization check, all the other players will agree that the cheating client is "out of synch"
with the rest of the game and can deal with him accordingly.
Another technique that avoids looking at the display control variables is to compile abstract
statistics on what gets drawn to the screen. The statistics are derived from the game simulation
data and just filed away. This doesn't immediately prevent the hacker from cheating; instead,
you send the statistics around as part of the status synchronization and see what the other
players think of them.
In the RTS map-hack case, it is necessary for some change to be made to the game; either the
code or some data is in a modified state while the game is running. And if something has been
modified, you can attempt to detect that.
But information exposure cheats can be totally passive. Consider a scenario where a program
gains access to the memory space of an RTS game that is running. It then reads key values for
each player in the game out of memory and sends them to an adjacent networked computer.
An industrious hacker once raised that scenario with me regarding one of the Age of Empires
games, saying he had figured out how to read out of memory the resource amounts for every
player. At first we thought that this wasn't very serious. He then explained that if he polled the
values a couple hundred times a second, he could identify nearly every discrete transaction. A
simple Visual Basic program could then display a log window for each player, with messages for
events such as the training of various units (to the extent they could be distinguished from
others on the basis of cost), and messages for events such as building construction, tribute,
and advancement to the next age. Basically, this cheating method was the next best thing to
looking over the shoulders of his opponents.
Rule #7: There is no such thing as a harmless cheat or exploit. Cheaters are incredibly
inventive at figuring out how to get the most out of any loophole or exploit.
Intrigued, I asked him how he could be sure he had found the correct memory locations each
time, as they changed each game since they were stored in dynamically allocated classes. His
answer was most interesting. He first scanned the memory space of a paused game looking for
known values for things such as population, wood, gold, and other very significant game values
that he knew about and believed were unique. He had a simple custom program that looked for
the values in basic formats such as long ints and floats. After his program identified all the
possible addresses with those values, he ran the game a bit more until the values had changed.
He then reran the program, checking the prior list of locations for the new values, reducing the
list of possible addresses until he was sure he had found the correct locations. He then put a
read-access breakpoint on the value and looked at how it was accessed from various points in
the code. At one of the breakpoints, the C++ code for accessing the wood amount looked
something like this:
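The listing itself did not survive in this text; judging from the description that follows, it would have resembled a chain of dereferences along these lines, with every name invented:

    // Hypothetical reconstruction: a pointer to a pointer to an object
    // holding an array of resource integers, all dynamically allocated.
    wood = world->playerList[playerNum]->resources[RESOURCE_WOOD];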
This is a pointer to a pointer to an object containing an array of integers, one of which contains
the value of the player's current stockpile of wood, and all the objects are dynamically
allocated. The hacker's point was that if you trace back through all the dynamic pointers, you
eventually find a static variable or base pointer. The different spots where his breakpoints were
triggered were from member functions at different levels in the class hierarchy, and even from
outside the class hierarchy containing the data. And it was finding an instance of that latter
access condition that was the jackpot. There it was in his debugger's disassembly window: a
base address and the Assembly code to traverse through the classes and handle player and
resource index numbers.
Considering all this, I found a couple of strategies that can greatly reduce the likelihood of this
sort of passive attack. Again, these tips cannot guarantee 100 percent security, but they make
the hacker's job much harder.
The first strategy is to encrypt very significant values in memory at all times. Upon
consideration, most game variables are not significant enough to warrant such protection - the
hit points of a particular object don't tell anyone much, while a drop of 1,000 food and 800 gold
from a player's resources does indicate that the player is advancing to the Imperial Age, which
is an event of large strategic importance in our game. Simple encryption is relatively easy when all access to the variables goes through accessor functions. A function that is its own inverse, such as XOR, is your friend here: it alters values upon storing, restores them upon reading, and is extremely fast. The whole point is to make it very hard for the hacker to find the variables he is searching for in the first place. Values the hacker would know to look for are not left around so that a simple scan can find them. In C++, our encrypted accessor functions for game resources look something like what's shown in Listing 1.
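Listing 1 itself is not reproduced in this text, but a minimal sketch of such XOR'd accessors, with the key handling and all names invented rather than taken from the actual game, might look like this:

    // Sketch of XOR'd accessor functions; the key scheme and names are
    // illustrative, not the actual Age of Empires code.
    class PlayerResources
    {
        int encryptedWood;   // never stored as the plain value
        int xorKey;          // picked randomly during game setup
    public:
        explicit PlayerResources(int key) : encryptedWood(0 ^ key), xorKey(key) {}
        void SetWood(int amount) { encryptedWood = amount ^ xorKey; }
        int  GetWood() const     { return encryptedWood ^ xorKey; }
    };

A memory scan for the known starting stockpile now turns up nothing, because the plain value never exists in memory.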
The second strategy for slowing down passive attacks is to never access very significant values from outside the class hierarchy. Even assuming the values can be located with a debugger, try not to access them in a way that starts with a reliably fixed memory address. Combining this
with small, randomly sized spacing buffer allocations during the main game setup ensures that
the memory addresses for vital information will never be the same from one game to the next.
A piece of C++ code you won't see in our next RTS game would be the following:
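That listing is also missing here; in spirit it would be a walk to vital data that begins at a fixed, globally visible starting point -- something like the following, with all names hypothetical:

    // What NOT to do: reach very significant values through a chain
    // that starts at an easily found static base pointer.
    struct Player    { int resources[4]; };
    struct GameWorld { Player *players[8]; };
    extern GameWorld *g_World;   // fixed address a hacker finds once

    int WoodOf(int n) { return g_World->players[n]->resources[0]; }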
Information exposure isn't limited to games as complex as RTS games; it extends to something as simple as a card game. Consider an online card game such as poker. All it would take to ruin
the game is for a player to see the values of the face-down cards in another player's hand. If
the information is on the machine, hackers can go digging for it. This goes back to rule number
four.
In client-server games, because so much is controlled by the server, the game is only as good
as the trust placed in the server and those who run it.
An issue arises because some client-server games can be customized by the user running the server. Access and configurability are great for many games, as they allow the
player community to extend and evolve a game. But some individuals will test the server to see
what can be exploited in the name of cheating. This in itself is not the problem -- rather it's
when honest but unaware players find their way to the server and don't know that they are not
on a level playing field.
You really need to consider your audience here. A successful game will sell hundreds of
thousands of copies, if not millions. You as a developer will be most in tune with the hard-core
players -- those that know the game inside and out. But it's easy to forget about the more
casual players, who probably will be the majority of purchasers of your game once you pass a
certain level of success. These are the people who don't know to check the status of the
Cheats_Allowed flag before joining a server, or that game rule changes are transparently
downloaded when they connect. All they probably know is the default game configuration, and
when they see their ReallyBFG27K gun doing only 0.5 points of damage, they're going to cry
foul. It doesn't matter that it was technically legal for the server operator to make the change; you still wind up with a user who is soured on the game and unlikely to recommend it to his buddies anymore.
Naturally, people get a whole lot more unhappy with a game when they encounter modifications
with malicious intent. What if a clan decided to add a tiny server mod to their FPS server that
looked something like this snippet of C code:
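The snippet is absent from this text; the gist would be a quiet special case for clan members, something like this, with every name invented for illustration:

    /* Hypothetical malicious server mod: clan members quietly take
       half damage. All names are invented. */
    #include <string.h>

    int AdjustDamage(const char *playerName, int damage)
    {
        if (strncmp(playerName, "[OurClan]", 9) == 0)
            return damage / 2;
        return damage;
    }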
Or what if the remote console was hacked to allow normal cheats to be toggled? Dishonest players in league with the server operator could make a key bind that resembled this:
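That example is missing as well; a bind of this sort, in id-style console syntax (the exact commands are hypothetical), might have read:

    bind "F8" "sv_cheats 1; give all; sv_cheats 0"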
The important point here is that with user-run servers and powerfully configurable game
engines, these kinds of shenanigans will happen. While we as developers can't protect our more
casual users from joining any game server they wish, we can do a better job of letting them
know when they are encountering something that could be different from what they expect.
Quake 3: Arena set a great example when it introduced the concept of a "pure" server. It's a
simple idea that casual users can quickly grasp and set their gaming expectations by.
But why stop there? If we download data that includes a new set of weapon properties, why not
put a message on the screen saying, "Weapon properties modified"? If single-player cheat
commands are issued in the middle of a game, maybe we should send a message to every
client notifying them of that fact, so even players who aren't near the issuer can be made
aware. Empower players to easily determine whether the games are fair or not.
Rule #9: Honest players would love for a game to tip them off to possible cheating. Cheaters
want the opposite.
Exploiting Bugs
Technically, this category of cheats is one that we bring upon ourselves: bugs in our games can be discovered by users and used to disrupt game play. Most bugs don't enable cheating, but a few do.
A good example is the farm-stopping bug in the unpatched version of Age of Empires. When a
user had both a villager and a farm selected, he could issue the Stop command. Because the
command was valid for a villager, it was allowed to go through, but listed both objects as the
target of the command. The villager would stop working as expected and reset its state. The
farm would also reset itself, something it never did normally, and replenish its food supply.
Once this was discovered by players, it drastically changed the game for them, giving them a
huge advantage over those who didn't know about it.
I encountered another bug when playing Half-Life. I would get into a firefight with another
player, both of us using the same weapon, but when it came time to reload our weapons, my
opponent was able to reload much more quickly than I could. Sure enough, when the next
patch came out, I saw in the release notes that a bug allowing fast reloads was fixed. There's
really not much we can do about these types of bugs, other than fix them with a patch.
Environmental Weaknesses
My last category of cheats is something of a catchall for exploitable problems a game may have on particular hardware or under particular operating conditions. A good example is the "construction-cancelled" bug that was found, amazingly, in both Age of Empires and Starcraft at about the same time.
The element needed to make it work was extreme lag in network communications, to the point
of a momentary disconnection. When this happened, the game engines stopped advancing to
the next game turn while they waited for communications to resume. During this time, the user
interface still functioned, so the player didn't think the game had locked up. While the game
was in this state, a player could issue a command to cancel construction of a building, returning
its resources to the player's inventory -- only the player would issue the command over and over
as many times as possible. Normally, a player could only issue one Cancel command per turn,
but because the game simulation was in a holding state, multiple command requests went into
the queue. Because of some necessities of RTS engine design, when an object is destroyed
during a turn by something such as a Cancel command, the destruction is postponed until after
all the commands for that turn have been processed. The result was that the Cancel command executed multiple times during one game update.
Once discovered, this had a horrible impact on online games. People deliberately caused
massive lags to take advantage of the cheat. To fix it in Age of Empires, we had to update the
validation checks to see if a similar request was already pending on the current turn and reject
duplicates.
Another bug of this type involved the game Firestorm and its interaction with the Windows
clipboard. It seems a clever user found out that if he pasted text from the clipboard into his
chats and that text contained a certain character not normally used, the game would crash
when it attempted to print it to the screen -- on all players' machines. He then treated this
knowledge as a personal nuclear bomb that he could spring on people when he found himself
losing.
Yet another example taken from Age of Empires is what happens when a player's network
connection is overloaded or ping-flooded by another player. When such an attack renders a
game unable to communicate with its peers, the other players decide that something is wrong
with that player and drop him from the game -- a totally necessary capability, but one that can be exploited as a modern twist on scattering all the pieces on the game board when you are losing. This was one of the major reasons we added Multiplayer Save and Restore capabilities to
Age of Empires II.
I hope these examples got you thinking about some of the problems and issues at stake when
developers address the problem of online cheating. We certainly have a lot more ground to
cover, from massively multiplayer games, open source, and consoles, to enabling the online
communities to better police the situation. But we're out of space and time for now.
Listing 1. Hiding the variables that tip off hackers to possible cheats.
Matt Pritchard is busy trying to be a modern renaissance man. When not working
hard on his latest game, he can be found spending time with his family or collecting
antique videogames. Send e-mail to [email protected].
Editor's note: This paper was originally published in the 2000 Game Developer's Conference proceedings.
1. Introduction
Most of you with Computer Science training have probably been through the typical Artificial
Intelligence lecture on search and planning. You are shown A*, with some trivial example (so your professor doesn't get lost while doing it) that shows all of the various parts of A*. You've
also sat through the proof of why A* generates an optimal solution when it has an admissible
heuristic. If you're really lucky, you get to implement A* in Lisp or Prolog in an assignment, and
solve a puzzle involving sliding tiles.
Jump ahead a few years, and you've been given the task of implementing a pathfinding
algorithm for your game. You sift through your notes, trying to remember why you need a
CLOSED list, and how to translate all the car() and cdr() instructions from Lisp into something
that your lead programmer won't bring up during your next performance review. You study web
sites on AI and pathfinding, try a few enhancements, and eventually come up with a solution
that behaves in a very similar manner to the A* algorithm from your notes.
In an alternate universe, there are academics and hobbyists who concentrate on computer games of thought, such as chess, checkers and Othello. There are regular tournaments between programs, and the two main ways to outplay your opponent and win the game involve outsearching your opponent and having a smarter (but still computationally fast) evaluation of positions. I have heard each of the following statements while chatting with other Othello programmers during tournaments. Do they sound like anything you've heard a programmer in your company mention?
z "I don't trust C++ to generate fast code, so I'm still using ANSI C."
z "I coded the inner loop in assembly. It took me two months of work, but it speeds up the
program by 10%, so it was worth it."
z "I've had about eight hours of sleep in 72 hours, but I've improved the performance."
Computer chess programmers have been dealing with a search algorithm (cryptically called αβ, for alpha-beta) for the last 25 years. They have a library of standard enhancements that they can use to enhance αβ and improve the performance of their program without having to resort to learning MIPS processor machine language, or trying to acquire knowledge about what sort of positions their program handles poorly.
Academics involved in the field often quoted the desire to beat the World Chess Champion in a game of chess to get their research funding. However, IBM and Deep Blue brought the funding train to a screeching halt. Most have moved on to games that are significantly harder for the computer to do well at, such as Poker, Bridge and Go. However, others realized that A* search really is not all that different from αβ.
When we cast aside the superficial differences between the two algorithms, we quickly discover that A* and αβ are actually remarkably similar, and that the standard search enhancements from the typical computer chess program can be used in a pathfinding algorithm. We will be describing the subset of the computer-chess-based search enhancements that we use in our pathfinding code at BioWare.
Section 2 will quickly review the standard A* algorithm (so you do not have to dig out your AI
lecture notes again). Section 3 will discuss the anatomy of a computer chess search algorithm,
and Section 4 shows you how to put the search enhancements into A*.
2. A Review of A*
A* [Hart 1968, Nilsson 1971] is one of the preferred methods of dealing with the pathfinding problem. A* search starts with the initial state in a main data structure known as the OPEN list. The CLOSED list represents the positions that the algorithm has already examined, and is initially empty. For each node within the OPEN and CLOSED lists, A* maintains two heuristic values: g(n), the best-known minimum cost, and h(n), the estimate of the cost to a goal state. Thus, the best node to examine at any point in the algorithm is the one with the lowest estimated total cost: f(n) = g(n) + h(n).
The A* algorithm is an iterative process. In each step, A* takes the best state s from the OPEN list and moves it to the CLOSED list. The successors of the best state, s_i, are generated and examined in turn. If a successor s_i does not appear in either the OPEN or CLOSED list, then s_i is added to the OPEN list. However, if s_i already appears in either list, we must check to see if its minimum cost g(s_i) has decreased. If g(s_i) decreases, the node s_i must be deleted from its current location and reinserted into the OPEN list.
The heuristic h(n) is critical for the performance of the A* algorithm. h(n) is said to be
admissible if the heuristic never overestimates the cost of travelling to the goal state. This is
important because if h(n) is admissible, A* is guaranteed to generate the least cost or optimal
solution the first time the goal node is generated. In the case of the typical pathfinding
algorithm, h(n) is the straight line distance between the current point n and the target point.
Some of the performance information referenced in this paper refers to the sliding-tile puzzle
instead of pathfinding, since this has been the most popular test in academic circles for
studying A*. An example of the sliding-tile puzzle can be found in Figure 1. In the sliding-tile
puzzle, the Manhattan distance (the sum of the vertical and horizontal displacements of each
tile from its current square to its goal square) is an admissible and effective heuristic for use in
A* search.
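The Manhattan-distance heuristic is only a few lines of C; this sketch assumes a flat-array board representation, which is not taken from any particular program:

    /* Sketch of the Manhattan-distance heuristic for a 4x4 sliding-tile
       puzzle. board[i] holds the tile on square i (0 is the blank), and
       tile t's goal square is assumed to be square t. */
    #include <stdlib.h>

    int ManhattanDistance(const int board[16])
    {
        int i, h = 0;
        for (i = 0; i < 16; i++) {
            int tile = board[i];
            if (tile != 0)   /* the blank does not count */
                h += abs(i % 4 - tile % 4) + abs(i / 4 - tile / 4);
        }
        return h;
    }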
3. The Anatomy of a Computer Chess Search Algorithm
Games such as chess, checkers and Othello belong to a broad group of games called two-player zero-sum games with perfect information. Zero-sum implies that if one player wins, the other player loses. Perfect information implies that the entire state of the game is known at any time. Scrabble has hidden tiles, and is thus defined as a game of imperfect information.
Two-player zero-sum games with perfect information are well known to game theoreticians [von Neumann 1944]. In any position of a game in this category, an optimal move can be determined. An optimal move can be determined via the minimax algorithm which, in its matrix formulation for a game like chess, would require a matrix estimated to contain more entries than there are molecules in our entire planet! However, all hope is not lost, since there are alternative formulations of the minimax algorithm that involve searching a game tree.
The root of a game tree represents the current state of the game. Each node in the tree can
have any number of child nodes. Each child of a node represents a new state after a legal move
from the node's state. This continues until we reach a leaf, a node with no child nodes, in the
game tree. We assign a payoff vector to each leaf in the game tree. In a generalized game tree,
this payoff vector represents the utility of the final position to both players. In general, winning
a game represents a positive utility for a player, while losing a game represents a negative
utility. Since the game is a two-player zero-sum game, the utility for the first player must equal
the negative of the utility for the second player. The utility for the side to move at the root of
the tree is usually the only one given to save space.
In Figure 2, an example of a game tree for a game of Noughts and Crosses (or Tic-Tac-Toe) is given. Note that the two players take alternating turns at different levels of the tree. X moves
at the root, while the opponent, O, moves at the first level below the root. A position is
normally categorized by how many levels down in the game tree it is located. The common
term for this is ply. The root is said to be at ply 0, while the immediate successors of the root
are said to be at ply 1, et cetera.
Noughts and Crosses, like chess and checkers, has only three possible outcomes for a player:
win, loss or draw. Normally, we assign the payoff of +1, 0 and -1 to a win, draw or loss for the
player to move, respectively. These payoffs are given in Figure 2 at the bottom of each leaf
position, with respect to the player with the crosses.
We will give names to each player to simplify our discussion. Let us call the player to move in
the initial position Max and the opponent Min. At each node in the tree where Max has to move,
Max would like to play the move that maximizes the payoff. Thus, Max will assign the maximum
score amongst the children to the node where Max makes a move. Similarly, Min will minimize
the payoff to Max, since that will maximize Min's payoff. The maximum and minimum scores
are taken at alternating levels of the tree, since Max and Min alternate turns.
In this way, all nodes in the tree can be assigned a payoff or minimax value, starting from the
leaves of the tree and moving up the tree towards the root. In Figure 3, we give minimax
values for all nodes in our Noughts and Crosses game tree (Figure 2). These minimax values
tell us what the best possible outcome for Max is in any position within the game tree, given
that Min will do its best to foil Max's plans.
Once the root of the game tree has been assigned a minimax value, a best move for Max is
defined as a move which leads to the same minimax value as the root of the tree. We can trace
down the tree, always choosing moves that lead to the same minimax value. This path of
moves gives us an optimal line of play for either player, and is known as a principal
variation. Note that in our game of Noughts and Crosses, the side playing the Crosses will draw
the game, but only if an X is played in the lower central square. Playing to either square in the
top row can lead to a loss for the Crosses, if the opponent plays the best move.
To compute the minimax value of a position, we can use any algorithm that searches the whole game tree. A depth-first search uses less memory than a best-first or breadth-first tree search algorithm, so it is preferred in current game-tree search programs. In Listing 1, we show two C functions which are the basis of a recursive depth-first search of a game tree. By calling Maximize with a position p, we will get the minimax value of position p as the output of the function after the entire game tree has been searched.
In Listing 1, we have left out some of the details. For example, we have not defined what a
position is, since this is game-dependent. There are three additional functions that would be
required to implement the minimax search: (1) EndOfGame, which determines whether the
game is over at the input position, returning TRUE if the game is over; (2) GameValue, which
accepts a position as a parameter, determines who has won the game, and returns the payoff
with respect to the player Max; and (3) GenerateSuccessors which generates an array of
successor positions (p.succ[]) from the input position, and returns the number of successors to
the calling procedure.
Note that Maximize() and Minimize() recursively call one another until a position is reached
where the EndOfGame() function returns TRUE. As each successor of a node is explored,
gamma maintains the current assessment of the position, based on all of the moves that have
been searched so far. Once all successors have been examined, the minimax value for that
position has been computed and stored in gamma, which can be returned to a higher level
within the tree (please refer to Listing 1).
The minimax algorithm can also determine which move yields the score gamma, and return
that up the tree as well. However, there is only one place we are interested in the move choice:
the root of the game tree. We could write a special version of Maximize that returns a best
move and the minimax value.
This formulation requires exactly the same amount of work as the matrix formulation did, but further pruning can be done on this tree. The αβ (alpha-beta) algorithm [Knuth 1975] improves on the typical minimax algorithm by passing down bounds throughout the tree, and can prune off branches that can be shown to have no relevance to the minimax value of the game tree. With the αβ algorithm, one can search the optimal number of terminal positions required to determine the minimax value (if one always searches the best move first at every node)! In practice, that doesn't happen (why would you need to search if you already knew the best move?), so there are variants on αβ such as NegaScout [Reinefeld 1983] and MTD(f) [Plaat 1996] that have been shown to be significantly better than αβ in practice.
However, it would still take a modern computer millions of years to evaluate the full game tree
for the game of chess if one had to go all the way to the terminal nodes. How can we control
the size of the search?
One method that most practitioners employ is to search the tree to a fixed depth, k ply from the root node, and use an approximate minimax value (a heuristic evaluation) at that level. However, the nature of the
pruning algorithms (such as NegaScout and MTD(f)) yield game trees that can vary widely in
size at the same nominal depth. Computer chess has real time limits, and if one exceeds those
time limits, the game is lost, so having an algorithm that can generate a rational decision at
any time is very important. Thus, a technique called iterative deepening [Scott 1969] is used.
The idea is that the αβ algorithm should be limited to exploring a small search depth k by
forcing evaluations of nodes once they reach that depth. Once that search is done, the limit k
can be moved forward by a step s, and the search can be repeated to a depth of k+s. In chess
programs, k and s usually equal 1. Thus, the program does a 1-ply search before doing a 2-ply
search, which occurs before the 3-ply search et cetera.
Scott noted that there is no way of predicting how long an αβ search will take, since it depends
heavily on the move ordering. However, by using iterative deepening, one can estimate how
long a (k+1)-ply search will take, based on the length of the preceding k-ply search.
Unfortunately, the prediction may be far off the accurate value. In some cases, a real time
constraint (such as a time control in a chess game) may necessitate aborting the current
search. Without iterative deepening, if a program has not finished a search when the time
constraint interrupts the search, the program may play a catastrophic move. With iterative
deepening, we can use the best move from the deepest search that was completed.
Other benefits were explored by Slate and Atkin in their Chess 4.5 program [Slate 1977]. They
discovered that there were many statistics that could be gathered from a search iteration,
including the principal variation. The principal variation of a k-ply search is a good starting place
to look for a principal variation of a (k+1)-ply search, so the principal variation from the k-ply
search is searched first at depth (k+1). This improves the ordering of the moves in the (k+1)-
ply search. Usually, the number of bottom positions explored for all of the searches up to depth
d with iterative deepening is significantly smaller than attempting a d-ply search without
iterative deepening.
Specific information about a search can be saved in a transposition table [Greenblatt 1967].
In the minimax algorithm given in Listing 1, all of the information about a node can be accumulated, including the best score, the best move from that position, and the depth to which it was searched. All of this information is commonly stored in one transposition table entry.
Transposition tables are normally constructed as closed hash tables, with hashing functions that
are easy to update (such as a number of XOR operations) as one traverses the tree. The
transposition table information can be used in two main ways: duplicate detection and move
ordering.
Why would we need to detect duplicates in a game tree? In reality, the game tree is a graph;
some of the positions appear in multiple places within the tree. Thus, it makes sense that each
position should only be explored once if the information obtained is sufficient to terminate the
search. The transposition table assists in finding and eliminating these duplicated positions.
The same position in the game will always hash to the same location in the transposition table.
What if the information stored in the table is the same position as the current node, and the
stored result of a search of that position is at least as deep as the search we are attempting to
execute? If we have an exact minimax value in the hash table for a search that is at least as
deep as the one to be executed, we can use the result from the hash table and prune the entire
search.
Most of the time, the duplicate detection will fail to completely eliminate the search, and we can
exploit the transposition table to improve our move ordering. In the games we are studying, the
best move from a previous search depth is likely to be the best move at the current search
depth. Thus, we can obtain the previous best move from the transposition table, and search the
previous best move before all others. In general, the move ordering benefits of combining
iterative deepening and the transposition table are at least as important to the node count as
the duplicate detection property, depending on the application chosen.
The transposition table only offers move ordering information about a single move in the move
list. The history heuristic [Schaeffer 1989] is a useful technique for sorting all other moves. In
the game of chess, a 64 by 64 matrix is used to store statistics. Each time a move from a
square startsq to a square endsq is chosen as a best move during the search, a bonus is stored
in the matrix at the location [startsq, endsq]. The size of this bonus depends on the depth at which the move was successful. A bonus that varies exponentially with the depth of the subtree under that position has been found to work well in practice. Moves with higher history
values are more likely to be best moves at other points in the tree; thus, moves are sorted
based on their current history values. This provides a dynamic ordering for all possible legal moves in cases where no other ordering information exists.
In the programs that the author is aware of, both move ordering techniques are used. The
transposition table move is always used first, since it yields specific information about that node
from a previous search. Once the transposition table move has been searched, the remaining
moves are sorted by the history heuristic.
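The bookkeeping involved is tiny. The sketch below uses the exponential bonus described above; the function name is invented:

    /* Sketch of the history heuristic's bookkeeping for chess. A bonus
       exponential in the searched depth is credited whenever a move is
       chosen as a best move. */
    static long history[64][64];

    void CreditBestMove(int startsq, int endsq, int depth)
    {
        history[startsq][endsq] += 1L << depth;   /* 2^depth bonus */
    }

Move lists are then sorted so that higher history values are searched first.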
Stockman [1979] introduced the SSS* algorithm, a best-first alternative to the depth-first search algorithms for determining the minimax value. Initially, it was believed that the algorithm dominated αβ, in the sense that SSS* will not search a node if αβ did not search it. A problem with SSS* is that a list structure (the OPEN list) must be maintained, which could grow to b^(d/2) elements, where b is the branching factor and d is the depth of the tree to be searched. At the time, this space requirement was considered to be too large for a practical chess-playing program. Even when the space requirement was not a problem, maintaining the OPEN list slowed the algorithm down enough to make it slower than αβ in practice.
Although versions of SSS* eventually managed to become faster than αβ for game trees [Reinefeld 1994a], it has recently been discovered that SSS* can be implemented as a series of null-window αβ calls, using a transposition table instead of an OPEN list [Plaat 1996]. This research showed that the classic drawbacks of SSS* no longer hold. However, it is also important to note that the benefits disappear as well: SSS* is not necessarily better than αβ when dynamic move reordering is considered. When all of the typical enhancements are used, SSS* can be outperformed by NegaScout and MTD(f).
In game-tree search, a depth-first search algorithm generates results faster than a best-first search algorithm. A* is also a best-first search algorithm. Is there a better single-agent search algorithm than A*, one that uses a depth-first iterative-deepening formulation?
4. Reimplementing A*
The first stage of the plan is to reimplement A* as a depth-first search algorithm. The second
stage is to implement the two move ordering techniques that we described in Section 3:
transposition tables and the history heuristic.
4.1 IDA*
Korf [1985] was the first researcher to emphasize why one would want to use depth-first
iterative deepening in any search framework, including heuristic searches such as A*.
The implementation will be described first. In Listing 2, we see some sample code that could be
used to implement IDA*. There are similarities between this formulation and Maximize (from
Listing 1). This implementation, like Minimax, is simple for a search domain that has discrete
steps. For each position p, we check to see if we are at a goal node, or if the search should be
cut off immediately. We use the typical condition from A* search, provided that we have an
admissible heuristic for estimating the distance to the goal. If the depth (number of steps
taken, g in the code and referred to as g(p) in the literature) plus the heuristic estimate to the
goal (h or h(p)) is greater than the cut off for this search (fmax), then we stop searching at this
node. If we are not at a goal node, nor should the search be pruned, we search all successors
recursively. It is important to note that we increase the distance traveled by a step at each ply
of the tree.
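Listing 2 is not reproduced in this text; a minimal sketch consistent with the description above might look like this, where fmax is the file-scope cost cutoff discussed next, and IsGoalNode() and the unit step cost are assumptions:

    /* Sketch of the IDA* recursion; names are illustrative. TRUE and
       FALSE are used as in the article's other listings. */
    static int fmax;   /* cost cutoff for the current iteration */

    int IDAStarSearch(position p, int g)
    {
        int i, n;
        if (IsGoalNode(p))
            return TRUE;
        if (g + EstimateDistanceToGoal(p) > fmax)   /* f(p) exceeds cutoff */
            return FALSE;
        n = GenerateSuccessors(p);
        for (i = 0; i < n; i++)
            if (IDAStarSearch(p.succ[i], g + 1))    /* one step deeper */
                return TRUE;
        return FALSE;
    }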
How does one determine fmax? A second routine usually sets this value, calling IDA* over and over again until a goal node is found. In our sample, ComputeNumberOfMoves() is a sample driver routine for IDA*, and can be seen in Listing 3. When we pass a position into the ComputeNumberOfMoves() function, we expect to get back the minimum number of steps required to reach a goal node. The algorithm starts with fmax = h(startposition), and then calls the IDAStar() function, incrementing fmax until the IDAStar() function returns TRUE.
The idea of IDA* is not new, and it should not be new to you either. A previous article in Game
Developer [Stout 1996] mentions this in passing as something you may want to consider to
improve the speed of your pathfinding.
Now, you are probably wondering whether IDA* is any worse than A*. After all, A* only expands each node once during a search, whereas IDA* re-expands the nodes near the top of the tree many times. A* and IDA* have been shown mathematically to search on the order of c * b^d nodes, where b is the typical number of alternatives at each node, d is the depth of the closest solution and c is a constant. The only difference is that the constant c is a little larger for IDA*.
IDA* is a little slower, but what do you gain? Well, have you seen any mention of sorting OPEN
positions on a list, or inserting entries into the CLOSED list? When you use a depth-first
iterative deepening approach, you don't have to store either list. IDA* uses O(d) memory, whereas A* uses O(b^d) memory. This makes IDA* a good choice when memory is at
a premium. Also note that because you have very little state information during a search, IDA*
is very easy to save and restore if the AI time slice is up.
If you are using IDA*, you have lost the CLOSED list. Unlike the OPEN list, the CLOSED list has
other functions. The primary function of the CLOSED list in A* is the ability to detect duplicate
positions within the tree. If the same node is reached by two separate paths, IDA* will blindly
search through the node both times. When the first path to the node is shorter than the second
path, we have wasted search effort. We would like a technique that allows us to detect
duplicates, and store information about the previously attempted depth at a given node. Thus,
we want to apply transposition tables to IDA*.
A transposition table in a computer chess program is implemented as a closed hash table where
older and less relevant data can be overwritten with newer and more relevant data. By taking
the position p and computing a hash function hash(p), one can store the fact that the node was
reached in g steps and searched to a total path of size f unsuccessfully. Whenever a node is
first examined in IDA*, we can quickly look in the table and determine whether or not the node
has been previously searched. If it has, we can compare the stored number of steps to reach p
and the current number of steps taken to reach p. If the current path to reach p is longer than
the stored path, we do not search the successors of this node. This information about g for various positions is also kept between iterations of fmax, so whenever we reach a position by a non-optimal path, we can immediately eliminate that subtree in all future iterations as well.
Not only can we use the transposition table to detect duplicate positions, we can also use it to tell the algorithm which successor showed the most promise during a previous iteration for each position. One measure of promise is which successor led to the smallest h value during the current level of search. The stored move is searched first on subsequent iterations, in the same manner that we search the stored move from the transposition table first in a computer chess program.
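A minimal sketch of such a table for IDA* might look like the following; the entry layout and the hashing helper are assumptions, not the code of any particular program:

    /* Sketch of an IDA* transposition table (closed hashing, entries
       may be overwritten). HashPosition() is an assumed helper. */
    #define TABLE_SIZE 65536

    typedef struct {
        unsigned long key;    /* verifies the entry matches this position */
        int g;                /* fewest steps found so far to reach it    */
        int bestSuccessor;    /* most promising move from last iteration  */
    } TTEntry;

    static TTEntry table[TABLE_SIZE];

    /* Returns TRUE if stored data shows the current path to p is longer
       than one we already know about. */
    int CanPrune(position p, int g)
    {
        unsigned long key = HashPosition(p);
        TTEntry *e = &table[key % TABLE_SIZE];
        if (e->key != key)
            return FALSE;     /* entry belongs to a different position */
        return (e->g < g);    /* reached before by a strictly shorter path */
    }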
For the typical pathfinding algorithm, depending on the structure of your nodes, you may not need a large hash table. One can derive a large portion of the benefit from a transposition table that is only 2 to 5 percent of the size of the typical CLOSED list generated by an A* search on the same position.
A key component of the transposition table is the entry replacement scheme. To replace
entries, one would overwrite an entry in the hash table if the node in the hash table has a
longer path from the start position than the node we want to write in (store the lowest g(p)
instances). We want to do this because cutoffs higher up in the tree save more nodes than
cutoffs lower down in the tree. Other recent research in the computer chess community has
dealt with two-stage replacement schemes for transposition tables. In one experiment [Breuker
1996], half of the transposition table was reserved for the nodes closest to the root, and the
other half of the transposition table was reserved for the most recently visited nodes regardless
of depth. This yielded an improvement to search size at a minimal performance cost.
How much does adding a transposition table to IDA* save us? Experiments have shown that a
transposition table can reduce the size of standard 15-puzzle searches by nearly 50%
[Reinefeld 1994b], with the cost of storage and access being O(1), in comparison to O(log n)
searches on a data structure for a CLOSED list that doesn't lose information.
A sliding-tile experiment [Reinefeld 1994b] gave a description for a history heuristic for the 15-
puzzle. The history heuristic was stored in a 3-dimensional array. The three dimensions were
the tiles (16) in each position (16) moving in each direction (4). To install information, the
experiment counted the number of times that a move led to the deepest subtree (i.e. attained
the smallest h value for an examined node within its subtree). The experiment met with some success, as the IDA* algorithm searched approximately 6 percent fewer nodes when the history heuristic was used versus the version that used a static move ordering.
We could use both the static move ordering and the dynamic information gathered by the
history heuristic to generate a hybrid heuristic for ordering successors. This type of hybrid
heuristic could improve the ordering of moves more than either technique in isolation.
5. Conclusions
We have implemented the techniques described above, and we are currently using them to plot
paths for our creatures in our soon-to-be-released role-playing game, Neverwinter Nights.
There are many caveats to using these techniques, and it is important to be able to understand
the drawbacks. The speed improvements that these techniques yield will vary depending on
your application (they vary dramatically when implementing them in chess, Othello and
checkers programs!) ... but you now have some new enhancements that can help you search
more efficiently.
To summarize the utility of adding standard enhancements to search algorithms, let us examine
another problem: finding push-optimal solutions for Sokoban problems. If you have never seen
the game Sokoban, a picture of one of the 90 positions is given in Figure 4. The goal is for the
little worker to push all of the round stones into the goal squares (the goal squares are shaded
with diagonal lines). On the surface, this may seem as easy as pathfinding, and an easy
application for A*. However, all pathfinding "mistakes" are undoable by retracing the path. One
wrong push of a stone could leave you in a state where you are unable to complete the task.
Thus, the need to plan the path of all stones to the goal squares is paramount.
IDA* alone is incapable of solving any of the puzzles within a limit of 20 million searched nodes. If we enhance IDA* with the transposition table and the move ordering techniques, 4 of the puzzles can be solved [Junghanns 1997]. If we raise the limit to one billion nodes, only 6 of the 90 puzzles can be solved
using IDA*, transposition tables and move ordering. If we use all of the domain-dependent
techniques the researchers developed (including deadlock tables, tunnel macros, goal macros,
goal cuts, pattern search, relevance cuts and overestimation), the program Rolling Stone can
solve 52 of the 90 problems within the billion node limit for each puzzle [Junghanns 1999].
Pathfinding is a relatively trivial problem in comparison to finding push-optimal solutions for
Sokoban puzzles, and I am happy to say my bosses at BioWare haven't asked me to solve
Sokoban in real time.
There's a lot of very good academic information on single-agent search (including a special
issue of the journal Artificial Intelligence later this year which will be devoted to the topic), and
I would encourage everyone to look up some of these references. If you have any further
questions on any of the reference material, please feel free to e-mail me.
Mark Brockington is the lead research scientist at BioWare Corp. His email address is [email protected].
int Maximize(position p) {
int Minimize(position p) {
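Only those two signatures from Listing 1 survive in this text. Bodies consistent with the description in Section 3 -- gamma accumulation over the successors in p.succ[], with EndOfGame(), GameValue() and GenerateSuccessors() as specified there -- might look like this sketch, where INFINITY_SCORE is an assumed sentinel:

    #define INFINITY_SCORE 1000000  /* assumed to exceed any payoff */

    int Maximize(position p)
    {
        int i, n, value, gamma;
        if (EndOfGame(p)) return GameValue(p);
        n = GenerateSuccessors(p);
        gamma = -INFINITY_SCORE;            /* worst case for Max */
        for (i = 0; i < n; i++) {
            value = Minimize(p.succ[i]);
            if (value > gamma) gamma = value;
        }
        return gamma;
    }

    int Minimize(position p)
    {
        int i, n, value, gamma;
        if (EndOfGame(p)) return GameValue(p);
        n = GenerateSuccessors(p);
        gamma = INFINITY_SCORE;             /* worst case for Min */
        for (i = 0; i < n; i++) {
            value = Maximize(p.succ[i]);
            if (value < gamma) gamma = value;
        }
        return gamma;
    }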
/* Listing 3. Driver for IDA*; fmax is the cost cutoff shared with IDAStar(). */
int ComputeNumberOfMoves(position startposition)
{
    int bFoundPath = FALSE;
    fmax = EstimateDistanceToGoal(startposition);
    while (bFoundPath == FALSE)
    {
        bFoundPath = IDAStar(startposition);
        if (bFoundPath == FALSE) { fmax += 1; }
    }
    return fmax;
}
References
[Greenblatt 1967] R. D. Greenblatt, D. E. Eastlake, and S.D. Crocker. The Greenblatt Chess
Program. In Proceedings of the Fall Joint Computer Conference, volume 31, pages 801-810,
1967.
[Hart 1968] P. E. Hart, N. J. Nilsson, and B. Raphael. A Formal Basis for the Heuristic
Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics,
SSC-4(2):100-107, 1968.
[Korf 1985] R. E. Korf. Depth-First Iterative Deepening: An Optimal Admissible Tree Search.
Artificial Intelligence, 27:97-109, 1985.
[Plaat 1996] A. Plaat, J. Schaeffer, W. Pijls, and A. de Bruin. Exploiting Graph Properties of
Game Trees. In AAAI-1996, volume 1, pages 234-239, Portland, Oregon, August 1996.
[Reinefeld 1994a] A. Reinefeld. A Minimax Algorithm Faster than Alpha-Beta. In H.J. van den
Herik, I.S. Herschberg and J.W.H.M. Uiterwijk, editors, Advances In Computer Chess 7, pages
237-250. University of Limburg, 1994.
[Schaeffer 1989] J. Schaeffer. The History Heuristic and Alpha-Beta Search Enhancements In
Practice. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-11(11): 1203-
1212, 1989.
[Scott 1969] J. J. Scott. A Chess-Playing Program. In B. Meltzer and D. Michie, editors, Machine
Intelligence 4, pages 255-265. Edinburgh University Press, 1969.
[Slate 1977] D. J. Slate and L. R. Atkin. Chess 4.5 - The Northwestern University Chess
Program. In P.W. Frey, editor, Chess Skill in Man and Machine, pages 82-118. Springer-Verlag,
New York, 1977.
[Stout 1996] W. B. Stout. Smart Moves: Intelligent Path-Finding. Game Developer, pp. 28-35,
Oct./Nov. 1996.
[von Neumann 1944] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1944.
Imagine a scene in a game in which a rocket flies through the air, leaving a smoke trail behind
it. Suddenly the rocket explodes, sparks flying everywhere. Out of the disintegrating rocket a
creature is jettisoned towards you, its body parts exploding and blood flying through the air,
leaving messy blood splatters on the camera lens. What do most of the elements in this scene
have in common?
Yes, most of these elements are violent. But in terms of technology, most of the effects in a
scene like this would benefit from a good particle system. Smoke, sparks, and blood are
routinely created in today's games using particle systems.
To realize these effects, you need to build a particle system, and not just a simple one. You
need an advanced particle system, one that's fast, flexible, and extensible. If you are new to
particle systems, I recommend you begin by reading Jeff Lander's article on particle systems
("The Ocean Spray in Your Face," Graphic Content, July 1998). The difference between Lander's
column and this article is that the former describes the basics of particles, whereas I will
demonstrate how to build a more advanced system. With this article I will include the full
source code for an advanced particle system, and you can download an application that
demonstrates the system.
Advanced particle systems can result in pretty large amounts of code, so it's important to
design your data structures well. Also be aware of the fact that particle systems can decrease
the frame rate significantly if not constructed properly, and most performance hits are due to
memory management problems caused by the particle system.
When designing a particle system, one of the first things to keep in mind is that particle
systems greatly increase the number of visible polygons per frame. Each particle probably
needs four vertices and two triangles. Thus, with 2,000 visible snowflake particles in a scene,
we're adding 4,000 visible triangles for the snow alone. And since most particles move, we can't
precalculate the vertex buffer, so the vertex buffers will probably need to be changed every
frame.
The trick is to perform as few memory operations (allocations and releases) as possible. Thus, if
a particle dies after some period of time, don't release it from memory. Instead, set a flag that
marks it as dead or respawn (reinitialize) it. Then when all particles are tagged as "dead,"
release the entire particle system (including all particles within this system), or if it's a constant
system, keep the particle system alive. If you want to respawn the system or just add a new
particle to a system, you should automatically initialize the particle with its default
settings/properties set up according to the system to which it belongs.
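A minimal sketch of this kind of pooling, with every name invented, might look like:

    // Sketch of particle pooling: dead particles are flagged and reused,
    // never freed one at a time. Names are illustrative.
    #include <vector>
    #include <cstddef>

    struct Particle
    {
        float position[3];
        float velocity[3];
        float energy;        // reaching 0 means the particle is dead
        bool  alive;
    };

    class ParticlePool
    {
        std::vector<Particle> particles;   // allocated once, up front
    public:
        explicit ParticlePool(std::size_t count) : particles(count) {}

        void Kill(Particle &p) { p.alive = false; }   // flag, don't free

        Particle *Respawn()    // reuse the first dead slot, if any
        {
            for (std::size_t i = 0; i < particles.size(); ++i)
                if (!particles[i].alive)
                {
                    particles[i].alive = true;
                    return &particles[i];
                }
            return 0;          // pool exhausted
        }
    };

The caller then fills in the respawned particle's default settings according to the system it belongs to.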
For example, let's say you have a smoke system. When you create or respawn a particle, you
might have to set its values as described in Table 1. (Of course, the start color, energy, size,
and velocity will be different for blood than, say, smoke.) Note that the values also depend on
the settings of the system itself. If you set up wind for a smoke system so the smoke blows to
the left, the velocity for a new particle will differ from a smoke system in which the smoke just
rises unaffected by wind. If you have a constant smoke system, and a smoke particle's energy
becomes 0 (so you can't see it anymore), you'll want to respawn its settings so it will be
replaced at the bottom of the smoke system at full energy.
Some particle systems may need to have their particles rendered in different ways. For example, you may want to have multiple blood systems, such as "blood squirt," "blood splat," "blood pool," and "blood splat on camera lens," each containing the appropriate particles.
"Blood squirt" would render blood squirts flying through the air, and when these squirts collided
with a wall, the "blood splat" system would be called, creating messy blood splats on walls and
floors. A blood pool system would create pools of blood on the floor after someone had been
shot dead on the ground.
Each particle system behaves in a unique manner. Blood splats are rendered differently than
smoke is displayed. Smoke particles always face the active camera, whereas blood splats are
mapped (and maybe clipped) onto the plane of the polygon that the splat collides with.
When creating a particle system, it is important to consider all of the possible parameters that
you may want to affect in the system at any time in the game, and build that flexibility into
your system. Consider a smoke system again. We might want to change the wind direction
vector so that a car moving closely past a smoke system makes the smoke particles respond to
the wind generated by the passing car.
At this point you may have realized that each of these systems (blood splat, smoke, sparks,
and so on) is very specific to certain tasks. But what if we want to control the particles within a
system in a way not supported by the formulae in the system? To support that kind of
flexibility, we need to create a "manual" particle system as well, one that allows us to update all
particle attributes every frame.
The last feature we might consider is the ability to link particle systems within the hierarchy of
our engine. Perhaps at some point we'll want to link a smoke or glow particle system to a
cigarette, which in turn is linked to the head of a smoking character. If the character moves its
head or starts to walk, the position of the particle systems which are linked to the cigarette
should also be updated correctly.
So there you have some basic requirements for an advanced particle system. In the next section, I'll show how to design a good data structure that is capable of supporting all the above-mentioned features.
Now that we know what features we need, it's time to design our classes. Figure 1 shows an
overview of the system we're going to build. Notice that there is a particle manager, which I will
explain more about in a moment.
Figure 1. A global overview of the system we are going to build.
Let's use a bottom-up approach to design our classes, beginning with the particle class.
The particle class. If you have built a particle system before, you probably know the types of attributes a particle must have. Table 1 lists some common attributes.
Note that the previous position of a particle can also be useful in some systems. For example,
you might want to stretch a particle between its previous and current positions. Sparks are a
good example of particles that benefit from this feature. You can see some spark effects I've
created in Figure 2.
The particle system class. This class is the heart of the system. Updating the particle
attributes and setting up the shape of the particles takes place inside this class. My current
particle system class uses the node base class of my 3D engine, which contains data such as a
position vector, a rotation quaternion, and scale values. Because I inherit all members of this
node class, I can link my particle systems within the hierarchy of the engine, allowing the
engine to affect the position of the particle system as discussed in the above cigarette example.
If your engine does not have hierarchy support, or if you are building a stand-alone particle
system, this is not needed. Table 2 lists the attributes which you need to have in the particle
system base class.
Here's how to calculate the four positions of a normal (not stretched) particle that always faces
the active camera. First, transform the particle's world-space position into camera space
(multiply the world-space position by your active camera matrix). Then use the size attribute of
the particle to calculate the four vertices.
The four vertices, which form the shape, are what we use to render the particle, though a
particle has only one position, xyz. In order to render a particle (such as a spark), we need to
set up a shape (created from four vertices). Two triangles are then rendered between these
four points. Imagine a non-stretched particle always facing the camera in front of you, as seen
in Figure 3.
In our case, the particle is always facing the active camera, so this means we can simply add
and subtract values from the x and y values of the particle position in camera-space. In other
words, leave the z value as it is and pretend you are working only in 2D. You can see an
example of this calculation in Listing 1.
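Listing 1 itself isn't reproduced here; a minimal sketch of the calculation, assuming a simple Vector3 type (an illustration, not the article's actual code), might look like this:

struct Vector3 { float x, y, z; };

// Build a camera-facing quad for one particle. Assumes the particle's
// position has already been transformed into camera space, so we can
// offset only x and y and leave z untouched.
void BuildShape(const Vector3& camSpacePos, float size, Vector3 quad[4])
{
    const float h = size * 0.5f;
    quad[0] = { camSpacePos.x - h, camSpacePos.y - h, camSpacePos.z }; // bottom-left
    quad[1] = { camSpacePos.x + h, camSpacePos.y - h, camSpacePos.z }; // bottom-right
    quad[2] = { camSpacePos.x + h, camSpacePos.y + h, camSpacePos.z }; // top-right
    quad[3] = { camSpacePos.x - h, camSpacePos.y + h, camSpacePos.z }; // top-left
}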
The functions. Now that we know what attributes are needed in the particle system base
class, we can start thinking about what functions are needed. Since this is the base class, most
functions are declared as virtual functions. Each type of particle system updates particle
attributes in a different way, so we need to have a virtual update function. This update function
performs the per-frame work of the system: it updates each particle's attributes and kills or
respawns particles as needed.
Now our base class has the ability to update the particles, and we are ready to set up the
shapes which can be constructed using the new (and perhaps previous) position. This function,
SetupShape, needs to be virtual, because some particle system types will need to have their
particles stretched and some won't. You can see an example of this function in Listing 1.
To add a particle to a given system, or to respawn it, it's useful to have a function that takes
care of this. Again, it should be another virtual function, which is declared like this:
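The declaration itself isn't shown here; a plausible form, with a hypothetical name and signature, would be:

virtual void RespawnParticle(Particle& particle);   // (re)initialize one particle's attributes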
As I explained above, this function initializes the attributes for a given particle. But what if you
want to alter the speed of the smoke or change the wind direction that affects all of your smoke
systems? This brings us to the next subject: the particle system's constructor. Many particle
systems will need their own unique constructors, forcing us to create a virtual constructor and
destructor within the base class. In the constructor of the base class, you should enter the
following information:
- The number of particles you initially want to have in this particle system.
- The position of the particle system itself.
- The blend mode you want to use for this system.
- The texture or texture file name you want this system to use.
- The system type (its ID).
In my engine, the constructor in the particle system base class looks like this:
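The original listing isn't reproduced here; a sketch matching the five items above, with illustrative type names (BlendMode and SystemType are assumptions), might be:

ParticleSystem(int            numParticles,  // initial particle count
               const Vector3& position,      // position of the system itself
               BlendMode      blendMode,     // blend mode for this system
               const char*    textureFile,   // texture file name to use
               SystemType     type);         // the system type (its ID)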
So where do various settings, such as the wind direction for the smoke system, get addressed?
You can either add settings specific to the system type (such as wind direction) into the
constructor, or you can create a struct called InitInfo inside each class, which contains all of the
appropriate settings. If you use the latter method, make sure to add a new parameter in the
constructor, which is a pointer to the new struct. If the pointer is NULL, the default settings are
used.
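As a sketch of that second method (all names and fields here are illustrative, not the article's code), a smoke system might look like this:

class SmokeSystem : public ParticleSystem
{
public:
    struct InitInfo
    {
        Vector3 windDirection;   // wind affecting the smoke particles
        float   emissionRate;    // particles spawned per second
        float   particleLife;    // lifetime of each particle, in seconds
    };

    static InitInfo DefaultInitInfo();  // fills an InitInfo with default settings

    // A NULL info pointer means "use the default settings".
    SmokeSystem(int numParticles, const Vector3& position,
                const InitInfo* info = NULL);
};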
As you can imagine, the first solution can result in constructors with many parameters, and
that's not fun to work with as a programmer. ("Parameter number 14…hmmm. What does that
value represent again?") That's the main reason I don't use the first method. It's much easier
to use the second method, and we can create a function in each particle system class to
initialize its struct with default settings. An example of this code and a demo application can be
found on the Game Developer web site (http://www.gdmag.com) or my own site at
http://www.mysticgd.com/.
Now that we have covered the technology behind an individual particle system, it's time to
create a manager class to control all of our various particle systems. A manager class is in
charge of creating, releasing, updating, and rendering all of the systems. As such, one of the
attributes in the manager class must be an array of pointers to particle systems. I strongly
recommend that you build or use an array template, because this makes life easier.
The people who will work with the particle systems you create want to add particle systems
easily. They also don't want to keep track of all the systems to see if all of the particles died so
they can release them from memory. That's what the manager class is for. The manager will
automatically update and render systems when needed, and remove dead systems.
When using sporadic systems (systems which die after a given time), it's useful to have a
function that checks whether a system has been removed yet (that is, whether it still exists
within the particle manager). Imagine you create a system and store the pointer to this particle
system. You access the particle system every frame by using its pointer. What happens if the
system dies just before you use the pointer? Crash. That's why we need to have a function
which checks if the system is still alive or has already been deleted by the particle manager. A
list of functions needed inside the particle manager class is shown in Table 3.
The AddSystem function will probably have just one parameter: a pointer to a particle system of
our base class type. This allows you to add a smoke or
fire system easily depending on your needs. Here is an example of how I add a new particle
system in my engine:
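The engine's actual call isn't shown; under the sketches above, usage might look like this (SmokeSystem and its arguments are hypothetical):

SmokeSystem::InitInfo info = SmokeSystem::DefaultInitInfo();
info.windDirection = Vector3{ 1.0f, 0.0f, 0.0f };   // override one default setting
particleMgr->AddSystem(new SmokeSystem(100, position, &info));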
During the world update function, I call the particleMgr->Update() function, which automatically
updates all of the systems and releases the dead ones. The Render function then renders all
visible particle systems.
Since we don't want to keep track of all particles across all of our systems every frame to see
whether all particles have died (so the system can be removed), we'll use the Update function
instead. If this function returns TRUE, it means that the system is still alive; otherwise it is dead
and ready to be removed. The Update function of the particle manager is shown in Listing 2.
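Listing 2 isn't reproduced here; a minimal sketch of such an update loop, under the same assumptions as the earlier sketches, could be:

#include <vector>

class ParticleSystem
{
public:
    virtual ~ParticleSystem() {}
    virtual bool Update(float deltaTime) = 0;   // returns false once the system has died
};

class ParticleManager
{
public:
    void AddSystem(ParticleSystem* system) { systems.push_back(system); }

    // Update all systems; delete and remove the ones that report death.
    void Update(float deltaTime)
    {
        for (int i = (int)systems.size() - 1; i >= 0; --i)
        {
            if (!systems[i]->Update(deltaTime))
            {
                delete systems[i];
                systems.erase(systems.begin() + i);
            }
        }
    }

private:
    std::vector<ParticleSystem*> systems;
};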
In my own particle system, all particles with the same textures and blend modes assigned to
them will be rendered consecutively, minimizing the number of texture switches and uploads.
Thus, if there are ten smoke systems visible on screen, only one texture switch and state
change will be performed.
Figure: This electricity effect has its own render function. A hierarchy tree was constructed to
represent the electricity flow using branches and sub-branches; it is a thunderstorm lightning
effect with animated branches, and particle shapes are constructed for every part of the
electricity tree.
Designing a flexible, fast, and extensible advanced particle system is not difficult, provided you
take time to consider how you will use it within your game, and you carefully design your
system architecture accordingly. Because the system I discussed uses classes with inheritance,
you can also put the individual particle system types into .DLL files. This opens up the
possibility of creating some sort of plug-in system, which might be of interest to some game
developers.
You can also download the source code of my particle system, which I have created for
Oxygen3D, my latest engine. This source is not a stand-alone compilable system, but it should
help you if you run into any trouble. If you still have any questions or remarks, don't hesitate
to send me an e-mail.
John van der Burg is the lead programmer for Mystic Game Development, located in
the Netherlands. Currently he is working on Oxygen3D, which is his third hardware-
only engine and his eighth engine overall. Currently he is doing freelance work on
Loose Cannon for Digital Anvil and for OMG Games on The Crest of Dharim. He
previously freelanced for Lionhead Studios on Black and White, and for Orange Games
on Core. You can find screenshots of his previous work at
http://www.mysticgd.com/. Feel free to drop him a line at [email protected].
Networked computer games present new data distribution challenges to the developer. With
stand-alone games, all game information is available in one place, on one computer, all the
time. When designing networked games, however, the developer has to consider what
information the other players on the network need, when they need it, and how to deliver it to
them. The easy way out is to broadcast everything you can think of to everyone on the network
as often as possible. This keeps everyone synchronized, but quickly consumes all available
bandwidth - not a good idea for groups of players dialed in to the Internet. The other extreme is
to carefully consider each packet of data and to send it only to other players who really need it.
This approach makes efficient use of bandwidth, but deciding who needs data and who doesn't
chews up so many of the sender's CPU cycles that no processing power is left to play the game.
There is a middle ground, however, and it's called "grouping." Grouping allows networked
games to route essential data among the players while making efficient use of both network
and CPU resources. In some cases, underlying network service providers support an efficient
way to distribute grouped data; this is called multicasting. In other cases, games may be
limited to broadcast or point-to-point connections. Still, grouping can provide some efficiencies.
Grouping can help a networked game scale up to include large numbers of players. This article
covers logical ways to group data and the schemes used to implement grouped
communications.
What Is a Grouping?
The first thing to consider is why you'd want to group data at all. If you're developing a one-on-
one boxing game played on two computers, then chances are both players will need to know
everything about one another; every move and every change will need to be exchanged. It's a
very different situation, however, if you're building a large-scale space war scenario with lots of
players. You may have a very high-performance spaceship and may be doing rolls in seven
dimensions, but if I'm far away from the maneuvers, you just look like a dot to me. You're too
far away for me to see your detailed maneuvers, so I don't need to know the details of your
motion. Furthermore, if neither one of us can see the "cloaked" alien spaceship hovering
nearby, then there's really no need for him to send us his position at all until he becomes
visible. The bottom line is that if you think about various players' data requirements, you can
save a lot of bandwidth, network interrupt overhead, and CPU processing by only sending
information to the players who need it. The trick is to make routing decisions without spending
more cycles than you stand to save.
Before we get into the mechanics of creating data groups, let's look at how grouped data can be
distributed among computers on a network. Ordinarily, each sender can communicate with
other players on the network using either broadcast, one-to-everyone communications such as
UDP/IP, or one-to-one links such as TCP/IP. For computer games, what you really want is
one-to-many or many-to-one communications. Multicast-capable networks implement this
capability at the physical level, using addressing schemes that allow a single transmitted
message to be routed to many receivers while being ignored by others, even others on the
same LAN. Even in environments where multicasting is not available, or in server-based game
environments, the server can still effectively implement multicasting. A single transmission is
reflected (or "exploded") back to multiple receivers, albeit with added latency. Figure 1 shows
multicast transmission and its server/exploder equivalent.
Basic multicast technology doesn't say anything about what types of groupings to use, and
indeed, in DirectPlay any player can join any group. So how do you make best use of groups?
The approach to take in grouping data is to come up with some easily calculable measure for
each message and to route the message to groupings based on that measure.
When looking at any piece of data, the game has basically three pieces of information that it
can use to decide where to send it.
1. The data contents itself (such as the geographic location of an avatar, what team the
producing entity belongs to, what markings it has, what entity type it is, what
acceleration it has, and so on). This is called "data-based grouping."
2. The source of the data (that is, the place the data is coming from - either computer, site,
player, or object). This is called "source-based grouping."
3. The destination of the data. In "recipient-based grouping," data is sent to a group
comprised solely of those players who are to receive that information.
Data-Based Grouping. Data-based grouping is the most intuitive form of creating groupings,
and for many games, geographic sectorization is the most common form of data-based
grouping. Under geographic sectorization, the virtual world is broken up into regions and data is
grouped according to region. For example, in the scheme shown in Figure 2, 12 groups are used
to segment the play area based on a rectangular North/East grid. Each entity knows what
region it's in and sends its data to the corresponding group. Thus, the airplane shown in the
figure sends data to Group 6. It's important to note that entities don't have to recheck their
region location with every internal update (that is, every time they update their position). If you
know that a single region is 100 miles or 20 light-years across, and you know how fast your
vehicle can go, then you know the soonest you could possibly enter another region, and thus
can postpone checking your region until that time. The result is that geographic sectorization
yields an inexpensive (because it's infrequently calculated) way of creating groups of data.
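A sketch of that postponement idea (the names here are ours, not from the article):

// The soonest an entity could cross into another region bounds how long
// the region check can be postponed.
float SecondsUntilRegionCheck(float distanceToBoundary, float maxSpeed)
{
    if (maxSpeed <= 0.0f)
        return 1e30f;                      // a stationary entity never changes region
    return distanceToBoundary / maxSpeed;  // earliest possible boundary crossing
}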
Source-Based Grouping. In source-based grouping, every source of data (where a source can
be a LAN, a player, or even a particular entity) is assigned its own group, and every player then
subscribes only to those sources that it cares about. Under DirectPlay, only one player per
group would send to a group, though many players can issue IDirectPlay3::AddPlayerToGroup
commands to listen to the group. The problem then becomes one of telling the destinations
what information is being produced by each source and determining which groups to join.
In some cases, this problem is solved simply by well-known sources of particular data. For
example, if there's one computer that provides weather and terrain information for the entire
session, then all other players automatically know to listen to that player's group if they want to
get weather updates. A more general solution lies in creating a special group, which carries very
low frequency information about all entities or players in the game, to which everyone
subscribes. Since all players subscribe to this low-fidelity data channel, they know rough
information about all the entities in the game. They can then get more detail about the entities
they really care about by subscribing to the groups representing the sources that they want to
hear. One good thing about this scheme is that the number of groups used scales linearly with
the number of sources. In principle, the "source" can be anything, including a single entity or
avatar. A side benefit of this approach is that the low-fidelity group automatically gives each
player enough data to create a rough, "radar-screen" picture of the entire play space.
Recipient-Based Grouping. This scheme carries with it some extra communications overhead in that the data requirements
of every player must be transmitted to every other player during the exercise. Fortunately,
these interest expressions change rather slowly, so they don't result in a large amount of
information flowing between hosts, as is the case with dynamic game data. Recipient-based
grouping can also use up the largest number of groups of all the possible schemes, if you're
interested in creating groups for each possible combination of recipients. Recipient-based
grouping can be expensive to implement in terms of CPU cycles on the sending side, but unlike
either of the other schemes, it can offer perfect segmentation of data (you receive only what
you need, with no extra information).
Dynamic Grouping
All of the grouping schemes described previously are based on static definitions of groups. For
example, in the geographic sectorization scheme, the boundaries of Sector 1, and hence, the
definition of Group 1, are defined before the game starts running. This is the easiest way to
implement groups, since group definitions, and consequently, choices about where to send or
listen, can be hard-coded into the game.
Sometimes, however, static group definitions don't work out. Suppose in my spaceship game a
heated battle is taking place in Sector 6. The remainder of space may be relatively empty, but
there's a lot of activity in Group 6. In this case, using groups doesn't buy you much in the way
of performance, since most of the players wind up sending and receiving Group 6 data. The
solution in this case would be to create dynamic group definitions. With dynamic groupings,
each session would designate a player to serve as a group server for the session (under
DirectPlay, this role could be filled by the session host). The group server monitors or infers
traffic flow in the various groups and can dynamically redefine groups. In the spaceship game,
for example, a group server might decide to break up Sector 6 into four smaller sectors. It
might add new groups, sending out the new group definitions to each player. If the number of
groups available to a session is limited, perhaps by the available number of physical multicast
groups, then it might redefine existing groups (for example, by combining Sectors 4 and 8) to
free up groups. Figure 3 shows what the geographically sectorized space of Figure 2 might look
like after being dynamically adjusted. The bottom line is that with dynamic groups, each
player's software must listen to group definitions as broadcast from the group server and
modify its group memberships accordingly.
The other major problem you'll face with grouping is how to bring players up to speed when
they join new groups. In a grouped, multicasted world, each player knows only a subset of the
globally available data. When a player joins a new group (such as, when a spaceship flies into a
new sector), some anomalous things can happen. If the player doesn't immediately get all the
information about all the data in that group (such as, the identities and locations of all the other
ships in that sector), then he's essentially flying blind for a period; our ship can be blasted by
another ship that we've never even seen! This is another case where it's important to join
groups well in advance of when you'll actually need their data. If, on the other hand, the game
implements a mechanism to bring a new group member up to speed quickly, then a player who
joins a busy group may immediately be flooded with information, resulting in packet loss or
poor application performance. Any protocol to bring group-joiners up to speed must carefully
avoid this problem. This is another situation where a low-fidelity data group (as described under
geographic sectorization) can be useful; if all players listen to the low-frequency broadcasted
updates on the low-fidelity group, then they'll have at least some idea of what to expect when
entering a new sector.
Managing data distribution across networked games is not an easy problem. Simple data
distribution schemes don't scale well, limiting the potential size of games. More complex
schemes, such as the various types of data grouping, require more thought on the part of the
programmer, but have been shown to greatly increase the potential scale of games. In the
military simulation world, the increased network and processor efficiencies resulting from
multicasting/grouping have yielded an order of magnitude increase in the number of entities
supported when compared to broadcast-based Distributed Interactive Simulation protocols.
Current game SDKs, such as Microsoft DirectPlay, offer group facilities but don't provide any
support for group definition or use. Any game developer looking to create scalable networked
games needs to consider efficient data distribution. The good news is that combining SDK
grouping functions with group definition concepts such as those described in this article can
yield games that are suitable for large-scale play by large numbers of distributed gamers.
The author would like to thank Dr. Edward Tiberius Powell and Larry Mellon, whose
pioneering designs for the JPSD and STOW projects form the basis for many of the
ideas presented in this article.
The design brief for the new game's AI has just been handed to you, and to call it optimistic
would be an understatement. You are charged with developing a real-time living, breathing city,
populated by thousands of pedestrians, hundreds of cars, and dozens of non-player characters.
The 'incidental' pedestrians and traffic need to react convincingly to each other and to your
actions, while the NPCs absolutely positively must act in a believable manner when you
encounter them. It's going to be computationally expensive, but you've only been given 20% of
the processor time each frame, and if you exceed that and the game frames out, you've failed.
Modern games will increasingly make such demands on hardware and programmers.
Fortunately, help is at hand with techniques to control and manage real-time AI execution,
techniques that open up the possibility of future hardware acceleration of AI.
Games should be fun. This requirement has many consequences. One important consequence is
that games that allow player input at any moment ("arcade-style" games) should run in real-
time, presenting events that occur sufficiently fast to challenge the player's reactions. Lower
frame-rates look bad, reduce the opportunity for interaction, increase player frustration, and do
not reflect the speed of events in the real world. With this firmly in mind, we set out to design a
framework for the general execution of AI code.
The latter stages of a game project involve optimising parts of game code for processing time
reductions. This includes AI code, which, depending on the type of game, can take up more or
less of the available CPU time. Given this, an important requirement for general AI execution is
that (a) it conforms to the timing constraint of the overall game frame rate. A consequence of
(a) is that the AI never exceeds a maximum per-frame processing time.
AI requires the execution of arbitrarily complex and heterogeneous pieces of code, often
grouped together as behavioural "rules" or "behavioursets" for various game objects or agents,
such as the AI code for a trapdoor, obstacle, spring, or the code for an adversary, racing vehicle
or character. Therefore, a further requirement for general AI execution is that (b) it makes no
assumptions about the exact nature of the AI code, including assumptions about how long the
code will take to execute.
Rendering code normally has to execute every frame in order to construct the visual scene. The
situation is different for AI code. Consider a soccer player, who may need to check for passing
and shooting opportunities every frame, but needs to check its position against the team's
formation only every other frame, or only in a dead-ball situation. AI code involves a wide range of
execution frequencies compared to non-AI game code. If all AI code is fully executed every
frame when this is not required then the resulting code is inefficient. Also, some games require
different execution frequencies for objects and agents, in addition to controlling the execution
frequencies of their internal processes. For example, a very slow moving tortoise need not be
processed every frame, whereas the hare may need to be. Hence, a further requirement for
general AI execution is (c) it allows different execution frequencies to be specified both for
agents and their constitutive internal processes.
Finally we realised that some AI processes can be extensively time-sliced across many frames,
particularly if the results of the process are not immediately required. For example, if a strategy
game agent needs to plan a route through a terrain, then the planning can potentially take
place over many frames before the agent actually begins to traverse the deduced route. Time
slicing allows computationally expensive processes to be 'smeared' across many frames thereby
reducing the per frame CPU hit. Therefore, a final requirement for general AI execution is (d) it
allows AI processes to be dynamically suspended and reactivated.
There are no general methods for supporting different execution frequencies of parts of AI code
and time-slicing non-urgent AI processes. If these techniques are employed they are employed
in a project-specific, ad-hoc manner. There is no 'AI operating system' that allows programmers
to control these factors. This represents an important missed opportunity for the development
of more complex AI in games. If all AI code were executed through a common AI operating
system or engine, with mechanisms for specifying execution frequencies, upper bounds on CPU
time, time-slicing, and suspension and reactivation of processes, then it would be possible to
get more AI for the same CPU power.
By now you may have realised that (a) asks the impossible: AI algorithms that take the same
amount of time even when asked to do more work. However, games must entertain the player,
not implement a perfect simulation. In the next section we'll look at how we can partially satisfy
requirement (a).
Believability Vs Accuracy
An arcade-style game is somewhat like the real world, consisting of both active and passive
agents and events that unfold over time. But the game need not process every object, agent
and event in the virtual world in order to present a believable, entertaining experience. For
example, if a truck is within the player's field of view when planting a mine then the game
necessarily needs to process the truck movement and the mine drop, and the rendering code
necessarily needs to draw this event to the screen. However, if the truck is 'off-screen' the
rendering code need not be run, and the AI code controlling the truck could simply assert the
existence of a mine on the road at a certain time, rather than processing the fine-grained
movement of the truck. Virtual game worlds need to present a believable world to the player,
and not necessarily present an accurate simulation of the real world. Events not 'interactively
close' to the human player need not be fully processed. Therefore, requirement (a) can be
satisfied if some AI processes need only be "believable" rather than "accurate". These kinds of
processes can be time-sliced over many frames, executed at a lower frequency, be replaced
with computationally less expensive "default" behaviours, or simply postponed. Furthermore,
what may need to be "accurate" at one time may need to be only "believable" at another,
depending on the current activities of the human player. We call the idea of prioritising the
update of parts of the game world currently most relevant to the player "egocentric
processing". Our Process Manager implements this idea.
The Process Manager (PM) meets the requirements (a) to (d) identified above. It is
implemented in C++ and runs on Sony Computer Entertainment's PlayStation2.
The PM is a simple operating system for game AI code. The PM allows users to specify an upper
time limit for AI processing. If the limit is exceeded the PM attempts to alter the timing of
processing to meet the upper limit. The PM provides a common execution framework for active
and passive objects in games. When there is sufficient time in the current frame to complete all
scheduled AI processing, the PM acts as a normal scheduler or "task system". But if there isn't
sufficient time the PM starts to behave very differently from a normal scheduler and begins to
employ the "egocentric" principle. Under high loads it maintains a constant AI time by
postponing or avoiding the less "important" computational work, where "importance" can be
defined by the user (e.g., more "important" if nearer the main camera).
The PM provides the following mechanisms:
1. Specification of a global per-frame time limit for AI processing, the "AI time". (Requirement a)
2. Allocation and control of available execution time between agents and AI behaviours
according to user specified priorities, reducing the need for manual optimisation of AI code.
(Requirement a)
3. "Dynamic activation delay" of newly created agents to evenly distribute AI processing over
frames. (Requirement a)
4. Simple method invocation of the list of AI processes as defined by user. (Requirement b)
5. Control of process execution frequency and frame-by-frame interleaving of agents and their
constituent behaviours. (Requirement c)
6. Control of execution order of agents and AI behaviours. (Requirement c)
7. Time-slicing of AI processes. (Requirement d)
Together these mechanisms provide users with a high level of control over the time profile of
their AI code, and also an ability to maintain a constant frame rate despite a variable AI load.
The PM assumes that each agent consists of the following, possibly empty, sets: perception
sets, behavioursets, and action sets. Each perception set contains methods to translate game
state into an internal agent representation, a behaviourset contains methods that create and
modify the agent's internal representations, and an action set translates internal
representations to changes to game state. The decomposition of an agent into sensing, acting
and thinking parts does not impose any restrictions on the type of AI code. For example, a road
cone agent in a racing car game may have an empty perception set, a single behaviourset that
performs physical calculations based on the cone's position and velocity, and a single action set
that updates the velocity and position based on those calculations. The racing car itself,
however, may be more complex. For example, it could have a perception set that updates an
internal representation of the car's position, and the position of competitors; a behaviourset
that contains methods to calculate steering, brake and acceleration inputs based on current
perceptions; and an action set that translates the inputs to update the car's velocity and
position. Behaviours within behavioursets, perceptions within perception sets and actions within
action sets are the indivisible code "atoms" that the PM executes.
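A bare-bones sketch of that decomposition (the class and member names are assumptions, not the PM's actual interface):

#include <vector>

struct Perception   { virtual void Sense() = 0; virtual ~Perception() {} };   // game state -> internal model
struct Behaviourset { virtual void Think() = 0; virtual ~Behaviourset() {} }; // updates the internal model
struct ActionSet    { virtual void Act()   = 0; virtual ~ActionSet() {} };    // internal model -> game state

class Agent
{
public:
    std::vector<Perception*>   perceptions;    // possibly empty (the road cone)
    std::vector<Behaviourset*> behavioursets;  // the "atoms" the PM schedules
    std::vector<ActionSet*>    actions;
};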
The Process Manager is called every game cycle with a single parameter that specifies the AI
time. The PM holds a user-defined AgentList, which is the list of currently active agents. When
invoked the PM executes all the currently active agents and attempts to ensure that this
processing does not exceed AI time.
The PM allocates AI time between active agents, according to the user-specified priority of each
agent. In turn, each agent allocates its share of AI time between its constituent behavioursets
that are active in the current frame, also according to user-specified priorities. The time
allocation process is shown in figure 1.
Figure 1. Allocation of available processor time
The PM's functionality is distributed between the PM class and the agent class; a common base
class provides the necessary functionality at each level.
In order for the PM to control AI execution, certain execution parameters are required. The
execution parameters required for agents and behavioursets are described in table 1.
Process Peaking
Process peaking occurs when AI processes are scheduled to execute on the same frame at
regular intervals. For example, consider that three behavioursets are initialised at the same
time, with execution periods of 4, 6 and 8. In consequence, 2 behavioursets coincide every 8
frames (periods 4 and 8) and every 12 frames (periods 4 and 6), while all 3 behavioursets
coincide every 24 frames (the lowest common multiple of the periods).
When the PM is working with large numbers of behavioursets, with a diverse range of execution
periods, process peaking will impact on the effort to maintain a constant frame rate.
There are two main methods for minimising the number of processes executing on individual
frames:
(a) Specifying execution periods as prime numbers can reduce process peaking. Compare the
previous example with the following: now the three behavioursets have execution periods of 3,
5 and 7 (all prime numbers). This time there are 3 cases where there are 2 behavioursets
requiring execution occurring at intervals of 15, 21 and 35 frames, while every 105 frames
there are 3 behavioursets requiring execution. Table 2 summarises the number of occurrences
of process peaking under the different execution periods during a 105 frame period.
Table 2 shows that when using prime execution periods process peaking is reduced, despite the
fact that each behaviourset now executes more frequently. The instances of process peaking
with 2 behavioursets reduce marginally, and the occurrences of 3 behaviourset process peaks
are reduced by a factor of 4. The current PM does not incorporate prime periods in its design;
instead, users are simply informed of this useful fact.
(b) Process peaking can be reduced by delayed activation. Agents may have the same
execution period without ever executing in the same frame. This is possible through the
introduction of a mechanism for delaying activation of agents. Take the example of two newly
created agents, each requiring execution every two frames. By delaying the activation of one of
the agents by one frame after agent creation it is possible to interleave the execution of the
agents so that only one executes each frame, thus minimising process peaking. The same
principle can be applied to behavioursets within agents. The PM user can specify activation
delays for behavioursets at compile-time, while the PM itself can handle activation delays for
agents at run-time.
Two possible cases exist for the interaction of execution period and activation delay:
- Some or all of the agents/behavioursets with different activation delays are put
permanently out of phase, and will never all execute simultaneously.
- The processes all execute simultaneously before they would if no activation delays were
specified, but subsequently execute simultaneously at the normal interval predicted by
the lowest common multiple of their execution periods.
The important point is that specifying activation delays cannot make process peaking any worse
than simultaneous activation would; it can only reduce it.
The PM reasons about the density of processes executing within the game, and decides on the
best way to fit new agents and their behavioursets into this environment. The PM uses the
following formula to calculate the number of processes executing in a future frame at time step
t.
P(t) = sum(i = 1..n) [ t >= ai and (t - ai) mod pi = 0 ]

where the bracketed condition counts 1 when true and 0 otherwise, and:

P = number of processes executing at time step t
n = number of agents
t = time step under examination
ai = activation time of agent i
pi = execution period of agent i
The formula can be modified to consider behavioursets within agents with different activation
delays and execution periods. This formula is used in the pseudo-code in figure 3 to determine
the optimal activation delay for a newly created agent. d is the maximum activation delay
possible for a newly created agent, while l is the lookahead distance over which the number of
process peaks is examined. A reasonable value for d could be 3, while l could be set to the
largest behaviourset execution period within the agent, plus its associated activation delay.
These settings allow newly created agents to have their activation delayed by up to 3 frames,
based on an assessment of future process peak levels including at least one execution of all the
agent's behavioursets. For a more comprehensive search of future process peaks, the
lookahead distance l can be specified as the lowest common multiple of all the agent's
behaviourset execution periods. The lowest common multiple is the maximum length of
repeated process peaking patterns; therefore, specifying this as the lookahead value ensures
that the algorithm considers all potential future process peaks. However, this approach is
computationally expensive, as the lookahead distance may be large, particularly if execution
periods are specified as primes.
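The figure 3 pseudo-code isn't reproduced here; the following sketch implements the same search under the formula above (the names and the tie-breaking policy are our assumptions):

#include <algorithm>
#include <climits>
#include <vector>

// Number of processes executing at time step t: agent i runs when
// t >= a[i] and (t - a[i]) mod p[i] == 0.
static int ProcessesAt(int t, const std::vector<int>& a, const std::vector<int>& p)
{
    int count = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (t >= a[i] && (t - a[i]) % p[i] == 0)
            ++count;
    return count;
}

// Choose an activation delay in [0, d] for a new agent with period
// newPeriod that minimises the worst process peak over the next l frames.
int BestActivationDelay(const std::vector<int>& a, const std::vector<int>& p,
                        int newPeriod, int d, int l)
{
    int bestDelay = 0, bestPeak = INT_MAX;
    for (int delay = 0; delay <= d; ++delay)
    {
        int worstPeak = 0;
        for (int t = delay; t < l; ++t)
            if ((t - delay) % newPeriod == 0)   // the new agent executes this frame
                worstPeak = std::max(worstPeak, 1 + ProcessesAt(t, a, p));
        if (worstPeak < bestPeak)
        {
            bestPeak  = worstPeak;
            bestDelay = delay;
        }
    }
    return bestDelay;
}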
The complexity of the algorithm is O(n * pn), where n is the number of agents under consideration
and pn is the maximum period among all the new agent's behavioursets plus that
behaviourset's associated activation delay. This is a pseudo-polynomial-time algorithm.
The parameters for the lookahead algorithm trade off minimising process peaking against the
expense of the search. Therefore, the algorithm can be used either with a small lookahead
at run-time for dynamic activation delay, or with full lookahead as an offline optimisation tool
for sets of agents and behavioursets. With dynamic activation delay the PM will avoid
unnecessary process peaking when new agents are added to the AgentList at run-time.
"Level of Detail" AI
Sometimes it is desirable to modify the priorities of agents while a game is running in order to
control the allocation of AI time to different parts of the game AI. For example, in a crowded
city environment, the priorities of pedestrians further away from the camera could be reduced.
Such functionality is necessary to implement the egocentric principle. Reducing the priority of
agents far away from the player is analogous to the graphical technique of using different 'level
of detail' models for rendering according to distance from the camera.
The PM allows run-time modification of agent priorities. It may also be useful to modify
behaviourset priorities at run-time. The PM allows this also. Similarly, it may be necessary to
modify execution periods of agents, and potentially behavioursets. Modifying execution periods
may unbalance the AI computational load resulting in process peaks. Nevertheless, for
flexibility, facilities are provided to modify execution periods for both agents and behavioursets.
AI Granularity
Implementation
Three versions of the testbed were implemented to demonstrate the benefits and costs of using
the PM to execute AI code. The different versions are listed in table 3.
Figure 4 presents performance results for the different testbed versions. Maximum performance
is the number of agents at which "performance degradation" sets in. In the case of non-scheduled
testbed versions, "performance degradation" means dropping from a frame rate of 60 frames
per second. In the case of scheduled testbed versions, "performance degradation" means failing
to completely execute all agents and their behaviours when scheduled to do so. "DAD" means
"dynamic activation delay". Two scheduled versions of the testbed were evaluated, one with no
dynamic activation delay, the other with a maximum dynamic activation delay of 2. The results
show that while the performance hit from using the (non-optimised) PM is around 20%, much
of this can be reclaimed through dynamic activation delay.
More importantly, the scheduled version can give the appearance of handling more than 67
agents by suspending the execution of some AI behaviours. The scheduled DAD version can
process 169 agents before frame drop. In this situation, some of the agent behaviours are
executing less frequently. By changing the priorities of agents at run-time and re-ordering the
AgentList, game programmers can control which agents will lose AI resources first. A fuzzy
upper limit and graceful degradation of AI performance replaces a sharp cut-off point for loss of
frame rate. This is very useful for games with a variable and unpredictable number of agents
that consume a variable and unpredictable amount of CPU time. This means that the need for
manual optimisation of AI code is reduced. If your game allows "believable" rather than
"accurate" AI processing then the limiting factor becomes drawing time rather than CPU time.
You just might be able to satisfy that design brief after all.
Careful readers will note that the use of the process manager will require programmers to
specify extra parameters for their code, such as whether code fragments may be interruptible,
how much share of CPU time should be allocated to code fragments, and so forth. This can
represent an increase in AI code development time. However, the important point is that time
spent specifying process manager parameters will save time in the area of AI optimisation: due
to the fuzzy upper time limit on AI processing per frame the point when hand optimisation of AI
code is required is postponed. We anticipate that use of the process manager will save time
overall, as hand optimisation of all types of code, not just AI, is a time consuming process.
AI Hardware Acceleration?
As stated, the process manager makes no assumptions about the precise nature of the AI code.
The process manager is identical to an operating system in the sense that it can run multiple
different threads performing different kinds of computational work, and imposes no restrictions
on the type of code that may be run. However, there is an issue of code granularity. Some AI
mechanisms are more amenable to time-slicing than others. For example, we are working on a
rule-based, declarative language to sit on top of the process manager. Our reasoning is that
rule-based languages are much easier to interrupt, and exhibit finer execution granularity than
their procedural counterparts. The higher the code granularity the better the process manager
can allocate time. This is an issue of performance rather than competence. The process
manager really can run any kind of code, whether AI or not, or whether the code content is
evolutionary computation, finite state machines, simple flocking algorithms or any other
mechanism for that matter. However, the granularity of that code can affect process manager
performance.
But this is just a start. Chips that truly accelerate AI processing will need to deal with two main
themes.
First, game AI code generally exhibits a coarse-grained parallelism at the agent level.
Characters in a game "sense" the current game state, perform some "internal" processing and
then "act" upon the game state to produce a new game state. The internal processing of each
agent could occur in parallel without requiring significant changes to AI logic. Future AI
accelerators could impart huge processing speed-ups with this kind of parallelisation. Note that
this isn't fine-grained parallelism at the instruction level, but a coarse-grained parallelism at the
agent level.
Second, game AI code is essentially rule-based. In the abstract, AI code maps conditions on
game state to changes to game state. In procedural languages, such as C and C++, the
mapping is implemented as "if then" rules acting on data structures. Despite claims to the
contrary, AI code is very ordinary in this sense: it is like other code, except it usually exhibits a
higher density of condition tests that tend to predicate on a highly dynamic game state. AI
hardware accelerators need to directly address the rule-based nature of AI code. Progress in
this area is more difficult, as it requires developing and using new real-time, rule-based AI
languages while simultaneously designing hardware that speeds the execution of such
languages.
Ian Wright began creating games in the early 80's at the age of 14, developing two
hit games for the ZX Spectrum home computer. He received a PhD in Artificial
Intelligence from the University of Birmingham in 1997. He now develops AI
technologies for PS2 at Sony Computer Entertainment Europe's "Team Soho"
development studio. Ian can be reached at [email protected].
James Marshall came from a background in computer science, via artificial life
research and part-time commercial game development on the Amiga, to his present
position at Sony Computer Entertainment Europe's "Team Soho" development studio.
He currently works on core AI technologies for PlayStation 2 games. James can be
reached at [email protected], unless he's climbing mountains.
// scheduling
determine active agents this frame
calculate time allocation for each active agent
// execution
for each active agent on list
    // scheduling
    determine active behavioursets this frame
    calculate time allocation for each active behaviourset
    // execution
    for each active behaviourset on list
        if behaviourset was terminated on last execution
            if behaviourset is suspendable
                restart behaviourset execution
            else if behaviourset is sliceable
                continue behaviourset execution
            end if
        else
            run behaviourset
        end if
        check time
        if no time remaining exit behaviourset execution
    end for
    check time
    if no time remaining exit PM
end for
Remember Quake? Back when it was first released, consumer-level 3D acceleration was nearly
unheard of, and id's software renderer scaled in speed with the clock speed of your Pentium
processor.
During the few years since then, though, our market has reached a point of stratification with
non-accelerated Pentium "Classic" machines on the low end and the latest and greatest pixel
crunchers on the high end. The range is enormous. As game developers, it's important to
support high-end consumers, and yet we'd prefer not to abandon the low-end players. From
this desire was a new industry trend born: scalable geometry.
Scalable Geometry
Scalable geometry is any kind of geometry that can be adapted to run either faster with
decreased visual quality or slower with increased visual quality. There are a number of ways of
doing this, so we'll briefly cover the more popular methods.
One of the earliest methods used in games to scale geometry involved hand-generated level-of-
detail models. You can see this principle at work in games like Battlezone and Grand Prix
Legends. In the case of a race car, artists create a very high-detail model of the car, then a
lower-detail model, and then continue down to a very low-detail model. Then, at run time,
factors like the speed of the machine and the distance of the car to the viewer determine which
model you use each frame. One of the benefits that hand-tuned LOD models have over other
approaches is that the models can have more actual polygonal detail at the higher levels, since
they're created by hand. There are many drawbacks, though. For instance, the switch from one
model to another can manifest itself as an abrupt visual "popping", and can therefore be
distracting to the viewer. A solution to this is to increase the number of LOD models, but this
exposes another drawback: it takes a lot of an artist's time to make several versions of every
object.
The final method we'll mention, then, is the topic of this article. Curved surfaces are one of the
most popular ways of implementing scalable geometry. There is a good reason for that, too; in
games we've seen them in, they look fantastic. Unreal's characters looked smooth whether they
were a hundred yards away, or coming down on top of you. Quake 3: Arena screenshots show
organic levels with stunning smooth, curved walls and tubes. There are a number of benefits to
using curved surfaces. Implementations can be very fast, and the space required to store the
curved surfaces is generally much smaller than the space required to store either a number of
LOD models or a very high-detail model.
The downside of curves and curved surfaces is that they are perhaps the most difficult of the
three methods to learn and understand. There's a lot of reference material out there, but a lot
of it is not easy reading, even if you know the material and are just using the books for
reference. Therefore, in this article, we'll look at the basics of curves and curved surfaces. We'll
cover the concept of the basic polynomial curve, and then onto two example curve
representations: Hermite curves and Bézier curves. From there, we'll move onto surfaces,
covering the Bézier patch. In this article, we'll take the most straightforward approach possible
to rendering the curves and patches. While this does mean that our implementations will be
very slow, they will hopefully be more legible for it. Next month, we'll continue our examination
of patches by delving into optimization techniques to make them truly useful.
Just to be absolutely sure we all start off on the same wavelength, we'll start by reviewing
some of the basic math principles that we need as a foundation for working with curves and
curved surfaces. Feel free to skip this section if this is remedial.
At its core, any of the curves we'll discuss can be represented as a parametric polynomial
function. Following convention, we'll use the parameter u. Our curves will look something like
this:

x(u) = c0 u^3 + c1 u^2 + c2 u + c3
y(u) = c4 u^3 + c5 u^2 + c6 u + c7
z(u) = c8 u^3 + c9 u^2 + c10 u + c11

Generally, we'll refer to f(u) = (x(u), y(u), z(u)), which is the 3D point on the curve at u. Now,
as long as at least one of c0, c4, and c8 is non-zero, the curve will be a cubic curve, and cubic curves are the
ones we're most interested in. After all, since we'd like to keep computation to a minimum,
we'd like to use the lowest-degree curve possible (since a higher degree requires more
multiplication every time it's evaluated). So, we might try using a zero-degree curve (which
would be fast to compute). But a zero-degree curve is simply a point, which doesn't do us too
much good.
Moving on, a first-degree curve is simply a line. It's pretty clear that lines are insufficient
for our purposes. So, we move on to quadratic curves. These are parabolas, which might seem
sufficient for representing curves and curved surfaces. Unfortunately, second-degree curves will
always lie in a plane, and we're working in three dimensions, so it would be better to have a
space curve, a curve that isn't confined to two dimensions or less. Therefore, our cubic curve is
the curve of choice.
Representations
So all a programmer needs to do is code up a quick tool for the artists consisting of a view
window and four text entry boxes for them to type in the coefficients of curves, right? Of course
not - artists demand flexible, intuitive tools, and it's clear that creating curves by typing in
coefficients lacks that certain ease-of-use factor for most of us. Therefore, we need another
representation for the curve that makes creation and manipulation more intuitive. We'll touch
on two such representations, the Hermite curve and Bézier curve.
We cover the Hermite curve both because it's fairly common and because it doesn't
require any specialized formulae to understand. Then, as Bézier curves are somewhat more
versatile, we'll move on to them. While we won't discuss it here, converting from Bézier curves
to Hermite curves and vice versa is very straightforward and is explained in the references at
the end of the article.
There are plenty of other curve representations that we aren't going to touch upon. Notably, we
are not going to cover B-Splines or that family (including the pervasive NURBS), of which Bézier
curves are simply a special case. I chose the Hermite and Bézier curve models as a good
starting point, because they can be represented and understood with a fair degree of ease.
Once you have a firm grasp on Bézier curves, picking up one of the references at the end of this
article and learning more about other curve models is much easier.
Hermite Curves
A Hermite curve is defined by two endpoints, p0 and p1, and by the tangent vectors t0 and t1
at those endpoints. The question, then, is how we get the cubic equation from the points and
vectors. Hermite curves are nice this way, as the derivation of the cubic is possible with just a
little calculus. Let's say our cubic equation is

f(u) = au^3 + bu^2 + cu + d

Figure 1. A Hermite curve. Tangent vectors are magenta, endpoints are red, and the curve
itself is blue.

Then, we can express the endpoints as f(0) and f(1), and the tangents as f'(0) and f'(1):

f(0) = d = p0
f(1) = a + b + c + d = p1
f'(0) = c = t0
f'(1) = 3a + 2b + c = t1

Eq. 1
Then we'll solve for what are called the "basis functions." The basis functions are simply
functions of u that determine the contribution of the endpoints and tangents along the curve.
So, for instance, the basis function that corresponds to p0 determines how much p0 contributes
to points along the curve. Just by rearranging terms once again, we have the basis functions.
Then, we can express the curve as the sum of the basis functions times the components:

f(u) = (2u^3 - 3u^2 + 1)p0 + (-2u^3 + 3u^2)p1 + (u^3 - 2u^2 + u)t0 + (u^3 - u^2)t1
This provides us a handy way of expressing the curve. Furthermore, basis functions become far
more important when we discuss Bézier curves, and so the Hermite curve provides a good
introduction to the idea of a basis function.
So, as we see here, a basis function is nothing more than a function associated with a
component of the curve that determines the contribution of that component to points along the
curve.
As handy as the basis functions are for expressing the curve, it's easier for our naïve
implementation just to calculate the cubic equation of the curve by finding the coefficients using
Equation 1. The code that does this is shown in Listing 1.
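Listing 1 isn't reproduced here; a sketch of the same derivation, assuming a Vector3 type with the usual +, - and scalar * operators, might read:

// Solve Equation 1 for f(u) = a*u^3 + b*u^2 + c*u + d, given endpoints
// p0, p1 and tangents t0, t1.
void HermiteCoefficients(const Vector3& p0, const Vector3& p1,
                         const Vector3& t0, const Vector3& t1,
                         Vector3& a, Vector3& b, Vector3& c, Vector3& d)
{
    a = p0 *  2.0f - p1 * 2.0f + t0 + t1;          // from f(1) and f'(1)
    b = p0 * -3.0f + p1 * 3.0f - t0 * 2.0f - t1;
    c = t0;                                        // from f'(0)
    d = p0;                                        // from f(0)
}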
Then, we just run along the curve by starting u at 0 and incrementing it by some fixed amount
until we reach 1. We evaluate the curve at each value of u, save each result as a point on the
curve, and then render the curve as a line strip. The code to evaluate the curve at a given value
of u is quite simple and is shown in Listing 2.
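Listing 2 isn't shown here either; evaluated in Horner form, it amounts to something like:

// Evaluate the cubic at u: ((a*u + b)*u + c)*u + d.
Vector3 EvaluateCubic(const Vector3& a, const Vector3& b,
                      const Vector3& c, const Vector3& d, float u)
{
    return ((a * u + b) * u + c) * u + d;
}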
It's worth noting that even though the curve is recalculated fairly slowly every frame, the frame
rate is still in the high hundreds (well, on my Voodoo2 graphics card, at least). Since we're
doing nothing but calculating a hundred or so points along a curve every frame, the speed hit
as a result of this inefficiency is not yet apparent.
Bézier Curves
A cubic Bézier curve is defined by four control points, p0 through p3. Our problem now is that
it's not immediately clear how we define the curve based on these four points. With Hermite
curves, we could use some basic calculus to get a cubic parametric equation. But even if we say
that p0 and p3 are the endpoints, the points p1 and p2 seem to have little bearing, analytically,
on the curve. It's easy enough to say that the curve should "bend towards" the points, but what
does that give us in terms of our cubic equation? Here's where the importance of our basis
functions comes in. We need to find a set of functions that blend the control points together in
ways that give us the curve that we want.

Figure 2. A cubic Bézier curve. Its control points are red, and the curve is blue.
To do that, of course, we need to define the properties we'd like the curve to have. We can
summarize these with three qualities:
1. We'd like the curve to interpolate the endpoints. That is, we'd like the curve to start at p0
and end at p3. That makes curve creation more intuitive.
2. We'd like the control points to have local control. That is, we'd like the curve near a
control point to move when we move that control point, but have the rest of the curve not
move as much. Again, this gives us better intuitive control when crafting a curve.
3. We'd like the curve to stay within the convex hull of the control points so we can cull
against it quickly if we're doing visibility culling or hit testing.
Luckily for us, there exists just such a set of functions. These functions are called the Bernstein
basis functions, and are defined as follows:

Bi,n(u) = C(n, i) u^i (1 - u)^(n - i)

The C(n, i) term is the mathematical phrasing of the binomial coefficient normally phrased
"n choose i" or "n nCr i". The formula for n choose i is:

C(n, i) = n! / (i! (n - i)!)
If we were considering general Bézier curves, we'd have to calculate that. Since we're only
considering cubic curves, though, n = 3, and i is in the range [0,3]. Then, we further note that
n choose i is the ith element of the nth row of Pascal's triangle, and so we have our values,
{1,3,3,1}. So we can just hard-code that, no computation necessary.
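As a sketch (our own helper, not the article's code), the cubic Bernstein basis with those coefficients hard-coded looks like this:

#include <math.h>

float Bernstein(int i, float u)   // B(i,3)(u) for i in [0,3]
{
    static const float choose[4] = { 1.0f, 3.0f, 3.0f, 1.0f };
    return choose[i] * powf(u, (float)i) * powf(1.0f - u, (float)(3 - i));
}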
As for the local control, we can convince ourselves that this holds by staring at the basis
functions for long enough. It's obvious that p0 and p3 have local control, because as we move
them, the curve moves, and they have very little influence over the rest of the curve. We can
also see, then, that p1 and p2 have local control, since they have the most influence over the
curve 1/3 of the way and 2/3 of the way along the curve, respectively. That means that if we
moved p1, it would pull the section 1/3 of the way along the curve with it, and affect the rest of
the curve much less.
Then, we have our final condition: the curve must remain within the convex hull of the control
points. With the Bernstein basis functions, this is true. The proof, however, is fairly complicated,
and ends up dragging a bevy of new concepts into the fray. For the interested, Farin does a
reasonable job of explaining this. It has to do with the fact that the Bernstein basis functions
are nonnegative for u in the range [0,1], and also that if you sum up the values of all the basis
functions for any value of u, the result is always 1.
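In fact, the sum-to-1 half of that claim takes one line for the cubic case: (1-u)^3 + 3u(1-u)^2 + 3u^2(1-u) + u^3 = ((1-u) + u)^3 = 1, by the binomial theorem, and each term is plainly nonnegative for u in the range [0,1].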
Then, the formula for calculating a point on our Bézier curve is:
p(u) = Σ(i=0..3) Bi(u) pi (Eq. 2)
Our approach to rendering a Bézier curve is similar to that for rendering Hermite curves. We
find a series of points along the curve, and render that series as a line strip. We'll do it, once
again, by evaluating the curve at even intervals of u. Listing 3 shows this clearly:
UniformCurveTessellator::tessellate takes a vector of four control points and a vector of four
associated basis functions, and renders the curve in 100 steps.
To generate each point, it calculates Equation 2 for the input - it adds up the sum of each point
times that point's basis function. For our cubic curve, this is certainly not the most optimized
way to calculate the curve. However, because it's only 100 points, it's not noticeable and the
demo still runs quite fast.
It might seem more consistent to cover not only Bézier patches but Hermite patches as well.
The reason we're skipping straight to Bézier patches is that we're trying to cover the
curves and curved surfaces in the most intuitive order possible. Whereas it makes sense to
cover Hermite curves and then Bézier curves, Hermite patches are somewhat more difficult to
learn than Bézier patches.
A bicubic Bézier patch is defined by a 4x4 grid of sixteen control points pij, and the point on
the surface at parameter values (u,v) is the sum of every control point weighted by a product
of basis functions:

f(u,v) = Σ(i=0..3) Σ(j=0..3) Bi(u) Bj(v) pij (Eq. 3)

We can see by inspection that our properties from the Bézier curve extend to the patches: the
products Bi(u)Bj(v) are still nonnegative and still sum to 1 for any (u,v), so the convex hull
property holds; the corner control points are still interpolated (at (0,0), for example, every
product vanishes except B0(u)B0(v), which is 1); and each control point still exerts its strongest
pull over one region of the surface, preserving local control.
Rendering a Bézier patch is more complicated than rendering a Bézier curve, even when doing
it in the simplest possible way. With a Bézier curve, we could just evaluate a number of points
and render a line strip. With a patch, we need to evaluate strips of points and render triangle
strips. Also, with a patch we have to worry about lighting. After all, an unlit patch will just look
like an oddly-shaped splotch of red on the screen. To see the contours, we need lighting. For
our naïve implementation, that means we'll need to light each vertex. To light a vertex, we
need its normal. So, for every (u,v) pair, we need to solve for the point on the surface, and
then solve for its normal.
Equation 3 tells us how to find the point on the surface, but how do we find the normal? Well,
we know we can take the derivative of the surface with respect to either u or v, which would
yield the tangent vectors to the surface in the direction of either u or v, respectively. If we find
both of those tangents, we know that they both lie in the plane tangent to the surface. Then,
taking their cross product will yield a mutually perpendicular vector, the surface normal. Finally,
we'll have to normalize it since it most likely won't be unit length.
So, how do we find df(u,v)/du and df(u,v)/dv? As it turns out, we can just take the derivatives
of the basis functions. That is,
df(u,v)/du = Σ(i=0..3) Σ(j=0..3) Bi'(u) Bj(v) pij (Eq. 4)
The same holds for the derivative with respect to v. Therefore, before rendering, we calculate
the derivatives of the basis functions and store them. We use Equation 4 and its v analogue to
find the tangents, and then proceed to find the surface normal. The code for the loop is shown
in Listing 4.
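Differentiating the four cubic Bernstein polynomials by hand gives the quadratics mentioned above; a sketch, mirroring the basis functions shown earlier:

// Derivatives of the cubic Bernstein basis functions with respect to u.
float dBernstein0( float u ) { return -3 * (1-u) * (1-u); }
float dBernstein1( float u ) { return 3 * (1-u) * (1-u) - 6 * u * (1-u); }
float dBernstein2( float u ) { return 6 * u * (1-u) - 3 * u * u; }
float dBernstein3( float u ) { return 3 * u * u; }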
Now, while the curves didn't slow down from our naive implementations, this patch demo shows
quite painfully why optimization is very necessary. It runs at a steady 30 or so frames per
second (again, on my Voodoo2), but that's just one patch. If you tried to base a terrain system
on this implementation, it would be painfully slow. After all, consider the work we're doing. By
default, the tessellator breaks the surface into 100 points. At each point, we're evaluating 32
cubic functions and 32 quadratic functions, then doing a vector cross-product and a vector
normalization (ouch!). Then, for each point, we're asking OpenGL to light it, which is not cheap
either. Plus, we're not caching any of this between frames, and we're actually allocating and
then deallocating the space every frame. So we're doing a lot of work, much of it entirely
unnecessary.
Nonetheless, it works. We're rendering a lit Bézier patch, and even if it is a bit sluggish, it looks
pretty good. Now, if only we could do something with it...
There are certainly a number of loose ends. We've covered Bézier and Hermite curves and
Bézier patches, but the implementations so far are entirely unoptimized and the patch demo is
rather sluggish even for what little it is supposed to do.
Furthermore, we haven't seen an example of using these things in a real application. The demo
code is just that - a demo of a curve or surface floating in black space. There is still a fair
amount of material to cover before we can turn these into something real.
Next month, I'll cover some optimization techniques for Bézier curves and surfaces. We'll also
see how to form other surfaces and objects by joining Bézier patches together, and look at
some of the properties of such objects, as well as some of the problems that can arise from the
new techniques. Finally, having covered all of this, I'll finish off the article with a far more
interesting demo.
When he's not sitting around radiating potential, Brian's probably busy furthering the
secret OpenGL agenda. Either that, or he's likely doing the same thing he does every
night, Pinky - trying to take over the world. Send preemptive bribes and/or tribute to
[email protected].
References
· Farin, Gerald. Curves and Surfaces for CAGD: A Practical Guide. New York: Academic
Press, 1997.
· Garland, Michael, and Paul Heckbert. "Surface Simplification Using Quadric Error Metrics."
Proceedings of SIGGRAPH (1997): pp. 209-216.
· Mortenson, Michael E. Geometric Modeling. New York: Wiley Computer Publishing, 1997.
· Watt, Alan, and Mark Watt. Advanced Animation and Rendering Techniques: Theory and
Practice. New York: ACM Press, 1992.
· The full source to the Hermite curve demo, Bézier curve demo, and Bézier patch demo
is available from my web site at http://www.maniacal.org/gdc.html
Listing 1. Code that calculates the coefficients of the curve's cubic equation from the
endpoints and tangents.

void genCubicFunction()
{
    // Do this so we can treat each endpoint and tangent vector as a
    // separate array.
    float* p0 = points;
    float* p1 = points + 3;
    float* v0 = tangents;
    float* v1 = tangents + 3;
    // Do this so we can treat each vector coefficient of the function as a
    // separate array.
    float* a = functionCoeffs;
    float* b = functionCoeffs + 3;
    float* c = functionCoeffs + 6;
    float* d = functionCoeffs + 9;
    // Now, generate each coefficient from the endpoints, tangents, and the
    // predefined basis functions. Note that we loop once each for the x, y,
    // and z components of the vector function.
    for (int lcv = 0; lcv < 3; lcv++)
    {
        // a = 2p0 - 2p1 + v0 + v1
        a[ lcv ] = (p0[ lcv ] + p0[ lcv ]) - (p1[ lcv ] + p1[ lcv ]) + v0[ lcv ] + v1[ lcv ];
        // b = -3p0 + 3p1 - 2v0 - v1
        b[ lcv ] = - (p0[ lcv ] + p0[ lcv ] + p0[ lcv ]) + (p1[ lcv ] + p1[ lcv ] + p1[ lcv ])
                   - (v0[ lcv ] + v0[ lcv ]) - v1[ lcv ];
        // c = v0
        c[ lcv ] = v0[ lcv ];
        // d = p0
        d[ lcv ] = p0[ lcv ];
    }
}
Listing 2. Code that evaluates the curve at a given value of u.

// This function simply computes au^3 + bu^2 + cu + d for a specific u and
// stores the vector result in out.
void evaluateAt(float u, float* out)
{
    // Do this so we can treat each vector coefficient of the function as a
    // separate array.
    float* a = functionCoeffs;
    float* b = functionCoeffs + 3;
    float* c = functionCoeffs + 6;
    float* d = functionCoeffs + 9;
    // Note that we use Horner's rule for computing polynomials (which is
    // the way we nest the multiplies and adds to minimize the computation
    // we need).
    out[ 0 ] = ( ( a[ 0 ] * u + b[ 0 ] ) * u + c[ 0 ] ) * u + d[ 0 ];
    out[ 1 ] = ( ( a[ 1 ] * u + b[ 1 ] ) * u + c[ 1 ] ) * u + d[ 1 ];
    out[ 2 ] = ( ( a[ 2 ] * u + b[ 2 ] ) * u + c[ 2 ] ) * u + d[ 2 ];
}
Listing 3. Code that tessellates and renders a Bézier curve as an evenly-spaced series
of line segments.
// We can multiply by this in our loop instead of dividing by (numSteps-1)
// every time.
double invTotalSteps = 1.0 / (numSteps - 1);
::glBegin( GL_LINE_STRIP );
for ( int step = 0; step < numSteps; step++ )
{
    float u = (float)( step * invTotalSteps );
    // This holds the point we're working on as we add control points'
    // contributions to it.
    float curPt[ 3 ] = { 0, 0, 0 };
    // Generate a point on the curve for this step.
    for ( int pt = 0; pt <= 3; pt++ )
    {
        // Get the value of this basis function at the current parameter value.
        float basisVal = bases[ pt ]( u );
        // Weight this control point's contribution by its basis function.
        for ( int axis = 0; axis < 3; axis++ )
            curPt[ axis ] += ctrlPts[ pt ][ axis ] * basisVal;
    }
    ::glVertex3fv( curPt );
}
::glEnd();
Listing 4. Code that tessellates and renders a Bézier patch as an evenly-spaced grid of
triangles.
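The body of that loop follows directly from Equations 3 and 4. As a rough sketch (assuming a 4x4 control net ctrlPts, derivative basis functions derivBases like those above, the bases array and invTotalSteps from Listing 3, and <math.h> for sqrtf), the per-vertex work might look like this:

// Small helpers for the normal computation.
void cross( const float* a, const float* b, float* out )
{
    out[0] = a[1]*b[2] - a[2]*b[1];
    out[1] = a[2]*b[0] - a[0]*b[2];
    out[2] = a[0]*b[1] - a[1]*b[0];
}
void normalize( float* v )
{
    float len = sqrtf( v[0]*v[0] + v[1]*v[1] + v[2]*v[2] );
    for ( int i = 0; i < 3; i++ )
        v[i] /= len;
}

// For each (u,v) sample: evaluate the surface point (Equation 3) and the
// two tangents (Equation 4 and its v analogue), then cross and normalize.
for ( int i = 0; i < numSteps - 1; i++ )
{
    ::glBegin( GL_TRIANGLE_STRIP );
    for ( int j = 0; j < numSteps; j++ )
    {
        for ( int row = 0; row < 2; row++ )
        {
            float u = (float)( ( i + row ) * invTotalSteps );
            float v = (float)( j * invTotalSteps );
            float pt[3] = {0,0,0}, du[3] = {0,0,0}, dv[3] = {0,0,0};
            for ( int a = 0; a < 4; a++ )
                for ( int b = 0; b < 4; b++ )
                    for ( int c = 0; c < 3; c++ )
                    {
                        float cp = ctrlPts[a][b][c];
                        pt[c] += bases[a](u) * bases[b](v) * cp;
                        du[c] += derivBases[a](u) * bases[b](v) * cp;
                        dv[c] += bases[a](u) * derivBases[b](v) * cp;
                    }
            float n[3];
            cross( du, dv, n );   // mutually perpendicular vector
            normalize( n );       // it most likely wasn't unit length
            ::glNormal3fv( n );
            ::glVertex3fv( pt );
        }
    }
    ::glEnd();
}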
This may come as a shock to some, but the world is not made up of corridors composed of
completely planar surfaces. We live in a wildly organic place. Hills roll, muscles bulge and
fountains splash. The world around you is filled with organic shapes which cannot easily be
created out of triangles.
In fact, many of these objects are not even just lying around looking all organic. They slop,
splash, waddle, and plop about you all the time. These objects change shape effortlessly while
game artists crumple under the pressure of having to model such phenomena. When was the
last time you saw a nice splashing fountain in a game, anyway?
Animators have faced the challenge of visually creating the organic world we live in for some
time now. To help them out, commercial modeling packages have provided the artist with tools
for creating organic shapes. One of the methods for creating organic objects is through the use
of blobby balls that can be combined together to form a clay-like sculpture. The commercial
animation package developers have realized the usefulness of this technique and coined all
sorts of proprietary terms for their version. You may have seen ads for meta-balls, meta-clay,
blob-modeling, and various other ways of combining the term "meta" with some form of goop.
To create an object from this meta-goop, an artist drags around primitive elements, usually
spheres, which represent the rough shape of the object. Each of these elements has a center
position and several parameters associated with it. These parameters define how the element
will interact with the particles and world surrounding it. You can see an example structure for a
meta-goop particle in Listing 1.
The position describes the center of the element. I also need to keep track of the radius of
influence of the element (actually squared so I save some math later) and the strength of the
element. This strength parameter defines how the element will affect the space surrounding it:
each element contributes to an energy field, with a contribution that falls off with distance from
the element's center and scales with its strength. By summing this contribution over all the
elements in your system, you get the exact field strength for that position. The energy field
creates some interesting data but is not much of an object.
What I want to create are particles that will visually grow together as they get closer. You can
see an example in Figure 1. In order to create an object that will show this visual aspect of the
energy field, it is necessary to define a value that will represent the outer shell of the object -
the threshold.
The energy field varies in strength from zero on up at any position you may evaluate. In fact,
there is nothing to keep you from defining negative strength for an element, creating negative
regions, or holes, in the energy field. This is useful for effects such as denting and the like. To
define the surface of the object in the field, I can set an arbitrary threshold giving the object its
final shape.
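To make this concrete, here is a small sketch of evaluating the field at a point, using the tMetaGoop structure from Listing 1. The exact falloff formula is a design choice; this sketch assumes a simple quadratic falloff and a Vector type with x, y, and z members:

// Sum every element's contribution to the energy field at point p.
float evaluateField( const tMetaGoop* elems, int count, const Vector& p )
{
    float field = 0.0f;
    for ( int i = 0; i < count; i++ )
    {
        float dx = p.x - elems[i].position.x;
        float dy = p.y - elems[i].position.y;
        float dz = p.z - elems[i].position.z;
        float r2 = dx*dx + dy*dy + dz*dz;      // squared distance saves a sqrt
        if ( r2 >= elems[i].radiusSquared )
            continue;                          // outside the radius of influence
        float t = r2 / elems[i].radiusSquared; // 0 at the center, 1 at the edge
        // Contribution falls smoothly from full strength to zero at the edge.
        field += elems[i].strength * (1.0f - t) * (1.0f - t);
    }
    return field;
}
// A point is on the object's shell where evaluateField(...) equals the
// threshold, and inside the object where it exceeds the threshold.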
Walking on Eggshells
Figure 2. Creating a boundary threshold in a 2D energy field.
By creating a few meta-goop particles and setting some values for them, I have created my
meta-goop system. Run that goop through a function that evaluates the energy field, apply a
surface threshold, and I have the surface shell for the meta-goop object defined. But the
problem remains, how do I draw it?
I could step across the entire 3D space defined by entity radii and evaluate the field. Anywhere
the returned value is equal to the threshold, I could draw a solid cube the size of the steps
taken. This sounds pretty good. Sounds like it would work. Actually, it sounds kind of familiar.
It sounds kind of like volume rendering of voxels for applications such as viewing CAT scan
data. In fact, that is exactly what I would be doing if I took this approach.
However, rendering the energy field this way can lead to pretty chunky looking images unless
the step size is fairly small. The energy field is continuous over the entire range of the model
world, but I am sampling it at discrete steps. If the steps are too big, the image can look
chunky. This is analogous to drawing a line on a computer graphics screen. If the resolution of
the screen is too low, the line can look very jagged. This unfortunate condition is known as
"the jaggies" and requires some form of smoothing or antialiasing to make the lines look better.
Unfortunately, decreasing the step size in my energy field will greatly increase the amount of
calculations that must be made. Therefore, it is necessary to find a way to smooth out the voxel
image - sort of antialias the meta-surface.
Fortunately for me, the graphic visualization and medical imaging industries have been dealing
with this issue for quite some time. Wyvill and McPheeters in 1986 and Lorensen and Cline in
1987 independently developed a system called "marching cubes" which enables you to render a
polygonal approximation of a voxel field. One possible unfortunate circumstance is that this
algorithm may be tainted by a software patent, and I am investigating how this will affect the
issue (see the section titled The Marching Cubes Patent Question at the end of this article).
Goopy Games
I hope it is now clear that these meta-goop techniques can be used to create interesting organic
objects suitable for real-time display. However, there are several aspects that actually make
them ideal for use in games. For one, they are procedurally created. Complex structures can be
generated from simple data structures consisting of the location and attributes of each particle
in the system. There is no need to store a complete mesh.
In addition, the meta-object can be tessellated to different levels depending on the initial grid
size of the voxel space. This gives the game a dynamic level-of-detail component that is needed
in these days of varying hardware performance.
You can attempt generation of the objects in real time through efficient optimization of the
surface approximation routine. You could also simply decide to create the objects at load time
and display them as traditional polygonal objects during the actual game, or evaluate the mesh
only when the state of the goop elements changes. This kind of flexibility makes for easy
integration into a variety of applications.
I didn't even discuss how the surfaces could be rendered. One obvious choice would be to apply
environment-mapping techniques to create the chrome creature from Terminator 2. Likewise,
you could apply bump-mapping techniques to bring a water creature to life. I think an
interesting application would be to combine meta-surface techniques with a particle system like
the one I described last summer ("Spray in Your Face," Game Developer, July 1998).
For more fun, get my demo application off the Game Developer web site
(http://www.gdmag.com/). This will allow you to play with the creation of meta-goop and start
spreading some slop around your games.
As many of you who have met me and heard me rant on the topic know, I believe algorithmic
software patents are totally wrong. I feel they completely halt continued development down
interesting research pathways by shrouding a topic with legal pitfalls. Graphics researchers
create progress by building on the work done by others before them. I like to imagine the state
of the industry if Bresenham had patented his method for drawing a line on a graphic display
and then charged a licensing fee for every line drawn.
The topic of volume rendering is an interesting case in point. As an obvious next step in the
visualization of volume data, it was reported by researchers in several publications. However,
General Electric apparently owns a patent on the technique via the Lorensen and Cline
implementation (U.S. patent #4,710,876). As an actual apparatus to display medical imaging
data, I can understand it. However, the patenting of a "method for displaying three-dimensional
surface images" seems pretty broad to me.
I have been told by someone via e-mail that GE aggressively enforces this patent. However, it
is not clear to me how this would apply to the rendering of an isosurface in a game. Does this
mean that any modeling program using these techniques must pay a license to GE? If I create a
game using a derivative of marching cubes and it is a big hit, am I going to receive a stealth
patent letter in the mail demanding a percentage? How derivative does it need to be? The prior
art on this topic seems limitless, but what can I use as a reference and still be safe?
With the record number of software patents being filed, this is going to become an increasingly
difficult issue for game developers in the future. I am actively researching the issue and hope to
report on the results in a later column. Anyone with information on the topic, please let me
know. In the meantime, always document your research from public journals as best you can.
Ignorance is not bliss in this situation.
Greene, Ned. "Voxel Space Automata: Modeling with Stochastic Growth Processes in Voxel
Space (Proceedings of Siggraph 89)." Computer Graphics, Vol. 23, No. 4 (Aug. 1989): pp. 175-
184.
Lorensen, William, and Harvey Cline. "Marching Cubes: A High Resolution 3D Surface
Construction Algorithm (Proceedings of Siggraph 87)." Computer Graphics Vol. 21, No. 4 (Aug.
1987): pp. 163-169.
Watt, Alan, and Mark Watt. Advanced Animation and Rendering Techniques. Reading, Mass:
Addison-Wesley, 1993.
Wyvill, Geoff, Craig McPheeters, and Brian Wyvill. "Data Structure for Soft Objects." The Visual
Computer Vol. 2, No. 4 (Aug. 1986): pp. 227-234.
Web Resources
http://www.students.cs.ruu.nl/people/jedik/Methods/Surface_fitting/Marching_cubes.htm
http://www.swin.edu.au/astronomy/pbourke/modelling
When not splashing gloop around his kitchen floor, Jeff can be found creating real-
time graphics applications at Darwin 3D. Fling some bits of your own his way at
[email protected].
Listing 1. An example structure for a meta-goop particle.

typedef struct
{
    Vector position;       // center of the element
    float radiusSquared;   // radius of influence, squared to save some math
    float strength;        // how the element affects the space around it
} tMetaGoop;
I am generally pretty disciplined about working normal hours during the week. However, on
Fridays I like to shoot pool and eat pizza at the local sports bar. Since happy hour starts at
three, I sometimes need to move work down the street. I was working out how to win a serious
game of nine ball when one of those geeky discussions broke out about pool table physics. Pool,
like many sports, is dominated by the laws of physics. Good players have an excellent sense of
the application of force, the physics of collisions, and the influence of friction on objects in
motion.
Last month I described how friction could be used to increase the realism of the physics model
in real-time games. The demo program made it possible to see how various coefficients affected
a mass-and-spring model. However, it wasn't very much fun. In order to demonstrate how a
solid physical foundation can actually create interesting game play, I need to pull some of these
concepts together into a real application. A pool table simulation is a natural choice. It will allow
me to apply many of the techniques I have covered as well as provide some ideas that can be
converted easily to other sports such as golf or tennis.
The first step is to break the initial velocity of ball A into its components along the line of
collision, n, and the tangent to the collision, t.
The impulsive force acting during the collision is directed along the line of collision. Therefore,
the t component of the velocity of each ball is not changed.
In order to determine the new velocity along the line of collision, I need to look at the impulsive
force between the bodies. The impulse acts on both bodies at the same time. You may
remember Newton's third law of motion: the forces exerted by two particles on each other are
equal in magnitude and opposite in direction.
Since the impulse forces are equal and opposite, momentum is conserved across the collision.
Remember that the momentum of a rigid body is mass times velocity (mv).

mA vA + mB vB = mA vA' + mB vB' (Eq. 1)
This equation can't be solved without some more information. In my previous article "Collision
Response: Bouncy, Trouncy, Fun", I discussed the coefficient of restitution. This is the scalar
value between 0 and 1 relating the velocities of bodies before and after a collision via the
formula:
e = (vB' - vA') / (vA - vB) (Eq. 2)
For this example, I'm using a coefficient of restitution e of 0.8. I can use this formula to create
a second equation.
vB' - vA' = e (vA - vB) (Eq. 3)
Solving Equations 1 and 3, I get the velocities of the two billiard balls after the collision.
In order to solve this problem in the simulation, I need to derive the impulse force directly. The
impulse force creates a change in momentum of the two bodies with the following relationship.
mA vA' = mA vA + j
mB vB' = mB vB - j (Eq. 4)
These formulas can be combined with Equation 2 to determine the impulse force given the
relative velocity and the coefficient of restitution.
j = -(1 + e)(vA - vB) / (1/mA + 1/mB) (Eq. 5)
You can plug Equation 5 back into the example problem and make sure it works. Remember
that because of Newton's third law, the impulse is equal and opposite for the two colliding
bodies. When you apply Equation 5 to the B ball, remember to negate it.
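To see how Equations 4 and 5 fit into the simulation, here is a small sketch of the response step for two balls confined to the plane of the table. The Ball structure and all names are illustrative, not the demo's actual code (assumes <math.h> for sqrtf):

struct Ball { float px, py; float vx, vy; float mass; };

// Resolve a collision between two balls using the impulse of Equation 5.
void resolveCollision( Ball& a, Ball& b, float e /* restitution, e.g. 0.8 */ )
{
    // Line of collision, n: unit vector from A's center toward B's.
    float nx = b.px - a.px, ny = b.py - a.py;
    float len = sqrtf( nx * nx + ny * ny );
    nx /= len; ny /= len;
    // Relative velocity along n; the t component is left untouched.
    float relVel = (a.vx - b.vx) * nx + (a.vy - b.vy) * ny;
    if ( relVel <= 0.0f )
        return;                     // balls are separating; no impulse
    // Equation 5: impulse for bodies that do not rotate.
    float j = -(1.0f + e) * relVel / ( 1.0f / a.mass + 1.0f / b.mass );
    // Equation 4: A receives impulse j along n; B receives the negated impulse.
    a.vx += j * nx / a.mass;  a.vy += j * ny / a.mass;
    b.vx -= j * nx / b.mass;  b.vy -= j * ny / b.mass;
}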
Those of you who read Chris Hecker's column on collision response ("Physics, Part 3: Collision
Response," Behind the Screen, Game Developer, February/March 1997) will recognize Equation
4 as the impulse equation for a general body that does not rotate. When we do not consider the
rotation of the billiard balls, they behave exactly like the particles used in my March 1999 mass-
and-spring demo. My suspicion was correct, and I can use the particle dynamics system as a
base for the demo.
For many applications, this would probably be more than enough to get a decent physical
simulation. In fact, I imagine many pool simulations end right there. This level of simulation is
probably sufficient for other games, such as pinball. However, anyone who has played much
pool knows that this is not the end of the story. The rotation of the ball caused by the reaction
with the table makes a tremendous difference in the realism of the simulation.
When a billiard ball is hit with the cue stick, the ball starts moving across the table. If the ball is
struck along its center of mass, the ball is not initially rotating.
However, soon the ball starts rolling along. Friction between the ball and the table causes this
roll to occur. You can see this situation in Figure 2.
The ball is traveling with the forward velocity, v. In my article "The Trials and Tribulations of
Tribology," I discussed the use of kinetic friction via the Coulomb dry friction model. For our
purposes, I'm going to call this force "sliding friction." The force of friction applied to a body
sliding over a surface is given by the following formula:

f = µk N (Eq. 6)

where µk is the coefficient of sliding friction and N is the normal force pressing the ball onto
the table (its weight, mg).
The friction force is applied in the direction opposite the velocity. Since this force is applied to
the surface of the ball and not its center of mass, the frictional force causes angular
acceleration in the ball. As the ball rolls across the table, the angular velocity increases because
of this sliding friction force. This continues until a time of equilibrium is reached, where the
surface speed of the ball at the contact point (ωR) catches up to the velocity of the center of
mass, so the contact point is no longer moving relative to the table. At this time, the ball is no
longer sliding and is now rolling on the table. This situation is called a natural roll, or rolling
without sliding. In mathematical terms, this situation happens when

v = ωR (Eq. 7)

where v is the velocity of the ball, ω is the angular velocity of the ball, and R is the ball's radius.
Now I need to show how the angular acceleration actually changes. This is going to mean
bringing up another term, the inertia tensor, or I. You may remember from Chris Hecker's
column on 3D physics ("Physics, Part 4: The Third Dimension," Behind the Screen, June 1997)
that the inertia tensor relates the angular velocity of a body to the angular momentum of that
body. For arbitrarily complex objects, creating the inertia tensor can be quite difficult. However,
for a uniform sphere where the density is uniform across the sphere, it's quite easy. The inertia
tensor for a sphere is
I = (2/5)mR^2 × (3x3 identity matrix) (Eq. 8)
Therefore, the product of this matrix with any vector is a simple scaling of that vector. The
relationship between the angular acceleration and the friction force then becomes
α = fR / I = 5f / (2mR) (Eq. 9)
If I now take a look at the problem in Figure 2, I can calculate how long it will take for the ball
to achieve natural roll given an initial velocity v. From the principle of impulse and momentum,
I know some information about the linear momentum and angular velocity of the ball at a later
time t:

m v2 = m v1 - f t
ω2 = ω1 + (5f / (2mR)) t

In other words, the momentum at some later time is the initial momentum minus the impulse
created by the friction force, f, while the angular velocity grows by the angular acceleration of
Equation 9 times t. I know the friction force from Equation 6. At the point of natural roll, I know
the state of equilibrium between angular and linear velocity from Equation 7: v2 = ω2 R.
So you can see that as a result of the friction force of the table, a sliding billiard ball will always
reach a point where it is rolling without sliding on the table. This is the type of realism I want to
have in the simulation. A ball when struck should slide across the table, slowly settling to a
state where it is rolling without slipping.
One glaring problem remains. I can run the simulation with all of the physics discussed so far.
When struck hard, a billiard ball will slide and then roll. Once the ball has reached this natural
roll, there is nothing in my simulation that will keep it from continuing to roll forever. The
friction force is gone since the point of contact is not moving relative to the table. I need to add
another force that will slow down a rolling ball. I can add another frictional force, called rolling
friction, which is applied when the ball is in natural roll. The form of rolling friction is

f = µr N

where µr is the coefficient of rolling friction. It is applied exactly like the sliding friction
whenever the natural roll conditions apply. It is
important to note that the coefficients of rolling and sliding friction are not necessarily the
same. Think of a ball moving on a rubber surface. The coefficient of sliding friction would be
very high. However, the rolling friction would be comparatively low, allowing the ball to roll
across the surface easily.
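As a sketch of how the two regimes fit into a per-frame update, reduced to a scalar speed and spin for clarity (GRAVITY, BALL_RADIUS, MU_SLIDING, and MU_ROLLING are assumed constants, and <math.h> supplies fabsf):

// Apply sliding friction until natural roll (v = ωR) is reached,
// then rolling friction.
void updateBallFriction( float& speed, float& angVel, float mass, float dt )
{
    if ( speed <= 0.0f )
        return;
    float normalForce = mass * GRAVITY;  // ball pressed onto the table
    bool naturalRoll = fabsf( speed - angVel * BALL_RADIUS ) < 0.001f;
    // Pick the friction coefficient for the current regime (Eq. 6).
    float mu = naturalRoll ? MU_ROLLING : MU_SLIDING;
    float f = mu * normalForce;
    // Friction opposes the motion and decelerates the center of mass.
    speed -= ( f / mass ) * dt;
    if ( speed < 0.0f ) speed = 0.0f;
    if ( !naturalRoll )
    {
        // Friction at the contact point also spins the ball up:
        // α = fR / I = 5f / (2mR), from Equations 8 and 9.
        angVel += ( 5.0f * f ) / ( 2.0f * mass * BALL_RADIUS ) * dt;
    }
}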
Collision with the table's side cushions can be handled in a couple of ways. If I consider the
table to be completely 3D, I will need to handle 3D collision between the ball and the cushion.
That would be the most realistic. It would allow the balls to move up and down as well as side
to side. This might be interesting if I wanted to be able to perform a jump shot (when the cue
ball jumps up and over other balls on the table). However, I'm not really ready to tackle the
physical and interface issues involved in making this happen.
If I'm willing to give up the flexibility of allowing the balls to move in 3D, things become a bit
easier. For one thing, I can eliminate the gravitational force acting on the ball. A ball sitting on
a table is in a constant collision battle with the table top. By getting rid of the gravitational
force, I save having to deal with the ball constantly interpenetrating the table. I still need to
keep track of the gravitational force as it is used in calculating the friction force applied by the
table. However, I just assume the balls are in constant sliding or rolling contact with the table.
Also, if I ignore vertical motion of the balls, I can turn the collision with the side cushions into a
2D computational geometry problem. The boundaries of the table are now line segments and I
can use the 2D collision detection routines developed in my article "Crashing into the New
Year".
Later, I may wish to allow the balls to jump. It would then be easy enough to convert the
collision back to 3D. These kinds of decisions are made all the time during the game production
process. Since game simulation is all about speed versus realism, simplifying the problem if it
works for your particular application makes sense.
Rack 'Em Up
Using these techniques, I have created a demonstration of a simple pool table. The simulation
uses rolling and sliding friction to simulate the way a real billiard ball moves across a table.
Collision between balls is handled through conservation of momentum and the elastic collision
model. There are several areas that still need work. Of course, I didn't talk about applying
"English" to the shot by striking the ball off the center of mass. This technique is what makes
shots such as a massé, draw, or topspin possible. This is largely just a matter of where the
impulse from the cue stick is applied. Also, the lack of friction between colliding balls does not
allow effects such as collision-induced spin.
Another problem arises when we consider simultaneous collisions between several billiard balls.
When just two balls collide, it was fairly easy to determine the resulting forces. However, when
several balls collide simultaneously, the law of conservation of momentum becomes much
harder to enforce. In order to calculate the resultant forces correctly, I need to solve several
simultaneous equations. Obviously, this tends to complicate things quite a bit.
Alas, that will have to wait for another time. Until then, see if you can modify the source code
to handle these effects. You can download the source code and the executable application off
the Game Developer web site at http://www.gdmag.com/.
Beer, Ferdinand, and E. Russell Johnston. Vector Mechanics for Engineers: Statics and
Dynamics, Sixth Ed. New York: WCB/McGraw-Hill, 1997.
Hecker, Chris. "Behind the Screen," Game Developer (October/November 1996-June 1997).
Also available on Chris's web site, http://www.d6.com/.
When not wasting time at the pub eating hot wings and shooting pool, Jeff can be
found at Darwin 3D. There he creates real-time 3D graphics for a variety of
applications. Drop him a line at [email protected].
This article focuses on some of the various back-end services you might wish to provide to your
users as part of your game platform. The purpose of the article is to provide a sense for how
these services can be designed, how they can be deployed, and hopefully, how you can avoid
making incredibly painful mistakes in either their design or deployment.
More and more, having an on-line component to gaming is essential to the success of a title. In
addition, as games become more of a consumer entertainment experience, making things
easier on gamers becomes essential. The days of needing to do such things as manually type in
IP addresses to connect to remote servers are coming rapidly to a close. Therefore, it is
important for you to provide a seamless user experience for your gamers as they go on-line. To
do this, you will possibly need to provision a variety of backend services. Doing so will
dramatically increase the usability of your games, hopefully leading to increased sales and
recognition.
This discussion will give you not only a sense of the many different backend services you should
consider, but will also provide you with information about how some of these systems can be
designed. Although much of the discussion is focused on a few specific kinds of backend
services, the examples are relevant for any backend services you might need to deploy.
The main focus for this article will be on network messaging that is not directly related to the in-
game flow of messages, though some design issues applicable to in-game flows will be covered.
At the end, there will be a brief discussion of the PowerPlay industry initiative. PowerPlay is
particularly relevant since it addresses Internet infrastructure problems and deployment
problems that affect the ability of the Internet to handle both in-game and backend server
traffic.
For any game having servers spread around the Internet and hosted by end-users, it will be
important to have a way for your client software to find these servers. While this example fits
the client / server model of first person shooter style games quite well, the general rules are
applicable to the back-end services that other kinds of games could use. With this in mind, an
interesting back-end service that you will probably want to deploy is a game master server.
A game master server is used to collect a database of active game servers available on the
Internet. Your game client (or external "server browser" applications) can then query this
collection when users are searching for on-line games to join. The basic networking
requirements for a game master server are:
1. Ability to receive initiation/keepalive and termination-of-service messages from game
servers; and
2. Ability to receive queries from game clients and return appropriate server identification
information to those clients.
Although this list of basic functional requirements is brief, there are additional services you
might choose to provide when deploying your game master server. For instance, you might
want to let clients restrict the search with a set of search criteria.
Initiation / keepalive messages can be kept fairly small. The game server simply sends a packet
to the master server using an agreed upon protocol. For instance, many games have chosen to
signal "out of band" traffic by pre-pending a 32 bit integer negative one (0xffffffff) as a header
to each query packet. The simplest initiation message in this scenario is then to send a single
byte payload along with the packet header to signify that the game server is online. We used
this mechanism in Half-Life. Thus, a packet having a five byte payload is about as small as
these packets are going to get.
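For concreteness, a sketch of sending that five-byte message with Winsock (the payload code 'q' and all names here are made up; your protocol defines the real codes, and the socket is assumed to be created already):

// Send the 4-byte 0xffffffff "out of band" header plus a one-byte
// keepalive code to the master server over UDP.
void sendKeepalive( SOCKET sock, const sockaddr_in& masterAddr )
{
    unsigned char packet[5];
    packet[0] = 0xff; packet[1] = 0xff;   // 32-bit integer -1 header
    packet[2] = 0xff; packet[3] = 0xff;
    packet[4] = 'q';                      // hypothetical keepalive code
    sendto( sock, (const char*)packet, sizeof( packet ), 0,
            (const sockaddr*)&masterAddr, sizeof( masterAddr ) );
}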
Because UDP packets also have a 28-byte packet header each, the simplest initiation /
keepalive message weighs in at about 33 bytes (28 byte header plus 5 byte payload). Unless
you are sending additional data only at startup of a server, it may be convenient to collapse
keepalive and initiation messages into the same exact message. Also, rather than sending the
server's IP address in the packet payload, we can just look at the IP address from which we
received the packet to determine that information. This saves a few bytes and makes it
marginally tougher to spoof the "from" address (more on that below). For Half-Life, the
initiation and keepalive messages were the same. With this base message size in mind, traffic
statistics can be estimated as follows (assuming a fairly popular on-line game with up to about
2500 active game servers reporting to the master server at any one time, each sending a
33-byte keepalive every five minutes, or once per 300 seconds):
Using the above data, a game platform supporting this number of servers will create
approximately (2500 * 33)/300 bytes/second in traffic to the game master server. This works
out to about 275 bytes/second of inbound load on the server in the form of approximately
2500/300 or 8.33 transactions per second. Even a relatively low bandwidth connection at the
master server should easily be able to accommodate the traffic coming in from game servers.
In addition, termination messages are as simple as the basic keepalive message described so
far (moreover, termination messages don't necessarily require sending any additional data even
if initiation or keepalive messages do), and their frequency can be assumed to be quite low.
Therefore, we'll assume that termination messages probably will not add much to the
bandwidth requirements for the game master server.
In the real world, game servers are known to crash or become disconnected fairly regularly.
Therefore, the game master server must be able to purge non-responsive game servers from
its list occasionally. Assuming that game servers send keepalive messages every five minutes,
as in the example above, it is probably safe to discard the address of any game server that has
not sent a keepalive message to the game master server over the last few multiples of the
keepalive interval. For instance, if your keepalive messages come once every five minutes, then
you would discard the server from the list if a keepalive message is not received at least once
every fifteen minutes. Of course, receiving a termination packet causes the server to be
discarded immediately. Because not all servers terminate cleanly, there are often a few non-
responsive servers in the list of servers returned to users.
Denial Of Service Attacks on Game Master Server:
Unfortunately, the basic game master server just described (which is directly responsive to
small keepalive messages from servers) is open to several straightforward Denial of Service
(DoS) attacks. For instance, the hacking community is quite adept at spoofing the "from"
address field in IP headers. Thus, one simple operation could cause a lot of problems for a
game master server: flooding it with initiation/keepalive packets, each carrying a different
spoofed "from" address.
This kind of attack hurts the game master server by causing it to store a bunch of bogus IP
addresses for purported game servers. One worry is that the attack would cause the machine to
run out of memory. Assuming that each such record on the master occupies:
· 6 bytes IP address (four-byte IP plus two-byte port);
· 4 bytes pointer to next server; and
· 4 bytes time of last server keepalive (for removing outdated servers)
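As a sketch, such a record might be declared like this (the exact layout is up to you, and the compiler may pad it beyond 14 bytes):

struct ServerRecord
{
    unsigned char ip[4];          // IPv4 address...
    unsigned short port;          // ...plus port: 6 bytes of address data
    ServerRecord* next;           // 4-byte pointer on a 32-bit machine
    unsigned int lastKeepalive;   // time of last keepalive, for purging
};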
Fortunately, in this example, at 14 bytes per record, it would be difficult, though not impossible,
to force the game master server to die from running out of memory. This is especially true
since you will almost certainly want to provision a high-end machine to act as the master
server. Also, using this kind of attack, the malicious person might not be able to send enough
keepalive messages before the master server starts removing old servers. Then again, the CPU
load of iterating through a few hundred thousand servers and comparing timestamps every few
seconds could still be a major problem. On the other hand, if your master server stores fairly
substantial data about each server, then it is quite possible that the game master server could
run out of resources.
The nastier problem with this attack, though, is that it makes the server lists returned to your
game clients virtually useless. Not to mention that if the malicious user has added 100,000 or
more bogus servers to your master server list, then every time a user asks for the server list,
the master server will try to send them approximately 600KB (100,000 * 6 bytes per server) of
data. The master server probably will not be able to serve up that amount of data if there are
more than a few users querying it.
This is obviously a major problem. One solution to this kind of attack is to implement a
challenge/response system that servers must go through in order to be listed on the game
master server. To implement a basic challenge/response system, the game master server must
store, for each potential server to be listed, the server's IP address, the random challenge
number sent to it, and the time the challenge was issued.
The master server creates this record (or updates the one for the same from IP address if it
already exists) and then sends a packet to the game server containing the random number. The
game server then must send a keepalive message to the master and must include the random
number in that message. If the challenge request came in from a bogus IP address, then the
requester would never receive the random number back from the game master server. In
addition, if a keepalive message is received and the random number is wrong or the message is
out of date (i.e., the challenge/response record is more than a couple of seconds old), then the
keepalive message can easily be ignored.
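A sketch of that bookkeeping and validation (names and the two-second window are illustrative):

// Hypothetical challenge record for the scheme above.
struct Challenge
{
    unsigned int fromIP;    // address the challenge was issued to
    unsigned int value;     // the random number we handed out
    unsigned int issued;    // time the challenge was created
};

// Accept a keepalive only if a fresh, matching challenge exists.
bool validateKeepalive( const Challenge& c, unsigned int fromIP,
                        unsigned int value, unsigned int now )
{
    if ( c.fromIP != fromIP ) return false;  // wrong source address
    if ( c.value != value ) return false;    // wrong random number
    if ( now - c.issued > 2 ) return false;  // more than a couple seconds old
    return true;
}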
Of course, even this system still has some vulnerability to a DoS attack where the attacker has
sufficient bandwidth to occupy all of the "challenge" slots (depending on data structure used
and number of slots allowed to be active and the timeout period on such slots) or simply to
overwhelm the master server's connection itself, but those kinds of attacks are, at least, fairly
traceable. For example, id Software, creators of the popular Quake series of games,
experienced such an attack in January 2000. The attackers were able to saturate two full DS3
(T3) lines of 45Mbps capacity each. In other words, the attackers must have had access to a
connection with better than T3 capacity.
Additional Data
The game master server described so far only encompasses the most basic functionality.
Additional server specific data can be stored at the master server, especially as a way to
streamline the amount of data sent back to users based on queries. This kind of tradeoff of
processing and storage for bandwidth can often be a good idea. For instance, if the master
server were to store current and maximum players, probably encoded as a byte or short each
for most games, then the memory overhead wouldn't go up very much. Nor would the size of
the keepalive packets from game servers containing this data. With this data, queries from
users could request that the game master server filter out empty or full servers. In this fashion,
the size of the server list returned by the master server is reduced, thereby lowering the
server's outgoing bandwidth requirements. In addition, by not always requiring the end-user to
talk to each game server to discover the number of players, the load placed on your game
servers in responding to information queries could also be reduced.
Finally, the design of the master server should be looked at from a network reliability point of
view. For example, is there much consequence to having a keepalive message packet dropped?
In the Half-Life case, we decided that there probably wasn't much of a consequence.
The more important part of the game master server is the client query response portion. Unlike
traffic from keepalive messages from servers, the load from querying by users can by quite
staggering. The main purpose of the game master server is to send each requester a list of the
IP addresses of the active or relevant game servers. The request for the list of servers can be
as simple as the keepalive message, except that instead of a keepalive code, the client sends a
"list servers" code. Estimating the frequency and number of such list requests to the game
master server is a little more difficult.
For instance, if we assume a community having 10,000 users simultaneously on-line at peak
times and that each such user requests a new list of all servers from the master server every
15 minutes, we can generate some useful load statistics. Based on these numbers, the game
master server must respond to about 11.1 such requests per second. If the requests are about
33 bytes each, then that's only 367 bytes or so per second of requests coming into the server.
Where things become interesting is when we look at how expensive it is to the server to send
out the responses.
Assuming again that there are 2500 active game servers, sending back the server list to the
client will require 2500 * 6 bytes per server, or approximately 15KB of data per query request.
Now, at 11.1 such requests to service per second, the bandwidth required to service this many
queries is about 166.5KB/sec (1.33Mbps). That data rate would nearly saturate a T1 line. Of
course, there is a major flaw in this scenario as well, at least if we are using UDP. A single
15KB UDP datagram would have to be fragmented into many IP packets, and losing any one
fragment loses the whole datagram. In general, anything above 1450 bytes or so is considered
bad form for IP-based packets such as UDP packets, since routers or other equipment often
discard packets larger than that. This raises the question of how to solve this problem.
There are at least two straightforward solutions to this problem. Both end up sending the server
list to the requester in batches. In the first method, the master server stores off the requester's
IP address and over the next few seconds, sends batches of server IP addresses to the client.
Assuming these packets will be at most 1450 bytes long, there is enough room for about 240 or
so IP addresses per batch. For the example above, this would require about eleven such
batches to communicate all 2500 servers to the client. How quickly should these packets be
sent back to the client? That depends on two factors, the client's inbound bandwidth capacity
and the master server's outbound bandwidth capacity. How do we know the bandwidth capacity
of the requester? We don't, but perhaps part of the request is the bandwidth capacity of the
requester. Then the spacing of the packets is easy to figure out. For example, if the requester
states that it can receive up to 2500 bytes per second on its link, then the master would
determine the time for sending the next request as follows:
Next Packet Time = Current Time + Size of Current Packet / Remote Receiving Rate
Thus, for 1450 byte packets on a link that can handle 2500 bytes / s of data:
Next Packet Time = Current Time + (1450 bytes + 28 byte UDP header) / 2500
= Current Time + .5912 seconds
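In code, that pacing rule is a one-liner (names illustrative):

double nextPacketTime( double now, int packetBytes, double remoteBytesPerSec )
{
    // 28 bytes of IP + UDP header accompany every payload.
    return now + ( packetBytes + 28 ) / remoteBytesPerSec;
}
// nextPacketTime( t, 1450, 2500.0 ) yields t + 0.5912 seconds.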
The update to this client will take place over several seconds. Of course, if the client's link
speed is stated incorrectly (especially overstated), the game master server could easily flood
out the client's connection. In addition, the requester would also require a mechanism for
knowing how long to listen for additional data packets (e.g., embedding "packet # 1 of 6" in the
responses, or signaling how many total servers will be forthcoming).
The second alternative is a bit simpler and avoids some of the potential problems noted above
because it does not require that the master server remember any state info about the requester
and it does not require that the game master server have any knowledge of the requester's link
speed. The tradeoff here is that it can take a bit longer to get the full list from the server
(depending on the ping from the requester to the game master server). The second method is
more or less a batch request-response method. In this method, the master server stores the
servers in sequential order (an implementation detail) with some sort of sequence number
associated with each server and the requester simply requests the next batch of servers
starting with the server after the last sequence numbered server it received. In other words,
the requester code is something like the fragment below.
As responses are received, either a new batch is requested, or if the "nextbatch" the master
server tells us to request is "0", then we know we have received the last batch of servers. Care
should be taken to handle the case where the exact requested "batch" number is no longer
active on the server (I.e., it timed out between the server response saying it can be requested
and the actual request packet coming in to the master server).
void OnServerBatchReceived( int nextbatch )
{
    // Save the servers in this batch, then see whether more remain.
    if ( nextbatch != 0 )
    {
        // Continue requesting
        RequestBatch( nextbatch );
    }
    else
    {
        // Done.
    }
}
The nice thing about this approach is that it "self-regulates" the bandwidth required. The next
batch of server IP addresses is not requested until the previous batch has been successfully
received over the link. The down side of the approach is that if the requester has a high-latency
connection to the master server, then the round trip time per packet can be a bit higher than if
the master server simply transmitted packets as fast as possible as stated in the first scenario
above.
With either method of receiving the entire list, data querying can be accommodated. The
returned list of servers IP addresses is just culled for servers that don't meet the requested
criteria.
So far, we've talked about the game master server as using a datagram driven communication
model -- UDP. However, for each of your backend services, you will need to decide which
networking protocol makes the most sense for its actual use. The main choice you have if you
are developing on the Windows Operating System is whether to use TCP/IP or UDP/IP. The
general properties of each are as follows:
· TCP/IP:
  - Reliable
  - Stream oriented, so no size limit on data
  - Connection persistence/overhead
  - Slow start
· UDP/IP:
  - Unreliable
  - Datagram oriented, with limited size per packet
  - Low overhead
Based on the needs of the game master server described above, UDP is probably the better
choice. In particular, the server is designed with more focus on handling the sheer volume of
list requests, and it cannot spare the overhead of keeping a listening socket open for every
query conversation, which can take upwards of 5 to 10 seconds to complete.
Assuming the game master server has the basic functionality above, dealing with failure cases
is where you will spend the next large portion of time coding and debugging. Having a clear
idea of how you want things to behave when everything goes wrong is essential.
Handling of the failure to receive responses from the game master server can take several
forms. Perhaps missing one of the IP address packets from the first method is no big deal. If so,
then assuming at least one such IP address packet has been received, the request can be
considered successful. Otherwise, you will need to decide whether the protocol should detect
dropping a particular packet ("packet 5 of 6" was never received) and whether the whole series
should be re-requested at that point or whether a special query for just the missing packet
should be undertaken. Under the request-response model of the second example, if a response
from the server is dropped, then the request can be remade shortly thereafter.
For our master server, a failure occurs when either a request packet is dropped or a response
packet is dropped. Generally, a packet can be assumed dropped if a response is not received
within a specified timeout period. The important thing about timeouts is to avoid race
conditions. A race condition can occur when the timeout is shorter than the actual round trip:
the requester gives up and retransmits just as the original response arrives, and the late
response then gets mistaken for the answer to the retransmission.
The best way to avoid this is to grow (doubling, for example) the timeout period with each
retransmission of the request. After a few such resends, if no response is received, then the
requester is either experiencing a ton of packet loss between his or her machine and the game
master server or the game master server has gone off-line for some reason.
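A sketch of that retry logic (SendRequest and WaitForResponse are placeholders for your own networking calls; the initial timeout and retry count are arbitrary):

// Retransmit with a doubling timeout, then give up.
bool RequestWithBackoff()
{
    double timeout = 1.0;                  // seconds; initial guess
    for ( int attempt = 0; attempt < 4; attempt++ )
    {
        SendRequest();                     // (re)transmit the query
        if ( WaitForResponse( timeout ) )  // true if a reply arrived in time
            return true;
        timeout *= 2.0;                    // back off before retransmitting
    }
    // Several doublings with no response: heavy packet loss, or the
    // master server is off-line.
    return false;
}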
If the game master server goes off-line and you do not have another one available, then your
system has failed catastrophically. Something to consider is at least having redundant game
master servers positioned on the East and West coast of the United States (and possibly
Europe, Australia, or Japan) as it is quite common for the main East-West Internet backbone
links to fail for short periods of time.
If you deploy multiple game master servers, then to avoid the above failure, your client must
know how to talk to each available master server. If communication with the first one fails, you
can try and talk to the next game master server, and so on.
There is a caveat with having multiple master servers, especially if your goal is to distribute
workload between them. If you hard code all of your clients automatically to query only one of
the master servers, instead of randomly distributing the requests, then it is likely that that
server will be overworked while the other / failover servers will be underutilized. Instead, you
should consider scrambling the list of game master servers that the requester will contact so
that the load on the servers is evenly distributed. The only mitigating issue is whether
requesters should instead talk to the "closest" game master server (either geographically or by
number of hops) so that latency and packet loss are reduced.
Dealing with failure leads to issues of scalability. Scalability refers to both the capacity of the
system to handle increased demand and to the ability of the system to handle failure of
individual components. In our example, being able to add and remove a game master server
will probably mean that each game master server should also be able to serve out a list of all of
the other master servers that can be checked. In other words, users have to be able to find out
the list of active servers in some fashion.
Another aspect of scalability is how comprehensive your databases are across the various
backend servers. When your backend services include multiple game master servers, you have
to determine how you are going to make sure that basically the same list of game servers
exists on each of the master servers. This particular process is generally referred to as peering
of the game server databases. There are a couple of basic ways to accomplish peering.
In the first method, the burden of making sure all master servers know about a game server is
placed on each game server. In this scenario, the game server sends keepalive messages to all
known game master servers. The downside is that each game server must duplicate the
keepalive message and must somehow track the addresses of all of the master servers.
Probably the better way to do this is to have a peering protocol in the actual game master
server. With this type of protocol, the game master servers can inform all of the other master
servers of keepalive and termination messages. Upon receiving a keepalive request, for
example, the master acts more or less as a conduit and passes the keepalive message (and the
actual IP address of the game server) on to the other master servers. Those servers then
simply add the underlying game server to their active lists (as if they had themselves received
the keepalive message directly) and life goes on. The only caveat is to make sure that the
passthrough packets are labeled in such a way as to prevent them from being peered again to
the other masters, thereby causing a never-ending, ever-expanding message loop.
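A sketch of that pass-through handling (names hypothetical):

// Handle a keepalive, whether it came from a game server directly or
// was forwarded by another master.
void onKeepalive( unsigned int gameServerIP, bool peered )
{
    addOrRefreshServer( gameServerIP );   // update our own active list
    if ( !peered )
    {
        // Forward to the other masters, flagged as already peered so
        // they don't forward it again and start a message loop.
        forwardToPeers( MSG_KEEPALIVE_PEERED, gameServerIP );
    }
}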
If, rather than having multiple separate addresses for your master servers, you are going to
house multiple game master servers behind a single IP address, then you will probably want to
use a load balancing system.
Authentication Server
Even though the game master server is a fairly straightforward server to implement, there are
a lot of things to consider. With this in mind, we'll turn to a different type of server that we
used during the deployment of Half-Life: the authentication server. This server has different
usage characteristics and requirements than the game master server, and it will help to
demonstrate several other considerations for backend services.
The main purposes of the authentication server we deployed in Half-Life were to validate a
user's CD key and to check to see if the user's executable was out-of-date (which would then
invoke an auto-update mechanism using yet another backend server).
In order to never send a plain text (i.e., unencrypted) CD key over the Internet, we designed
the authentication protocols to use public/private key cryptographic techniques for transmission
of the back and forth dialog between the authentication server and the end-user software.
In addition to sending the CD key to the server, the client also sends encrypted version
information to the authentication server so that the user can be told about updated versions of
the software.
One thing we found out is that a lot of users have virus problems on their systems. In
particular, the CIH virus turned out to be the main culprit behind version mismatch errors and
was apparently infectious enough to affect thousands of our users. By modifying the
executable, it caused our versioning system to tell the client that it needed an upgrade when,
of course, no new version actually existed. As a result, we implemented routines in the client
to self-CRC check the executable at startup.
Similar to the game master server, the authentication server can be quite resource intensive.
This is especially true considering it must not only check versioning data, but also validate CD
keys and perform all of the necessary cryptographic functions. Therefore, it is important to be
able to bring on-line additional authentication servers as needed and to make sure that the
end-user software can fail over to the other authentication servers when there are problems
reaching a particular server.
Making the Protocol Choice:
Based on the need for a multi-part conversation, it made sense for us to consider using TCP/IP
as the transport mechanism for authentication. Using TCP/IP, as noted, requires a significant
OS overhead in setting up and dedicating a socket to handling a particular conversation. Thus,
you should probably consider setting up the backend server using a thread for each listening
socket. To prevent a malicious user from totally occupying all available server sockets, you
should quickly disconnect the TCP connection as soon as the incoming data looks malformed, or if
the socket times out.
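As a sketch of that defensive posture (Winsock-flavored, assuming each accepted connection is
handed to its own thread; RequestLooksValid() is a hypothetical stand-in for the real parsing
and sanity checks):

void HandleAuthConnection(SOCKET s)
{
    /* Drop idle conversations rather than letting them pin a socket */
    int timeoutMs = 10000;
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO,
               (const char *)&timeoutMs, sizeof(timeoutMs));

    char buf[512];
    int n = recv(s, buf, sizeof(buf), 0);
    if (n <= 0 || !RequestLooksValid(buf, n))
    {
        closesocket(s);   /* timeout, error, or malformed data */
        return;
    }
    /* ... decrypt, validate the CD key, send the reply ... */
    closesocket(s);
}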
Server load is the biggest issue for the authentication server. The following is a bit of
information about the load we've seen on one of the multiple authentication servers we use for
Half-Life.
The graphs of auth transactions and of data load (outgoing bandwidth needed) show how usage
varies and peaks throughout the week.
In addition to game master servers and authentication / CD key checking services, there are
various other backend services that you might choose to provision. For instance, after we
released Half-Life, we soon realized that we needed to make the process of finding,
downloading, and installing custom games or MODs (game MODifications) easier than it had
been in the past. We chose to solve this problem by creating a new master server to handle
serving out information about existing MODs. The master just provides our clients with a list of
MODs, a bit of information about each one, and the FTP site from which each MOD can be
downloaded. The client then handles downloading and installing the MOD to the right spot.
The engineers at WON.net have developed a robust set of backend services that you might
consider using for your games if you don't have the bandwidth, courage, or expertise to develop
and deploy your own systems. Please feel free to e-mail me for further contact information for
WON.
On-line Chat Service: Another interesting backend service that you might choose to deploy
for your game platform is chat. One method of delivering 'chat' to your users is simply to code
IRC client support into your game. While this is certainly functional, you should be aware that
IRC servers are subject to a whole host of interesting attacks and user behaviors that might not
be desirable.
If, instead, you determine that you will be creating a custom chat service, then there are a
couple of ways you can handle design and implementation. The main issue will be whether to
use a client / server or a peer-to-peer model. The other consideration is how many connections
you want to support and whether you want to maintain any control over the creation and
participation in chat rooms.
Using a client / server model can be a bit simpler, but does require that you (or one of your
users) set up a host server. Will the server host multiple chat rooms or will it simply service
just one chat room? If it services multiple chat rooms, then you need to consider
the load this could create on the server. On the other hand, if each server will only handle one
chat room, then the main issue is making sure that users find out the address of the server so
they can initiate a connection. You could create a chat master server similar to the game
master server to accomplish this.
For chat, the underlying protocols are pretty easy. First, each client initiates a connection to the
server. If you use UDP, you need to build in a way to make sure, in a reliable way, that the
connection succeeds. If you use TCP, this concern is obviated. However, using TCP could limit
the number of simultaneous users you can handle in your chat rooms. The server then notifies
all of the listeners of the new user joining (again handling all reliability issues). Finally, when
users talk, the server simply echoes the text to all other users.
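The echo step itself is only a few lines; here is a sketch assuming a simple array of connected
TCP sockets (error handling omitted):

void BroadcastChat(SOCKET *clients, int numClients, SOCKET from,
                   const char *text, int len)
{
    int i;
    for (i = 0; i < numClients; i++)
    {
        if (clients[i] == from)
            continue;                      /* don't echo to the sender */
        send(clients[i], text, len, 0);    /* reliability is TCP's job */
    }
}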
Using a peer-to-peer approach is a bit more complicated since each peer must be able to keep
up-to-date on all of the other peers. You can accomplish this by having one of the peers act like
the "server" and handle join/part and text message retransmits to everyone else. Of course,
this also means that you have to handle that guy dropping out of the chat (do you kill the chat
or appoint a new "server" on the fly?) Otherwise, each client must be able to handle join / leave
messages and to be able to retransmit text to all other users. You still have the issue of how
other people find out the addresses of participants so they can join the chat. In addition,
synchronization of the peers becomes an issue.
Auto-Update: Another type of service you might wish to provision is an auto-update service.
For us, this service was a natural extension and justification for our authentication service. We
believe that fragmentation of our user base caused by "voluntary" upgrades is generally a really
bad idea. Therefore, we implemented the authentication system as a way to ensure that all of
our on-line players are always up-to-date and compatible. When authentication fails because
the version data appears out of date, we invoke a separate auto-update executable. This
executable is nothing more than a fancied up FTP client that knows where to search for updates
and how to download, decompress, and run the installers for them.
PowerPlay: Most action game experiences on the Internet can be characterized as realtime,
latency sensitive applications. The current state of the Internet infrastructure is not tuned well
for this kind of gaming. To make the Internet the future of entertainment, improving the
infrastructure will be critical. We are currently getting started on an industry initiative to create
an open-standard to address various infrastructure issues on the Internet. This initiative is
called PowerPlay. For more up-to-date information about PowerPlay, please check
http://www.powerplayinfo.com/
Conclusion
Provisioning backend services for your game platform is a critical component to the success and
longevity of your game. There are a variety of such services that you might provision, but they
generally fall within just a few classifications. In general, backend services are there to make
your users' lives easier and, therefore, designing them with typical usage patterns in mind is
important. Understanding how your backend services can be attacked or overloaded is also
important. For almost all backend services, you will have to take into consideration a similar set
of design decisions, and you will need to handle scalability and failure cases elegantly in order
to keep your user base happy.
I have decided that friction is a drag. It's almost as easy to understand as gravity. We deal with it every day.
Friction keeps me from sliding completely under my desk when I slouch in my chair. It keeps my car from
spinning out of control as I turn corners with reckless abandon.
This experience with friction begins when as babies we attempt to scoot across the floor and find the carpet
difficult and the linoleum floor relatively easy. We build upon our experience until as elementary-age children
we are able to pick up our video console controller and expertly proclaim, "This game looks so fake - the cars
are sliding all over the place. The physics in this game bites!"
That is the challenge game developers face. The physical world is so familiar to everyone in your potential
audience, any departure from realism can be glaring. However, realistically simulating these simple physical
properties is quite challenging. This month, I'm going to discuss simulation of friction in real-time 3D
applications, otherwise known as the field of tribology.
Consider a book resting on a table. When you apply a force to the book, the friction force opposes your
efforts. Let's take a look at a
diagram of this situation. Figure 1 shows a free body diagram of the book in static equilibrium, meaning that
the book is not moving.
Since the book is in static equilibrium, we can determine a number of things via the principles of statics. The
normal force, N, to the collision of the book with the surface is equal in magnitude to the weight of the book,
W. Also, the friction force, f, must also be equal in magnitude to the force being applied on the book, F.
The Coulomb static friction model states that the magnitude of the friction force is less than or equal to the
normal force, N, multiplied by a constant coefficient of static friction, µs. This coefficient describes the relative
roughness of the two surfaces and generally depends on the material composition of the contacting
objects. This value typically varies from 0 (which would be a perfectly smooth, frictionless surface) to 1 (for a
very rough surface). Some examples of coefficients of static friction can be seen in Table 2.
When the force you are applying on the book causes the book to be on the verge of sliding, the friction force
that opposes your efforts is at its maximum, f = µsN. At this point, slip is said to be impending. Through
statics you can calculate the magnitude of the force necessary to cause this slip.
Therefore, the maximum force that can be applied on the book before it begins to slip is µsN. What is
interesting, and complicated, about static friction is the fact that the friction force increases to equal the
applied force until this threshold has been reached.
Once the applied force is greater than the slip threshold, the object starts moving. We now leave the world of
statics and enter the world of dynamics. It's actually very similar to static friction. The magnitude of the friction
force between two dry contacting surfaces that are sliding relative to each other is

f = µkN

where µk is the coefficient of kinetic friction. This force resists the motion of the two bodies. Its direction is
opposite the vector of relative velocity between the objects. In general, the value of µk is smaller than µs.
However, this does not always have to be the case.
That covers the Coulomb dry friction model in both static and dynamic situations. By implementing
these two models, you can give the objects in your world convincing physical behavior.
An obvious application of the Coulomb dry friction model is for travel over surfaces. You may have a game
that requires a character to travel over various types of terrain. By specifying different coefficients of friction
for different types of terrain (asphalt, grass, ice, and so on), you can simulate movement over this terrain in a
realistic, and even more importantly, a physically consistent manner.
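In practice this can be as simple as a lookup table; the coefficients below are illustrative
values, not measured ones:

typedef enum { SURF_ASPHALT, SURF_GRASS, SURF_ICE, SURF_COUNT } SurfaceType;

static const float kineticFriction[SURF_COUNT] =
{
    0.70f,   /* asphalt */
    0.35f,   /* grass   */
    0.05f    /* ice     */
};

/* f = uk * N, applied opposite the direction of relative motion */
float FrictionMagnitude(SurfaceType surf, float normalForce)
{
    return kineticFriction[surf] * normalForce;
}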
Many games simulate friction simply by reducing the velocity by a percentage based on the surface type. This
may seem at first to be the same thing as the dry friction model described above. However, it differs from it in
many critical ways. By adjusting the velocity directly, you eliminate the side effects of applying the friction as a
force. These side effects are what make objects in the physical simulation behave the way players expect
them to behave. These small breakdowns in the simulation make it glaringly apparent that the world is fake.
Perhaps an example would help here.
If you have ever pushed a box really hard, particularly if your point of contact is near the top of the box, the
box will sometimes tip over before it starts sliding. In fact, if you throw a box across the room, once it hits the
floor it will tumble all over the place instead of simply sliding to a halt. People are used to these facts. They
live with them every day. If your world does not address these behaviors, it will not feel right.
Figure 2. Forces exerted on a box as it verges on tipping over.
Figure 3. You can control how much force Sara must exert on the box before it moves.
But why does the box tip over? Well, guess what, it is all about friction. Take a look at the box in Figure 2.
Sara will be applying a force, F, to the box h units above the ground. What I'm looking for is a state for the
system where the box is about to tip over at point A. I can apply the principles of statics to solve this problem.
(If you are not familiar with statics, check out the For Further Info section at the end of this column.) For an
object to be in static equilibrium, the sum of all forces and the sum of all moments in the body must equal
zero.
When the box is about to tip over, there is only a reaction to the ground at point A. The support on the other
side has no reaction to the ground. Therefore, we can state the equilibrium equations. Let me start with the
sum of forces.
The sum of horizontal forces consists only of F and f, and they directly oppose each other. In the vertical
direction, the weight W and normal force N are also equal and opposite. The sum of moments however, is a
bit more complicated. You may remember from physics that the moment of a force about a point P is
M = r x F, where r is the vector from P to the point where the force is applied. At the point of equilibrium
where the box is about to slip, the friction force has reached its maximum, so F = f = µsN = µsW. Summing
the moments about point A and setting them to zero gives Fh = W(0.5d), which solves to h = (0.5d)/µs.
Figure 4. Sara tips the box over instead of sliding it away.
If Sara applies the force at a point (0.5d)/µs units high or higher on the box, the box is going to tip over before
it starts sliding. What's even more interesting is the fact that the equation above states that the value for h is
not dependent on anything other than the dimensions of the box and the coefficient of static friction. The
magnitude of the force F does not matter at all. It may seem that if Sara pushes harder, the box would be
more likely to tip. Statics proves that this is not the case.
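That result reduces to a one-line test; a quick sketch:

/* Does a horizontal push at height h tip a box of width d before it
   slides? Follows directly from h >= (0.5d)/us derived above. */
int BoxWillTip(float d, float h, float muStatic)
{
    return (h >= (0.5f * d) / muStatic);
}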
I am convinced. I want to have boxes that tip over if you push them too high. That seems like something cool
to have in my game. But how do I go about accomplishing this task?
I have been building up the pieces I need. If you look back to my previous articles, "Collision Response:
Bouncy, Trouncy, Fun," and "Lone Game Developer Battles Physics Simulator", I have a soft body dynamics
package that models the forces and handles collision with surfaces. I will first handle the kinetic friction
problem. The magnitude of the kinetic friction force is f = µkN, and the direction of the force is determined by
looking at the current particle velocity. In my simulation, if the velocity of a point is greater than a certain
threshold, e, I use kinetic friction for all contacting points; at or below that threshold, I fall back to the static
friction approximation described below. Listing 1 shows the code for calculating and adding in the force of friction.
The only change I really had to make to the structure of the program was to add storage space for the contact
normal of contacting particles.
Static Friction
Handling static friction, however, is much more complicated. The problem is that static friction requires that I
determine when each contacting particle makes the transition to sliding. From the calculations above, I know
that the point of transition is when
F = µsN. Until that transition occurs, the static friction force needs to prevent sliding completely. That is, I
need to make sure that the particle acceleration is kept at zero. Once the particle begins sliding, then the
force opposes the acceleration and has a maximum of µsN. All of these conditions lead to a situation that is
too complex to be calculated in my simulation. David Baraff (see For Further Info) suggests a couple of
approximations.
The more complicated method Baraff suggests is to approach static friction as a quadratic programming
problem. However, this method is prone to failure in certain circumstances. The other suggestion, fortunately,
is easy to implement.
First, establish a velocity threshold value e which determines when to use static friction. This threshold is then
used to scale the friction force as the velocity varies from 0 to this threshold. The formula for calculating the
static friction force then becomes
F = (µsN)(v/e). This force is applied in the direction opposite the velocity of the particle. Listing 2 contains the
code for handling the static friction forces.
One unfortunate side effect of this approximation of static friction is that it can play hell with your integrator.
When the particle is moving and subject to kinetic friction, things work well. However, when static friction kicks
in, the direction of the static friction force swings wildly with small fluctuations in velocity. This plays havoc
with the integration. If the value for e is too small, the differential equations can become "stiff," requiring more
complex integration techniques (See "Lone Game Developer Battles Physics Simulator").
Let's Drag
Now I can get objects to tumble around realistically as well as slow to a halt based on the current coefficients
of friction. You can download the source code and executable to the sample application from the Game
Developer web site (http://www.gdmag.com/).
When not fighting the friction that keeps his butt planted in Redondo Beach, Jeff creates custom 3D
real-time graphics applications at Darwin 3D. What's the roughest surface you know? E-mail it to him
at [email protected].
Baraff, David. "Coping with Friction for Non-Penetrating Rigid Body Simulation," Siggraph Proceedings: July
1991, pp. 31-40.
Beer and Johnston. Vector Mechanics for Engineers: Statics, Sixth Ed. New York: WCB/McGraw-Hill, 1997.
Hecker, Chris. "Behind the Screen" columns. Game Developer, October 1996-June 1997. Also available on
Chris's web site at http://www.d6.com/.
Lötstedt, P. "Numerical Simulation of Time-Dependent Contact Friction Problems in Rigid Body Mechanics."
SIAM Journal of Scientific Statistical Computing Vol. 5, No. 2 (June 1984):
pp. 370-393.
// Calculate the magnitude of the force along the contact normal
FdotN = DotProduct(&curParticle->contactN,&curParticle->f);
// Calculate Vt, the velocity tangent to the contact normal
VdotN = DotProduct(&curParticle->contactN,&curParticle->v);
ScaleVector(&curParticle->contactN, VdotN, &Vn);
VectorDifference(&curParticle->v, &Vn, &Vt);
Vmag = VectorSquaredLength(&Vt); // squared tangential speed
NormalizeVector(&Vt); // Get the direction of Vt
if (Vmag > STATIC_THRESHOLD) // Moving: handle it as kinetic friction
{
// Multiply by normal force magnitude and coef of kinetic friction
ScaleVector(&Vt, (FdotN * m_Ckf), &Vt);
VectorSum(&curParticle->f,&Vt,&curParticle->f);
}
else // Nearly at rest: handle it as static friction
{
Vmag = Vmag / STATIC_THRESHOLD; // the (v/e) ratio of the approximation
// Multiply by normal force magnitude, coef of static friction,
// and the static approximation ratio
ScaleVector(&Vt, (FdotN * m_Csf * Vmag), &Vt);
VectorSum(&curParticle->f,&Vt,&curParticle->f);
}
What's a Procedural Object and Why Would You Want to Create One?
A procedural object is created by an algorithm that takes a set of parameters that define the
object and produces the object representation from that set. For example, we can write a
procedure to create a box, as follows. Given the center, the width, and the height, the
procedure generates a list of faces and vertices that constitute the mesh representation of the
box. We could also create the box mesh manually by specifying the vertices and faces, but
that's very inconvenient. In many cases it is necessary to create the object procedurally; for
example, it would be very tedious to create a sphere by explicitly specifying vertices and faces.
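As a sketch of the idea (using minimal stand-in types, not any particular SDK's), a box
generator might look like this:

struct Vec3 { float x, y, z; };
struct Tri  { int v0, v1, v2; };

// Vertex i has +x if bit 0 of i is set, +y for bit 1, +z for bit 2.
// Winding assumes counterclockwise seen from outside; flip the index
// order if your convention differs.
static const Tri boxFaces[12] =
{
    {0,2,3},{0,3,1},   // -z side
    {4,5,7},{4,7,6},   // +z side
    {0,1,5},{0,5,4},   // -y side
    {2,6,7},{2,7,3},   // +y side
    {0,4,6},{0,6,2},   // -x side
    {1,3,7},{1,7,5}    // +x side
};

void BuildBox(Vec3 c, float w, float h, float d, Vec3 verts[8], Tri faces[12])
{
    for (int i = 0; i < 8; i++)
    {
        verts[i].x = c.x + ((i & 1) ? 0.5f : -0.5f) * w;
        verts[i].y = c.y + ((i & 2) ? 0.5f : -0.5f) * h;
        verts[i].z = c.z + ((i & 4) ? 0.5f : -0.5f) * d;
    }
    for (int i = 0; i < 12; i++)
        faces[i] = boxFaces[i];
}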
It is possible to create procedural objects inside 3D Studio MAX to get data into your game
engine, since a procedural object in MAX can store information other than what's required for the
visual representation. For example, suppose you want to specify certain locations in the level
which you will use in your code to place characters and objects. You could create a Location
object that stores the position and an id number. The designer can create various instances of
Location objects within the level and assign a unique id to each; this data can then be exported
into a file and used by the game engine. Or you could create a grid for pathfinding information
and assign attribute values to each square: occupied, free, and so on.
In this article we will see how to write a plug-in to create a simple procedural object in Max 3.1.
In order to cover the basics and provide a starting point for programmers to create their own
procedural objects, we'll focus squarely on the MAX SDK and won't get involved in the
intricacies of creating complicated objects like teapots and ringwaves. We'll stick to the bare
minimum required to write a functional procedural object plug-in for the most part and after
that approaches to advanced techniques will hopefully be clear. We'll also try to avoid getting
into discussions of theoretical aspects; I'd rather just tell you where to look in the MAX SDK
help file to find a detailed treatment of the topic. The code for a demo demonstrating the
concepts found in this article can be found here.
Since it's not much use creating this kind of procedural object if you can't use it in your game
engine, we'll take a quick look at how to export them.
So what object should we create? The object should be simple so people who want to create
other kinds of objects won't get bugged by details they see as irrelevant, but it should be
complex enough to examine all the basic concepts related to procedural objects. A portal object
should fit the bill. Our portal object will be defined by four vertices on a plane, a source cell,
and a destination cell, and could be used to implement a simple portal-culling scheme. In our
example, the cells will be the standard MAX boxes. If you don't know about portal culling you
can read up on it at the Flipcode site (http://www.flipcode.com/). In any case it's not all that
important; we'll just see how we'd go about creating an object with the above-mentioned
properties without worrying about how to use it in a real-world application.
Preliminaries
I'll assume that you're familiar with the basics of MAX itself, if not the SDK. Any previous
experience of creating plug-ins of any type would be very helpful. We'll keep things simple so
you should be able to follow most of the stuff even if you've had no exposure to the MAX SDK
before this.
To follow along, you'll need:
- Visual C++ (I'm using 6.0, but 5.0 should work fine)
- 3D Studio MAX 3.1 and the SDK
One of the best sources of information about the MAX SDK is the help file and we'll often look to
it for details, so it's a good idea to keep it open in the background while reading this article.
Now is a good time to open it and bookmark the first page, the one titled 'Plug-In Software
Development Kit'; all references will be relative to this page. Also keep in mind that to look up a
method, we need to type 'methods, MethodName' in the help file search index and to look up a
class we need to type 'class ClassName'.
Plug-in Basics
All MAX plug-ins are Windows DLLs and are usually located in the 'Stdplugs' and the 'Plugins'
directories. Typically, 'Stdplugs' contains the plug-ins that come along with MAX and 'Plugins'
has third-party plug-ins that you download from the Net or write yourself. There are various
categories or types of plug-ins that can be created using the MAX SDK; procedural objects is
one of them. We could also create, among others, plug-ins to import/export files, system plug-
ins like Biped, object modifiers like Bend and Taper, and atmospheric plug-ins for effects like
fog. For a complete list, look in the 'Plug-In Types Overview' section of the help file. There is a
standard filename extension associated with each type which is used by MAX when it is
searching for plug-ins to load. Procedural object plug-in DLLs have the filename extension DLO.
File export plug-ins have the extension DLE, utility plug-ins have the extension DLU and so on.
For a complete list look under 'Writing Plug-In Applications / Plug-In Directory Search
Mechanism'.
The MAX SDK is a set of classes that developers can combine and extend to create plug-in
applications. There is an elaborate class hierarchy and the idea is that we tap into it at the right
places to get our job done. You can have a look at this hierarchy under 'Plug-In Architecture
Overview'. To write a MAX plug-in we'd start by identifying what type of plug-in it should be.
Based on this, we'd derive a class for our plug-in from one or more of the classes in the SDK.
This would give us certain inherited virtual functions that we would need to implement in our
class. These functions would be called by MAX at the appropriate time. For example, if we
wanted to do file export, we'd derive from the class SceneExport and implement the function
DoExport() which would be called by MAX when the user asked to export the scene. Once we
have control, we have access to all kinds of scene information and can call functions
implemented inside MAX to do our stuff.
We are real programmers and we want to get right down to the metal, so we won't use any
MFC or appwizards to create our plug-in. It's not very complicated. We create a new project of
type Win32 Dynamic Link Library / Empty DLL Project, add our source code and the required
MAX lib files to the project and build it. We must tell the VC++ IDE to look for header files in
the MAX SDK 'Include' directory as well. And we must make sure that the DLL has the DLO
extension and ends up in the MAX 'Plugins' directory.
One additional thing to keep in mind is that MAX requires us to add a DEF file to our project. If
we don't do this, MAX won't load our plug-in. The DEF file looks like this.
LIBRARY PortalObj
EXPORTS
LibDescription @1
LibNumberClasses @2
LibClassDesc @3
LibVersion @4
SECTIONS
.data READ WRITE
You can use this DEF file for all your plug-ins; just keep changing the library name.
Every MAX plug-in must implement a set of functions that are called by MAX when the plug-in is
first loaded. These functions provide MAX with information about the plug-in and are called
LibNumberClasses(), LibClassDesc(), LibDescription() and LibVersion(). For an explanation of
these functions look under 'Writing Plug-In Applications / DLL Functions and Class Descriptors'.
We get a default DllMain() when we build a DLL using VC++ but we implement our own anyway
because we need to do some initialization at startup.
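As a rough sketch (check 'DLL Functions and Class Descriptors' for the exact signatures in your
SDK release; GetPortalObjDesc() is our own function returning the plug-in's class descriptor):

HINSTANCE hInstance;

BOOL WINAPI DllMain(HINSTANCE hinstDLL, ULONG fdwReason, LPVOID lpvReserved)
{
    hInstance = hinstDLL;   // saved for loading our dialog resources
    static BOOL controlsInit = FALSE;
    if (!controlsInit)
    {
        controlsInit = TRUE;
        InitCustomControls(hInstance);   // MAX custom controls
        InitCommonControls();            // Windows common controls
    }
    return TRUE;
}

__declspec(dllexport) const TCHAR *LibDescription()
{
    return _T("Portal procedural object");
}

__declspec(dllexport) int LibNumberClasses()
{
    return 1;               // we publish a single plug-in class
}

__declspec(dllexport) ClassDesc *LibClassDesc(int i)
{
    return (i == 0) ? GetPortalObjDesc() : NULL;
}

__declspec(dllexport) ULONG LibVersion()
{
    return VERSION_3DSMAX;  // the SDK version we were built against
}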
Let's see what we've accomplished with this code. Actually this is enough for us to make our
presence felt in MAX. At this point we can create a project workspace and add Version1.cpp to
it. We need to add the following lib files from the SDK 'Lib' directory: maxutil.lib, geom.lib,
core.lib and mesh.lib. We need to add comctl32.lib if it's not already there, and also the DEF
file. Now we can build the first version of our plug-in. Next, we start up MAX, go to the Create
branch of the command panel, select Geometry (the default), click on the drop-down box that
says Standard Primitives and we see the new entry Portal. However, don't try to create a portal
right now or else MAX will crash because our PortalObj class is still a dummy and doesn't
implement all the functions required.
Now that we know what it takes to make MAX acknowledge the existence of our plug-in, we'll
create our user interface. The user will create the portal by clicking and dragging with the
mouse in one of the viewports in the same way you'd create a box or a sphere so we don't
really need any specific user interface elements for the creation phase, but after that he'll want
to select the source and destination cells. We'll create a rollup page to facilitate this and add it
to the command panel. The page is pretty simple: two pick buttons and two edit boxes that
display the names of the chosen cells.
The way it works is as follows. To select a source or destination cell, the user can click on the
appropriate button. This action will place him in Pick mode after which he can click on a node in
one of the viewports to select it as the source or destination cell. We can throw in a check to
make sure that the node he selects is a box. If all is OK, we'll display the name of the selected
node in the appropriate edit box.
A rollup page is simply a dialog box created using the resource editor. We must make sure it's
exactly 108 pixels wide. I guess that's because it needs to fit nicely into the command panel.
When you create the dialog box, in the Dialog Properties, go to the Styles pane and set the
style to Child and the border to None. In the More Styles pane tick the Visible check box. We
can use the defaults for everything else. Then we add the static text, the two edit boxes, the
two buttons and we're done.
The rollup page is now ready to be added to the command panel. How do we do this? It so
happens that whenever a procedural object is being created or modified MAX calls a couple of
functions inside the plug-in so we'll know what's going on. These functions are BeginEditParams
() and EndEditParams(), declared as virtual functions in class Animatable, which is right at the
top of the class hierarchy shown in 'Plug-In Architecture Overview'. We'll be deriving our plug-in
class from a class that is in turn derived from Animatable so we'll inherit these functions as well
and override them. When our BeginEditParams() gets called, we'll add the rollup page to the
command panel using the AddRollupPage() function and when EndEditParams() gets called we'll
delete it by calling DeleteRollupPage().
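A sketch of that pair, assuming IDD_PORTAL_PARAMS is our dialog resource and hRollup and ip are
class members (see the help file for the exact AddRollupPage() signature):

void PortalObj::BeginEditParams(IObjParam *ip, ULONG flags, Animatable *prev)
{
    this->ip = ip;
    hRollup = ip->AddRollupPage(hInstance,
                                MAKEINTRESOURCE(IDD_PORTAL_PARAMS),
                                DialogProc,
                                _T("Portal Parameters"),
                                (LPARAM)this);
}

void PortalObj::EndEditParams(IObjParam *ip, ULONG flags, Animatable *next)
{
    ip->DeleteRollupPage(hRollup);
    hRollup = NULL;
    this->ip = NULL;
}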
We need to implement a DialogProc function for our rollup page the same way we'd do it for an
ordinary dialog box in a Windows application.
The code that includes all this is shown in Version2.cpp. The rollup page is in Resource.rc. For
the time being, we've derived our class straight from Animatable and our DialogProc just
returns TRUE in response to the WM_INITDIALOG message. We add the resource file with our
rollup page to the project and use the code in Version2.cpp to get the next version of our plug-
in.
If we now click on the Portal button in the Create branch of the command panel we can see the
rollup page. Almost anything we do after this, including moving the mouse pointer in a viewport
will cause a crash for the same reason as before and it's about time we did something about it.
We need to put in the code required to create the portal in a viewport after which we can come
back to the DialogProc, add functionality to the user interface controls and start picking source
and destination cells.
The portal needs to maintain pointers to the source and destination cells. We could store this
information in class variables, but there are some issues to be considered. For example, say
we've assigned a node as the destination cell to a portal and we store a pointer to the node in a
class variable. We can't write the pointer to the disk when the user saves the file, so we'd have
to save something like the node name instead. Then when the file is next opened, we'd have to
search for the node by name and initialize our pointer again. That's not so bad but consider
another problem. What happens if the node is deleted? We end up with an invalid pointer.
We have a simple, clean solution to these issues in MAX's concept of References. Let's get a
handle on this next.
References
References in MAX, not to be confused with C++ references, are used to keep track of
dependencies between scene elements.
"A reference is a record of dependency between a reference maker and a reference target. The
reference maker is said to be dependent upon the reference target. If the target changes in
some way that affects the maker, the maker must be notified so it may take appropriate
action."
"The system takes care of loading and saving references when an object is saved to disk. An
object does not need to explicitly save its references, nor does an object need to load its
references. After a scene is loaded, an object's references will automatically be restored."
This is just what we need. In our case the portal is a reference maker and the destination cell
node is a reference target. Let's see how to use references in our plug-in.
If we want to create references to anything in our plug-in class we have to derive it from the
MAX SDK class ReferenceMaker. We are currently deriving from Animatable but since
ReferenceMaker is derived from Animatable as well we'll just subclass PortalObj off
ReferenceMaker instead. We're moving on up in the class hierarchy.
For a scene element to be a reference target, it must be derived from the class
ReferenceTarget. That's OK too, because nodes are an instance of the INode class which is
subclassed off ReferenceTarget. You can look at the class hierarchy diagram now to get your
bearings if you haven't already done it.
We still need to store pointers to the source and destination cells but we'll be using references
to manage them. We begin by declaring class variables.
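The declarations themselves were elided here; presumably they are just a pair of node pointers:

INode *src;    // source cell node, NULL until assigned
INode *dest;   // destination cell node, NULL until assigned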
Each pointer will be associated with a reference and we'll use ID numbers to distinguish
between them. How we will use these IDs will become clear very soon. So we next define IDs.
#define ID_SRC_REF 0
#define ID_DEST_REF 1
The portal then creates a reference to the node using the function MakeRefByID(). If it's a
source node, we pass ID_SRC_REF as a parameter and if it's a destination node we pass
ID_DEST_REF. The code looks like this
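(reconstructed from the pick-mode listing later in the article; 'node' is the INode the user
picked):

if (cell == SRC_CELL)
    MakeRefByID(FOREVER, ID_SRC_REF, node);
else
    MakeRefByID(FOREVER, ID_DEST_REF, node);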
Notice the FOREVER. It's supposed to indicate the interval of time over which the reference is
valid in case you're doing an animated sequence, but in MAX 3.1 plug-ins can't pass any other
value anyway, so we won't worry about it.
When you call MakeRefByID() MAX responds by calling another function that we've inherited
from ReferenceMaker, called SetReference(), with a pointer to the reference target so we can
store it. Our implementation of SetReference() looks like this.
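(the listing was omitted above; presumably it is little more than this):

void PortalObj::SetReference(int i, RefTargetHandle rtarg)
{
    switch (i)
    {
        case ID_SRC_REF:  src  = (INode *)rtarg; break;
        case ID_DEST_REF: dest = (INode *)rtarg; break;
    }
}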
So far we've just taken a rather roundabout route to assign a pointer to a variable. But it's
going to be worthwhile because we've inherited another function from ReferenceMaker called
NotifyRefChanged(). This will be called by the reference targets whenever they are changed in
any way that can affect us, and we'll get a pointer to the reference target so we can find out
which one was changed and decide what to do about it. We'll use it to detect if the source or
destination node has been deleted and, if it has, we'll set the corresponding pointer to NULL.
Next we need to think about file loading and saving. Actually, it's already done. The system
saves all references automatically and when the file is opened again, it'll call SetReference()
with pointers to the nodes so our variables get initialized.
To complete the picture, we need to implement two more functions of ReferenceMaker for the
system to use. These are NumRefs() and GetReference().
int PortalObj::NumRefs()
{
return 2;
}
RefTargetHandle PortalObj::GetReference(int i)
{
switch (i)
{
case ID_SRC_REF: return src;
case ID_DEST_REF: return dest;
}
return NULL; // out-of-range index: not a reference we know about
}
In NumRefs(), should we return the number of references we actually have created already or
the maximum number we're planning to create? The answer is: the latter, otherwise problems
will occur. This is because the system handles our references as a virtual array that is 0-based
like in C++. So valid indexes for the system are 0 to NumRefs() - 1. For example, suppose the
user creates a portal, assigns a destination cell only and then saves and closes the file. We've
created only one reference, but if we return 1 in NumRefs(), MAX will only care about the 0th
reference. It won't call SetReference() with ID_DEST_REF when it opens the file again, and our
dest variable won't get initialized. If we return 2, on the other hand, our SetReference() will be
called only once, with ID_DEST_REF as a parameter, and that's how it should be. In any case,
SetReference() won't be called with ID_SRC_REF, because the 0th reference doesn't exist.
For the complete low-down on references, look under 'Must Read Sections for All Developers /
References'.
Now that we've taken care of the source and destination cell pointers, we can shift our attention
to the geometry data. The portal mesh consists of four planar vertices and two faces. To
generate this, we'll need to get some information from the user during the creation phase in the
viewport. For example, we could store the opposite corners of the rectangle formed by the four
vertices. This data is not subject to the problems faced by the pointers so we could simply use
class variables without creating any references. For now we'll assume that this is what we're
going to do, but there are wheels within wheels, so we'll end up storing the geometry data in a
slightly different way.
We want to let the user create a portal by clicking and dragging with the mouse in one of the
viewports in much the same way as he'd create a box or a sphere. We need to provide MAX
with a function to call so that it can let us know what the user is doing with the mouse. MAX
gives us information about mouse events like MOUSE_POINT, MOUSE_MOVE,
MOUSE_DBLCLICK etc and the position of the mouse pointer. How we use this information is up
to us.
To write our mouse handler, we create a class derived from the MAX SDK class
CreateMouseCallBack, thereby inheriting a pure virtual function called proc() that we will
implement and MAX will call. Let's call this class PortalObjMouseHandler; it's a new class and
quite distinct from PortalObj, our main plug-in class. Now how do we let MAX know about
PortalObjMouseHandler? It so happens that in the class BaseObject, derived from
ReferenceTarget, there is a function called GetCreateMouseCallBack() that MAX calls to find out
about the plug-in's mouse handler. So we move up further in the class hierarchy and derive our
main plug-in class from BaseObject instead of ReferenceMaker, thereby gaining the ability to
handle mouse input along with our existing capabilities of displaying our user interface and
creating references. All is well, and we can now return a pointer to an instance of the class
PortalObjMouseHandler in our implementation of GetCreateMouseCallBack().
While creating our portal, we'll be working in a coordinate system centered at (0, 0, 0) in the
viewport with the +ve X axis to the right, the +ve Y axis upwards and the +ve Z axis coming
out of the screen towards us. I don't know what this coordinate system is called in MAX, so in
this article we'll call it Csx. Csx is independent of the view chosen for the viewport and is not
necessarily aligned with the coordinate system displayed near the bottom left corner. This is
why, for example, when creating a teapot we're always looking at it from above in the viewport
we are creating it in. The exceptions to this are the User and the Perspective views, in which
case the points are in the coordinate system shown in the bottom left corner.
We'll create our portal in the XY plane in a local coordinate system originating at the center of
the rectangle. At the same time, we'll fill in a translation matrix that will contain the position of
the center point with respect to the Csx origin. MAX will use this matrix to display the portal at
the correct position in the viewport. We'll click and hold down the left mouse button to define
any one corner of the rectangle, drag, and let go to define the opposite corner. The rectangle
will always be created facing us; we can then move and rotate it to position it as desired within
the scene. The rectangle parameters we'll store are half the height and half the width and MAX
will store the transformation matrix required to display the portal correctly in the scene.
The code for our mouse handler is shown in Listing 1. The fifth parameter, IPoint2 m, is the
position of the mouse pointer in Csx and the sixth, Matrix3& mat, is the matrix we set to the
initial translation required to display the portal at the correct position in the viewport. There's
also some stuff related to viewport snapping. For details, look for the description of the class
CreateMouseCallBack in the help file. Remember that to find a method in the help file, you've
got to type 'methods, MethodName' and to find a class you've got to type 'class ClassName'.
A Matter of Class
So far, we've been working our way up the class hierarchy based on what functionality we
wanted to add to our plug-in. We started with Animatable and have now reached BaseObject
after a brief halt at ReferenceMaker. I guess it's time I came clean and told you that we've got
some way to go still. We'll be deriving our main plug-in class from the MAX SDK class
SimpleObject. Why? Mainly because the help file says so and that's the way it's done. We
started at the base of the class hierarchy; actually you'd do just the opposite. You'd start by
identifying what class you need to derive from based on your plug-in's type and then see what
methods of the base classes you need to implement as well. The two places in the help file that
serve as starting points for your journey are 'Plug-In Types Overview' and 'How To Create
Various Plug-In Types'. At least for me, this is followed by a lot of relentless clicking as I wade
through the documentation until light starts to dawn. The samples are another valuable source
of information. For this plug-in, I started with Sphere_c in the Samples/HowTo directory. That's
a comprehensive sample with around 1500 lines of code so I kept cutting out stuff until I got a
working sample that was small enough for me to comprehend.
We need SimpleObject because, as the help file says, "Procedural objects that represent
themselves with a mesh may use this class. It provides implementations of many of the
methods required to create a procedural object, thus simplifying the developer's job
considerably."
Before we get on to creating the mesh and picking the source and destination nodes, we'll get
back to the issue of storing our geometry data. Recall that at the end of the section on
references, we said that we'd store this data in a different way because of certain wheels within
wheels. Well, these wheels belong to SimpleObject. Enter Parameter Blocks.
Parameter Blocks
Parameter blocks are not really required for the simple plug-in we are writing. However they
will surely crop up, especially in conjunction with Parameter Maps, when we move on to more
complex applications so it won't hurt to explore this topic now, especially as they're quite simple
to understand and use.
The parameter block is a mechanism for storing the values of a plug-in's parameters. We first
create a description of the parameter block we want to create. This description consists of
information about the number of data items we want to store and their types. We get to choose
from a set of built-in types such as int, float and 3D vector. When we create the block using this
description, MAX allocates memory for the data elements and we access individual items using
indexes.
We'll use a parameter block to store our geometry data. It consists of two floating-point values,
the half-height and the half-width. To describe the parameter block, we need an array of
ParamBlockDesc objects, one for each data item. For each item, we first specify the type. The
next value is always NULL and the third indicates if the item is animatable. Our parameter block
looks like this.
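Presumably something like this, with index defines of our own:

#define PB_HALFHEIGHT 0
#define PB_HALFWIDTH  1

// One entry per data item: type, always NULL, is-animatable
static ParamBlockDesc descParam[] =
{
    { TYPE_FLOAT, NULL, FALSE },   // PB_HALFHEIGHT
    { TYPE_FLOAT, NULL, FALSE }    // PB_HALFWIDTH
};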
We create the parameter block using the CreateParameterBlock() function. We'll use the first
element in the parameter block for the half-height and the second for the half-width. We can
access our data by ID using the SetValue() and GetValue() functions. For a description of these
as well as more information on Parameter Blocks check out 'Must Read Sections for All
Developers / Parameter Blocks'.
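In outline, creation and access look something like this (the exact CreateParameterBlock()
overload may vary between SDK releases):

pblock = CreateParameterBlock(descParam, 2);

// Write at time 0 since our values aren't animated
pblock->SetValue(PB_HALFHEIGHT, 0, hlfHt);
pblock->SetValue(PB_HALFWIDTH,  0, hlfWd);

float h;
Interval valid = FOREVER;
pblock->GetValue(PB_HALFHEIGHT, 0, h, valid);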
The main reason we are using a parameter block is that SimpleObject expects us to. It has a
variable, IParamBlock *pblock, that we've inherited. Now, BaseObject has a virtual
GetParamBlock() function that the system will call to ask the plug-in for its parameter block,
and the default implementation is to return NULL; SimpleObject overrides this and returns
pblock. The problem with this setup is that pblock is a wild pointer that was not set to NULL by
SimpleObject in its constructor. The system thinks that it's got a valid pointer to our parameter
block and presumably proceeds to use it, which brings us to a grinding halt. You can see
the code for the class SimpleObject in simpobj.h in the SDK Include directory and simpobj.cpp
in the Samples/HowTo/Misc directory.
Of course, we can still keep our geometry data in class variables and not create a parameter
block. All we have to do to prevent a crash is to set pblock to NULL in the PortalObj constructor.
But since we've come so far we might as well continue. The parameter block class,
IParamBlock, is derived from ReferenceTarget, so we can create a reference to it. This is again
for the benefit of SimpleObject, which is waiting with its implementation of NumRefs(),
GetReference(), SetReference() and NotifyRefChanged(). The SimpleObject version of the first
three is given below.
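Paraphrased from simpobj.h and simpobj.cpp, they amount to this:

int SimpleObject::NumRefs() { return 1; }

RefTargetHandle SimpleObject::GetReference(int i) { return pblock; }

void SimpleObject::SetReference(int i, RefTargetHandle rtarg)
{
    pblock = (IParamBlock *)rtarg;
}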
Note that we now need to call the SimpleObject versions in our implementation of NumRefs(),
GetReference(), SetReference() and NotifyRefChanged(). We also need a reference ID for the
reference to the parameter block.
We're now ready to build our mesh. This consists of filling in the BuildMesh() function that
we've inherited from SimpleObject. We've also inherited the variable mesh that's an instance of
the Mesh class. If you look at a description of this class in the help file you'll see quite a bit of
stuff there, but we'll just initialize the vertex list and the face list.
Let's have a quick look at how the mesh is organized. Basically, it's a collection of faces with
counter-clockwise winding. Each face has three vertices, each of which can have up to three
elements of information, the 3D space coordinates, the texture coordinates and the vertex
color. This information is stored in three separate arrays which are Point3 *verts, UVVert
*tVerts and VertColor *vertCol respectively. To index into these, there are three arrays of Face
objects, the class members Face *faces, TVFace *tvFace and TVFace *vcFace respectively.
So, to get all the information about the ith face in a mesh, we proceed as follows. The vertex
coordinates are verts[faces[i].v[0]].x, verts[faces[i].v[0]].y, and verts[faces[i].v[0]].z. For a
2D UV texture mapping, the texture coordinates are tVerts[tvFace[i].v[0]].x and
tVerts[tvFace[i].v[0]].y. Finally, the vertex colors are vertCol[vcFace[i].v[0]].x,
vertCol[vcFace[i].v[0]].y and vertCol[vcFace[i].v[0]].z.
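Putting this together, a BuildMesh() for the portal might look roughly like this (PB_HALFHEIGHT
and PB_HALFWIDTH are the parameter block indexes defined earlier):

void PortalObj::BuildMesh(TimeValue t)
{
    float hh, hw;
    Interval valid = FOREVER;
    pblock->GetValue(PB_HALFHEIGHT, t, hh, valid);
    pblock->GetValue(PB_HALFWIDTH,  t, hw, valid);

    mesh.setNumVerts(4);
    mesh.setNumFaces(2);
    mesh.setVert(0, Point3(-hw, -hh, 0.0f));
    mesh.setVert(1, Point3( hw, -hh, 0.0f));
    mesh.setVert(2, Point3( hw,  hh, 0.0f));
    mesh.setVert(3, Point3(-hw,  hh, 0.0f));

    mesh.faces[0].setVerts(0, 1, 2);    // counter-clockwise winding
    mesh.faces[0].setEdgeVisFlags(1, 1, 0);
    mesh.faces[1].setVerts(0, 2, 3);
    mesh.faces[1].setEdgeVisFlags(0, 1, 1);

    mesh.InvalidateGeomCache();         // tell MAX the mesh changed
}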
The code that incorporates all that we've discussed so far is given in Version3.cpp. The new
features are that we're deriving our class from SimpleObject, we're building the mesh, we've
got references, the parameter block, the mouse handler and last but not least, it doesn't crash
any more.
Node Picking
Next we'll see how to pick nodes for the source and destination cells. The first step is to set the
command mode to a standard pick mode. For example, if we create a box, select it, go to the
Modify branch in the command panel, click on Edit Mesh and set the Selection Level to Vertex,
we're in pick mode. To get to this mode, we need to call the MAX function SetPickMode(). We
got a pointer to the class IObjParam in our BeginEditParams() method and we saved it in the
class variable ip. We can use this to call functions that are exported from the MAX executable;
AddRollupPage() is one of them, SetPickMode() is another. To get an idea of what else can be
done with this pointer, look up the class Interface in the help file. IObjParam is identical to this
class.
Once we're in pick mode, MAX needs a function to call to let us know which node the user is
clicking on. We had a similar situation when we were writing the mouse handler. To create a
callback for node picking, we derive a class from the MAX SDK class PickModeCallback and pass
a pointer to an instance of the derived class when we call SetPickMode(). We'll call this class
PortalObjPickModeCallback. In this class, we have two inherited functions to implement:
HitTest() and Pick().
HitTest() gets called when the user clicks with the mouse in one of the viewports. The system
passes it a bunch of parameters like the viewport and the mouse position based on which we
have to find out if a node was hit. Luckily for us, we can call the Pick() method of class
Interface that will hit test the screen position for nodes and return an INode pointer if one was
hit or NULL otherwise.
If HitTest() returns TRUE, indicating that a node was selected, our Pick() method gets called.
Here we find which node was selected, create a reference to that node, display its name in the
edit control, and end the pick mode by returning TRUE.
The code that handles the node picking is given in Listing 2. Version4.cpp contains this code
plus Version3.cpp. If you build it, you can see what is close to the finished version of the plug-
in. A couple of problems still need sorting out, however. The first is that when you pick a node
as the source or destination cell in the create pane of the command panel, the rollup page
disappears and we find ourselves in the Move mode. For now, select the portal and go to the
modifiers pane to pick the source and destination cells. Another problem is that any node can
be chosen as the source or destination; a light, a camera or even the portal itself. Recall that
we'd decided to put in a couple of checks to make sure that only boxes were assigned as source
or destination cells. We'll get to that in the next section where we'll descend down the geometry
pipeline into the murky depths of MAX.
This is a complicated topic that I don't claim to know too well myself so we'll just have a brief
overview here. The geometry pipeline system is what allows a node in the scene to be altered,
perhaps repeatedly, through the application of modifiers. To see this in action, create a portal,
choose the Edit Mesh modifier, set the Selection Level to Vertex, click on a corner of the portal
and drag it to deform the portal. How is this deformation happening? After all, we didn't take
care of this kind of thing in our mesh building code. What actually happened was this: when we
created our portal, MAX created a node for it and it became the Base Object at the beginning of
the pipeline. That's where it still exists, unaltered. To verify this, click on the Edit Stack button
in the Modifier Stack rollup page and you'll see that the Edit Mesh modifier has been added to
our node's pipeline. If you now delete the modifier by selecting it and saying Cut, you can see
the portal in its pristine form. Alternatively, you can say Collapse All, in which case the base
object and the results of applying the modifier will get fused into one, giving us an Editable
Mesh. In that case we can say goodbye to the portal as we knew it, because we'll never see it
again.
What we see in the viewport and what gets rendered is the result of evaluating the entire
pipeline with the output of each stage being the input to the next. The final result is called the
world space state of the object. Let's see how we'll use this information. In our plug-in, we need
to check if a given node is a box simply by seeing if it has eight vertices and twelve faces. This
test is rather primitive but will do for now. The first step is to evaluate the node's pipeline and
get its world space state. This is done by calling the node's EvalWorldState() function to get an
instance of the class ObjectState which contains a pointer to the object that appears in the
scene. To make sure the node's not a light or a camera, we check to see if it can be converted
to a triangle mesh by calling the object's CanConvertToType() method. If it can be converted,
we ask it do so, after which we can access the mesh and count the number of vertices and
faces it has. The code that does this is given in Listing 3.
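In outline, the test might look like this (a sketch of the approach, not Listing 3 itself):

BOOL IsBoxNode(INode *node, TimeValue t)
{
    ObjectState os = node->EvalWorldState(t);
    Object *obj = os.obj;
    if (!obj->CanConvertToType(Class_ID(TRIOBJ_CLASS_ID, 0)))
        return FALSE;   // a light, a camera, or the portal itself

    TriObject *tri = (TriObject *)
        obj->ConvertToType(t, Class_ID(TRIOBJ_CLASS_ID, 0));
    BOOL isBox = (tri->mesh.getNumVerts() == 8 &&
                  tri->mesh.getNumFaces() == 12);
    if ((Object *)tri != obj)
        tri->DeleteThis();   // ConvertToType may allocate a new object
    return isBox;
}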
There is a detailed discussion of the geometry pipeline under 'Must Read Sections for All
Developers / Geometry Pipeline System'. You can read up on nodes and object modification in
the Advanced Topics section. I found it heavy going on the whole but the good part is that we
don't really need to know too much to be able to get our job done. In fact the little we've
discussed will take us far. For example, we'll see in the next section that this is about all the
information we need on this topic to export geometry from a scene.
Finally, let's see what to do about the other problem that we had, the disappearing rollup page.
It says in the documentation for the PickModeCallback::Pick() method that returning TRUE will
set the command mode to MOVE so I don't see that we can do much about it. Instead, we'll
cunningly remove the rollup page from the Create pane altogether and ensure that it only
appears on the Modifier pane. If you look at the documentation for the function
Animatable::BeginEditParams(), you'll see that the flags parameter indicates which branch of
the command panel the user is in and none of the given values is for the Modify branch. So
we'll create a rollup page only if flags is zero.
The latest and greatest version of the plug-in code with the nodes check and the sleazy hack for
the rollup page is given in Version5.cpp. That's our procedural object plug-in.
What's Next
How do we export our procedural object? We'd have to write an export plug-in for that, and it
would take too long to discuss the procedure in detail here, but a quick overview is possible.
To write an export plug-in, we need to derive our main plug-in class from the MAX SDK class
SceneExport and implement a bunch of inherited pure virtual functions. One of these is
DoExport(), which is called when the user asks to export the scene into our file format.
DoExport() has a parameter that is a pointer to an instance of the class ExpInterface, derived
from IScene, which can be used to enumerate all the nodes in the scene. The function to do this
is IScene::EnumTree(). We create an enumeration callback which is called for every node in the
scene. Inside this callback, we need to see if the node we got is a portal and, if so, proceed to
export it. One simple way of doing this is to see if its name begins with the string "Portal". If it
does, we can convert it to a TriObject and get the geometry data. Next we need to get the
source and destination cell pointers. For that we get the pointer to the object in the scene that
the node references and typecast it to a pointer to an instance of the PortalObj class, after
which we can access the class variables src and dest. Part of the code for the callback is shown
below.
if (obj->CanConvertToType(Class_ID(TRIOBJ_CLASS_ID, 0)))
{
tri = (TriObject *) obj->ConvertToType(0, Class_ID(TRIOBJ_CLASS_ID, 0));
return TREE_CONTINUE;
}
When exporting the geometry, remember that the vertex coordinates will be in the local
coordinate system. The matrix to convert them to world coordinates can be obtained by calling
the node's GetObjectTM() function. There's also an INode::GetNodeTM(), but that returns the
transformation matrix for the node's pivot, because in MAX the node's pivot and the geometry
can be moved around independently of each other.
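So the export loop transforms each vertex by the object matrix, roughly like this:

Matrix3 tm = node->GetObjectTM(0);   // time 0 for a static export
for (int i = 0; i < tri->mesh.getNumVerts(); i++)
{
    Point3 world = tri->mesh.verts[i] * tm;   // MAX's row-vector convention
    // ... write 'world' out to the file ...
}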
You might want to add another rollup page with spinner and edit controls so that the user can
interactively adjust the creation parameters. This can be done using MAX's custom controls and
Parameter Maps. The documentation can be found under 'Must Read Sections for All
Developers / Custom User Interface Controls' and 'Must Read Sections for All
Developers / Parameter Maps'.
int PortalObjMouseHandler::proc(ViewExp *vpt, int msg, int point, int flags, IPoint2 m, Matrix3& mat)
{
Point3 p1;
if (msg == MOUSE_FREEMOVE)
vpt->SnapPreview(m,m,NULL, SNAP_IN_3D);
if (msg==MOUSE_POINT || msg==MOUSE_MOVE)
{
switch (point)
{
case 0: // first click: anchor one corner of the rectangle
p0 = vpt->SnapPoint(m,m,NULL,SNAP_IN_3D);
mat.SetTrans(p0);
hlfHt = hlfWd = 0.0f;
break;
case 1: // drag / release: the opposite corner
p1 = vpt->SnapPoint(m,m,NULL,SNAP_IN_3D);
// Store half the extents and recenter at the rectangle's midpoint
// (this step is reconstructed; the printed listing stops short here)
hlfWd = (float)fabs(p1.x - p0.x) * 0.5f;
hlfHt = (float)fabs(p1.y - p0.y) * 0.5f;
mat.SetTrans((p0 + p1) * 0.5f);
if (msg == MOUSE_POINT)
return CREATE_STOP; // second click finishes the portal
break;
}
}
else if (msg == MOUSE_ABORT) return CREATE_ABORT;
return CREATE_CONTINUE;
}
#define SRC_CELL 0
#define DEST_CELL 1
PortalObj *portal;
HWND hDlg; // dialog window
int cell; // which cell are we picking - src or dest
// hit testing
BOOL PortalObjPickModeCallback::HitTest(IObjParam *ip, HWND hWnd, ViewExp *vpt, IPoint2 m, int flags)
{
INode *node = ip->PickNode(hWnd, m);
return (node != NULL);
}
// The opening of Pick() was cut from the printed listing; this
// reconstruction assumes ViewExp::GetClosestHit() returns the node that
// passed HitTest(), and the IDC_* edit IDs are from our resource file
BOOL PortalObjPickModeCallback::Pick(IObjParam *ip, ViewExp *vpt)
{
INode *node = vpt->GetClosestHit();
if (node)
{
// Show the picked node's name in the appropriate edit box
SetDlgItemText(hDlg, (cell == SRC_CELL) ? IDC_SRC_EDIT : IDC_DEST_EDIT, node->GetName());
if (cell == SRC_CELL)
{
portal->DeleteReference(ID_SRC_REF);
portal->MakeRefByID(FOREVER, ID_SRC_REF, node);
}
else
{
portal->DeleteReference(ID_DEST_REF);
portal->MakeRefByID(FOREVER, ID_DEST_REF, node);
}
}
return TRUE; // returning TRUE ends the pick mode
}
void Cls_OnCommand(HWND hwnd, int id, HWND hwndCtl, UINT codeNotify)
{
switch (id)
{
case IDC_SRC:
g_pickModeCB.cell = SRC_CELL;
portal = (PortalObj*)GetWindowLong(hwnd, GWL_USERDATA);
portal->ip->ClearPickMode();
portal->ip->SetPickMode(&g_pickModeCB);
break;
case IDC_DEST:
// mirrors the source case; the printed fragment omitted these lines
g_pickModeCB.cell = DEST_CELL;
portal = (PortalObj*)GetWindowLong(hwnd, GWL_USERDATA);
portal->ip->ClearPickMode();
portal->ip->SetPickMode(&g_pickModeCB);
break;
}
}
BOOL CALLBACK DialogProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
switch (uMsg)
{
// HANDLE_MSG (from windowsx.h) expands to a case label that calls
// the handler and returns its result
HANDLE_MSG(hwnd, WM_INITDIALOG, Cls_OnInitDialog);
HANDLE_MSG(hwnd, WM_COMMAND, Cls_OnCommand);
}
return FALSE; // message not handled
}
In "Subdivision Surface Theory" , I wrote about a number of schemes and those were only the
tip of the iceberg, so it's worth spending some time justifying the choice I've made for this
implementation. Why use the modified butterfly? To explain my reasoning, it helps to look at
more general characteristics of schemes and their advantages and disadvantages. The major
differences tend to hinge on whether a scheme is approximating or interpolating.
Approximating schemes have a number of benefits. The surfaces they produce are generally
very fair, and they are generally the favored schemes for use in high-end animation. For
instance, Pixar uses Catmull-Clark surfaces for their character animation. The downsides of approximating schemes are substantial, though. The major one is that because the scheme doesn't interpolate its control net, the shape of the limit surface can be difficult to envision from looking at the control net. The mitigating factor is that as the net becomes denser, the surface generally lies closer to it. But for games, the net itself won't be tens of thousands of polygons, so the surface can differ substantially from the net.
Interpolating schemes are a different story. They can exhibit problems with fairness, with
ripples and undulations over the surface, especially near tight joint areas. Also, they aren't used
in high-end rendering quite as much, which can mean that they're the focus of less research.
But their major benefit is that the surface is substantially easier to envision by looking at the
net. Since the surface passes through all the net vertices, it won't "pull away" from the net. The
fairness issues are the price to pay for this, though. Approximating schemes are fair because
the surface isn't constrained to pass through the net vertices, but interpolating schemes
sacrifice the fairness for their interpolation.
Nonetheless, I feel that the fairness issues present less of a challenge to intuition than an
approximating surface does. For example, in many cases, existing artwork can be used with
interpolating schemes with some minor adjustments to smooth out rippling, whereas adapting
existing polygonal art to be a control net for an approximating scheme is a much more difficult
task.
Among interpolating schemes, the butterfly scheme has a number of things going for it. It's one
of the better-researched schemes. It's also computationally fairly inexpensive. Finally, the
results of subdivision tend to look good and conform fairly well to what intuition would expect.
Therefore, it's my model of choice.
Butterfly in Review
When both endpoints of an edge are regular (valence 6), the standard butterfly stencil applies. If one endpoint is of valence 6 and the other is extraordinary (not of valence 6), then we use a special stencil that takes into account just the extraordinary vertex, shown in Figure 2. The weights are computed as follows:
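A sketch of that weight computation, following the rules published in Zorin's paper (the function name is illustrative): neighbor j of the extraordinary vertex, counted counterclockwise starting from the edge being subdivided, gets weight w[j], and the extraordinary vertex itself takes the remaining weight of 3/4.

#include <cmath>
#include <vector>

// Stencil weights for the N neighbors of an extraordinary vertex of
// valence N (modified butterfly rules). Neighbor 0 is the far endpoint
// of the edge being subdivided; neighbors wind counterclockwise.
std::vector<double> extraordinaryWeights(int N)
{
    std::vector<double> w(N);
    if (N == 3) {
        w[0] = 5.0/12.0; w[1] = w[2] = -1.0/12.0;
    } else if (N == 4) {
        w[0] = 3.0/8.0; w[1] = 0.0; w[2] = -1.0/8.0; w[3] = 0.0;
    } else {
        const double PI = 3.14159265358979323846;
        for (int j = 0; j < N; j++)
            w[j] = (0.25 + cos(2.0*PI*j/N) + 0.5*cos(4.0*PI*j/N)) / N;
    }
    return w;
}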
The tangent masks are weighted sums over the same neighborhood. In them, t0 and t1 are the tangents, N is the vertex valence, and ei is the ith neighbor point of the vertex in question, where e0 can be any of the points (it doesn't matter where you start) and the points wind counterclockwise. Crossing the two resulting vectors and normalizing the result produces the vertex normal.
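As a small sketch, assuming the standard interpolating tangent masks (t0 sums cos(2*pi*i/N)*ei and t1 sums sin(2*pi*i/N)*ei; the Vec3 type and helpers here are illustrative):

#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b)
{
    Vec3 c = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    return c;
}

// Normal at a vertex from its N neighbors e[0..N-1], wound
// counterclockwise: cross the two tangents and normalize.
Vec3 vertexNormal(const Vec3* e, int N)
{
    const double PI = 3.14159265358979323846;
    Vec3 t0 = {0,0,0}, t1 = {0,0,0};
    for (int i = 0; i < N; i++) {
        double c = cos(2.0*PI*i/N), s = sin(2.0*PI*i/N);
        t0.x += c*e[i].x; t0.y += c*e[i].y; t0.z += c*e[i].z;
        t1.x += s*e[i].x; t1.y += s*e[i].y; t1.z += s*e[i].z;
    }
    Vec3 n = cross(t0, t1);
    double len = sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
    n.x /= len; n.y /= len; n.z /= len;
    return n;
}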
The idea behind our implementation is, at a high level, very straightforward. Given one control
net, we want some piece of functionality that can take that net and output a more complex net,
a net that has been advanced by a single subdivision step.
That sounds easy enough, right? Unfortunately, that description doesn't translate very directly
to C++ code. So we need to define some of our terms and be more specific. First of all, what's
a control net? We know what it is conceptually, but what kind of data structure is it and how is
it manipulated? After that, of course, we need to define that "black box" bit of functionality that
subdivides the net, and quantify how it works.
To establish our control net data structure, we start with nothing and build our way up as
needed. So, the first thing we need is the base representation that will eventually pass into
OpenGL. That's just a few arrays. We need an array for our vertices, our texture coordinates,
and our colors. Furthermore, we'll need an array of indices into those arrays to define our
faces; every three indices defines a triangle.
If we can do our tessellating with no more than that, then that's great. But chances are we're
going to need to keep around more information than just that. The important thing is that
whatever information is added to the data structure needs to be renewable. That is, since the
process is iterative, the information we have in the simpler net coming in must also exist in the
more complex net coming out, so that we can feed the complex net back in to produce an even
more complex net.
It's worth asking why we'd need more information than just the vertices and faces. After all, if
we need to determine whether one vertex is connected to another by an edge, we can
determine that by looking through the faces. Or if we need to find all the edges, we could just
do that by running through the face list, too. The problem here is in the running time of the
lookups. When we're subdividing an edge, we need to find out a lot of information about nearby
vertices and faces, and we'd like it to be as fast as possible. Regardless of the processor speed,
looking through all the faces to find a vertex's neighbors will be slower than if we have that
information available explicitly. This is because looking through the list of faces takes O(F) time,
where F is the number of faces. On the other hand, if we have the information stored explicitly,
it only takes O(1) time - constant time. That means that as we add more faces to the model,
the former solution takes longer, whereas the latter remains the same speed.
We don't have the information we need to decide what else to add to the control net data
structure, so we'll work on the procedure for subdividing a net and add data to the control net
as necessary.
Our task, then, is this: given a net, we need to subdivide it into a more complex net. Working
from the modified butterfly rules, this is fairly straightforward. We need to add a vertex along
each edge of the net. Then we need to split each face into four faces using the new vertices.
The first step, adding new vertices along each edge, tells us quite a bit about some more
information we'll need in the control net data structure. There's no fast and simple way to find
all the edges unless we store them explicitly. An edge needs to be able to tell us about its end
points since we need to use those in the butterfly stencil for computing the new vertex.
Furthermore, the stencil extends to the end points' neighbors, so the end point vertices need to
know about the edges they're connected to.
The second step, breaking existing faces into new faces, requires that the faces know about
their vertices, which they already do. The faces also need to know about their edges. While they
could find this by asking their vertices for all their edges and fishing through them, that
requires a fair amount more work for every lookup, and so we'll explicitly store with each face
the information about its edges, too.
That increases the load a fair amount. Our data structure now has arrays of vertices, edges,
and faces. Vertices know about their edges, edges know about their vertices, and faces know
about their vertices and edges.
It's worth noting that the data structure we're working with is nothing new and unusual. It's a
specific example of a general data structure known simply as a graph. A graph is anything
composed of vertices connected by edges. For instance, a linked list and a binary tree are both
special kinds of graphs.
What makes our problem a little tougher than, say, writing a singly-linked list class is that the graph of vertices in a model is considerably more complex than the graph of nodes in a linked list. First, the nodes in a linked list have a single edge coming out of them (pointing to the next node) and one coming in (from the previous node). Our graph has six edges coming into each regular vertex, and potentially many more than that for extraordinary vertices.
Second, in the case of a singly-linked list or a binary tree, the edges have direction. That is, you don't generally walk backward through the list or up the tree. These structures are also acyclic - there are no "loops" in them - so from a given vertex, there's no path that leads back to the same vertex. In our case, the edges are undirected. You need to be able to traverse every edge in both directions.
Discussing graphs in this context is really just "interesting facts" rather than being a crucial
contribution to our implementation, but it confirms what we already know: our data structure is
complicated. The one saving grace is that our algorithm is based on locality, so we don't need
to worry about traversing huge distances across the graph to find information we need to
subdivide. This is one benefit of using a scheme with minimal support. A scheme with much
broader support would be computationally much harder to evaluate, and hence be much slower
and far more difficult to implement.
It also confirms the direction we're taking to implement the data structure - it's based wholly on
locality so that the time it takes to find one vertex given another is proportional to the number
of edges between them. There are other ways of representing graphs for the myriad
applications that have different requirements. Cormen and his co-authors (see For Further Info
at the end of this article) provide an excellent introduction to graph theory.
So we know the data we need in our control net data structure and we know the steps the
tessellation needs to execute. We're ready to dig into the lower-level implementation details.
First, we'll go back to the information in the control net structure and look at how it should be
laid out.
Listing 1 shows the layout of the data. There tend to be two schools of thought on data layout.
One method is dubbed the "structure of arrays" (SOA) and the other is the "array of
structures" (AOS). The idea is that the SOA method stores multiple parallel arrays whereas the
AOS method stores all the data interleaved in the same array. I've personally never run into a
situation where the two approaches differed greatly in speed, and so when I lay out data I
generally try to blend the two approaches for clarity's sake. That's why some of the data in the
listing is shown as separate arrays of base types and some are stored as arrays of small
objects.
The vertices are stored in OpenGL-friendly arrays. While OpenGL allows for interleaved arrays,
many applications tend to store their data in parallel arrays, and that's why I choose to do so as
well. The vertices, texture coordinates, normals, and colors each have their own arrays. These
arrays are dynamically grown; when I need to add another vertex and there isn't sufficient
room, I allocate new arrays that are twice the size of the current ones and move the data into
the new arrays. This strategy amortizes the cost of memory allocation and is one I use for most
of my memory management.
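A minimal sketch of that grow-by-doubling strategy (the names here are illustrative, not from the article's listings):

#include <cstring>

// Ensure the vertex array can hold 'needed' vertices (three floats
// each), doubling the capacity whenever we run out. Doubling makes
// the cost of growth amortized O(1) per added vertex.
void ensureVertexCapacity(float*& verts, int& capacity, int needed)
{
    if (needed <= capacity)
        return;
    int newCapacity = (capacity > 0) ? capacity : 16;
    while (newCapacity < needed)
        newCapacity *= 2;
    float* grown = new float[newCapacity * 3];
    if (verts) {
        memcpy(grown, verts, capacity * 3 * sizeof(float));
        delete[] verts;
    }
    verts = grown;
    capacity = newCapacity;
}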
Each vertex also has a VertexEdges associated with it. VertexEdges keeps track of the edges
that the vertex is a part of. Following the theme of making lookups as fast as possible, the
edges are stored sorted by winding order, so each successive edge in the array is the next edge
in counterclockwise winding order from the previous edge.
The edges themselves prefer the AOS format. Each edge is stored as nothing more than two
indices into the vertex arrays. Adding another nitpicking detail, I sort the indices by value. It
comes in handy as there are many cases where I can skip a conditional by knowing that they're
in sorted order.
The faces are stored simply as an array of indices into the vertex arrays, where every three
indices defines a triangle. Since the control net is totally triangular, I don't need any
complicated support for variable-sized faces.
That's it for the storage of the control net. Now we need to understand the details of the
tessellation process.
As mentioned earlier, the subdivision step consists of subdividing edges and then building new
faces from them. The top-level function that does this is shown in Listing 2. For the edge
subdividing, I iterate over the edges. At each edge, I check the valences of the end point
vertices to determine which subdivision rules to use. Upon deciding that, I apply the rules and
produce the new vertex. It's then added to the end of the vertices array.
Furthermore, the edge is split into two edges. One of them uses the slot of the old edge, and
one of them is added to the back of the edge array. For use in building the faces, I keep two
lookup tables. One maps from the old edge index to the index of the new vertex I just created.
The other maps from the old edge index to the index of the new edge that I just added.
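In outline, the article's tessellateEdges() (called from Listing 2) might look something like this sketch; the helpers applyButterflyRules() and appendEdge() are illustrative, not from the article's listings:

// One subdivision pass over the edges. For each original edge we emit
// a new vertex, reuse the old edge slot for one half, append the other
// half, and record both in lookup tables indexed by the old edge.
void ButterflySurface::tessellateEdges(int* edgeVertMap, int* edgeEdgeMap)
{
    int oldNumEdges = numEdges;
    for (int i = 0; i < oldNumEdges; i++)
    {
        int v0 = edges[i].v[0];
        int v1 = edges[i].v[1];

        // The endpoint valences pick the regular, one-extraordinary, or
        // two-extraordinary stencil; the new vertex is appended to the
        // vertex arrays and its index returned.
        int newVert = applyButterflyRules(v0, v1);

        edgeVertMap[i] = newVert;                 // old edge -> new vertex
        edges[i].v[1] = newVert;                  // old slot now v0-newVert
        edgeEdgeMap[i] = appendEdge(newVert, v1); // new slot is newVert-v1
        // (a real implementation would also re-sort each edge's indices
        // and update the endpoints' winding-ordered edge lists)
    }
}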
Building the faces is somewhat more involved, as it requires a fair amount of bookkeeping when
creating the four new faces to be sure that they're all wound correctly and have their correct
edges. For each face, I have the corner vertices and the edges. From the two lookup tables I
created while subdividing edges, I also know the new vertices and new edges.
I shuffle all that data around to get it in a known order so that I can then build faces out of it. I
also end up adding three more edges connecting the new vertices inside the triangle. Those
new edges need to be added to the new vertices' edge lists, and they need to be added in the
correct winding order. This isn't much code, but it's tricky and bug-prone.
Using this function, I can subdivide as many times as I like. Each iteration increases the
polygon count by a factor of four. When I decide to stop, only then do I need to worry about
calculating vertex normals. Iterating over the vertices with the modified butterfly tangent mask
finds those handily.
The previously described procedure finds the vertices and normals, but not the colors or texture
coordinates. These deserve their own discussion. Colors are nice because they can be
interpolated using the same scheme as the vertices. If the butterfly scheme produces smooth
surfaces in XYZ space, it will also produce smoothness in RGBA space. It's certainly possible to
linearly interpolate the colors. That will result in colors that don't change abruptly, but whose
first derivative changes abruptly, resulting in odd bands of color across the model, similar to
Gouraud interpolation artifacts.
Texture coordinates are a somewhat more difficult problem. Current consumer hardware
interpolates color and texture over polygons in a linear fashion. For colors, this isn't what we
generally want: Gouraud interpolation of color exhibits significant artifacts. But for texturing, it
is what we want. The texture coordinates should be linearly interpolated, stretching the texture
uniformly across a face.
Therefore, when I interpolate texture coordinates during subdivision, I just linearly interpolate
them. Furthermore, higher-order interpolation doesn't necessarily make sense at all, as
different faces of the control net might have totally different sections of the texture, or even
have totally different textures mapped onto them. While the data structure doesn't currently
support this (vertices would need to be capable of having multiple sets of texture coordinates),
it could certainly be desirable. In this case, neighboring vertices' texture coordinates are in
totally different spaces, so interpolating between them doesn't make sense.
So, I'll stay with linear interpolation for texture coordinates. In terms of elegance, this method
is a little disappointing. If we interpolated everything using the modified butterfly scheme, we
could treat vertices not as separate vertex, color, and texture-coordinate data, but as one nine-
dimensional vector, (x,y,z,r,g,b,a,u,v), and just perform all the interpolation at once. Alas, in
this case, elegance needs to take a back seat to pragmatism.
Now we know how to start with a control net and step forward, producing increasingly detailed
control nets, all the while keeping our data structures intact and keeping our vertices, colors,
and texture coordinates intact, and generating normals for the finished model. What else is
there left to cover?
Animation
While it's beyond the scope of this article to describe how you might implement a full animation
system that uses subdivision surfaces, it's worth describing how subdivision surfaces and
animation can coexist. If your game is one that stores the animated model as a series of full
models, clearly you don't even have to think about it - subdividing those individual meshes will
just work.
Skeletal animation is a somewhat more interesting problem. One of the nice things about
subdivision surfaces is that a skeletal animation system should be able to transform the control
net before subdivision, saving you the cost of multiple-matrix skinning on the high-polygon final
model. This does have some downsides, though. Depending on the model and a host of other
factors, the skeletal animation might cause the model to flex in strange ways or to exhibit
increased rippling or unfairness.
The other downside is that it doesn't allow your application to take advantage of forthcoming hardware that supports skinning on the card. Depending on the speed of that hardware skinning, though, and on how many times you subdivide the model, the savings from transforming far fewer vertices yourself may or may not be worth giving up the offloading.
Adaptivity
Since this is a scalable geometry solution, it's worth asking if we can adaptively subdivide based
on curvature or distance to the camera. In my previous Game Developer articles on tessellating
Bézier patches ("Implementing Curved Surface Geometry," June 1999, and "Optimizing Curved
Surface Geometry," July 1999) such adaptivity was a major focus.
The problem with adaptive solutions for subdivision surfaces is that, unlike patches, subdivision
surfaces don't easily expose a closed-form parameterization. The only easy way to tessellate
them is through recursion. So we rely on the fact that as we recurse, we're converging on a
limit surface. And no matter how we tessellate, we should be converging on the same limit
surface.
If we tessellate adaptively, we've changed the control net. Some of the net might be at a higher
level of tessellation than the rest. And so we've broken the rules, and our net is no longer
converging on the same surface. This is a worst-case scenario for scalable geometry - it
produces a "popping" that you simply can't avoid, since the underlying surface is now
fundamentally different.
Furthermore, although this could probably be dealt with somehow, would it be worth it?
Consider that a game probably won't be subdividing the control net more than four times. If
your original net is, say, 1,000 polygons, four subdivision steps bring it to 256,000 polygons.
The span of low-end to high-end machines isn't yet quite that large. So the end result of an
elaborate adaptivity scheme would just be a model that was subdivided three times in some
areas, maybe four in others: a whole lot of work for negligible benefits.
If you're using subdivision schemes for characters, then unless your characters are gigantic,
adaptivity based on distance from the camera won't be worth much, either. Plus, characters
tend to be fairly uniformly curved; most of them don't have large flat sections and jagged
spikes in other areas. Therefore, in the end, you might be able to squeeze some benefits out of
an adaptivity scheme, but the amount of work necessary to do so is fairly daunting. It's
probably sufficient to pick a subdivision level based on distance to the camera and field-of-view
angle and tessellate to it.
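One simple way to do that (my own sketch, not from the article) is to estimate the model's projected size on screen and clamp to the zero-to-four subdivision range discussed above:

#include <cmath>

// Pick a subdivision level from distance and field of view. The pixel
// thresholds are illustrative and would be tuned per game.
int pickSubdivisionLevel(float distance, float fovRadians,
                         float modelRadius, float screenHeight)
{
    // Approximate projected height of the model in pixels.
    float projected = (modelRadius / (distance * tanf(fovRadians * 0.5f)))
                      * screenHeight;
    if (projected < 32.0f)  return 0;   // tiny on screen: raw control net
    if (projected < 128.0f) return 1;
    if (projected < 256.0f) return 2;
    if (projected < 512.0f) return 3;
    return 4;                           // close-up: full detail
}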
I'll freely admit that the demo is not at the point where you
could drop it straight into your game and witness a stunning
transformation (unless shiny salamanders are exactly what
your game needs). There's a good deal more to be done with
the demo. For starters, it's worth asking what to do when
even the base control net is too dense. If a character is far
away from the camera, maybe you'd only like to draw a 200-
polygon version? In that case, integrating a separate mesh-
reduction algorithm that you apply to the simplest net when
needed could solve the problem nicely.
Additional Resources
Cormen, T., C. Leiserson, and R. Rivest, Introduction to Algorithms. Cambridge, Mass.: M.I.T.
Press, 1998.
Zorin, D., P. Schröder, and W. Sweldens. "Interpolating Subdivision for Meshes with Arbitrary
Topology." Siggraph '96. pp. 189-192. (Available from ACM Digital Library.)
When he's not sleeping through meetings or plotting to take over the world, Brian's
busy furtively subdividing, hoping one day to develop his own well-defined tangent
plane. Critique his continuity at [email protected].
// Excerpt from the control-net data layout (Listing 1).
class ButterflySurface
{
public:
...
protected:
...
// Connectivity information,
// needed for tessellating.
int numEdges;
int edgeCapacity;
ButterflyEdge* edges;
};
class ButterflyEdge
{
public:
bool operator==(const ButterflyEdge& cmp) const;
bool operator<(const ButterflyEdge& cmp) const;
int v[2];
};
// The tail of the top-level subdivision step (Listing 2); the setup of
// the edgeVertMap and edgeEdgeMap lookup tables is not shown in this
// excerpt.
    tessellateEdges(edgeVertMap, edgeEdgeMap);
    buildNewFaces(edgeVertMap, edgeEdgeMap);
    delete[] edgeVertMap;
    delete[] edgeEdgeMap;
}
At Siggraph '98, Pixar unveiled a short animated film. Christened Geri’s Game, it was, to quote
its Academy Award press release, the “endearing tale of an aging codger who likes to play
chess in the park against himself.” Not only was it artistically stunning, but it was also a
technological powerhouse. The short served as a vehicle to demonstrate Pixar’s latest addition
to its production environment, a surface scheme known as subdivision surfaces.
Subdivision surfaces are a way to describe a surface using a polygonal model. Like the
polygonal model, the surface can be of any shape or size — it’s not limited to a rectangular
patch. Unlike that polygonal model, the surface itself is perfectly smooth. Subdivision surface
schemes allow you to take the original polygonal model and produce an approximation of the
surface by adding vertices and subdividing existing polygons. The approximation can be as
coarse or as detailed as your needs allow. Because Pixar’s rendering system requires everything
to be broken into polygons that are a half-pixel across, subdivision surfaces allowed them to
tessellate automatically to that level everywhere. As such, the artists didn’t need to worry about
how close Geri was to the camera. While your game probably can’t quite deal with half-pixel
polygons, whatever size you do choose, your models can scale up and down in polygon count
with the speed of the machine and their distance from the camera.
The technology itself is, for the most part, not new, but its application up until recently has
been fairly limited. Indeed, Geri’s Game is still one of the only compelling demonstrations of
subdivision surfaces. Nonetheless, it brought attention to subdivision surfaces as a relatively
new, up-and-coming technique for implementing scalable geometry.
Along with Pixar’s work, quite a few researchers are actively tackling issues in the area of
subdivision surfaces, and several Siggraph papers each year advance them academically and
put them to use in solving problems. By now, they are a fairly mature technology, and a
compelling contender among scalability solutions.
The game development community realizes that scalable geometry techniques are an important
part of developing next-generation game engines. The spread between high-end and low-end
hardware seems to get bigger each year (thanks to current and forthcoming geometry
accelerators such as Nvidia’s GeForce 256 and S3’s Savage2000), forcing game developers to
find ways to cater to the masses that use low-end machines while building in features that
make the most of hardcore gamers’ advanced hardware. As a result, on the low end our
engines should still be capable of using fewer than 10,000 polygons per scene, but on the high
end, the sky’s the limit: even hundreds of thousands of polygons per scene can cruise along at
60 frames per second. Scalable geometry techniques such as subdivision surfaces are therefore
necessary to accommodate this variation in hardware capabilities.
First, what is a subdivision surface? The obvious answer is that it’s a surface generated through
subdivision. To elaborate, every subdivision surface starts with an original polygonal surface,
called a control net. Then the surface is subdivided into additional polygons and all the vertices
are moved according to some set of rules. The rules for moving the vertices are different from
scheme to scheme, and it is these rules that determine the properties of the surface. The rules
of most schemes (including all the ones discussed here) involve keeping the old vertices
around, optionally moving them, and introducing new vertices. There are schemes that remove
the old vertices at each step, but they’re in the definite minority.
The one thing the control net and the eventual surface (called the limit surface) have in
common is that they are topologically the same. Topology is a way of describing the structure
of a surface that isn’t changed by an elastic deformation, that is, a stretching or twisting. A
good example and common joke is that to a topologist, a coffee cup and a donut are identical.
The donut hole corresponds to the hole in the handle of the coffee mug. On the other hand, a
sphere and coffee mug are not topologically equivalent, since no amount of stretching and
twisting can punch a hole in that sphere.
Topology is one reason that subdivision surfaces are worth a look. With Bézier or B-spline
patches, modeling complex surfaces amounts to trying to cover them with pieces of rectangular
cloth. It’s not easy, and often not possible if you don’t make some of the patch edges
degenerate (yielding triangular patches). Furthermore, trying to animate that object can make
continuity very difficult, and if you’re not very careful, your model will show creases and
artifacts near patch seams.
That’s where subdivision surfaces come in. You can make a subdivision surface out of any
arbitrary (preferably closed) mesh, which means that subdivision surfaces can consist of
arbitrary topology. On top of that, since the mesh produces a single surface, you can animate
the control net without worrying about seams or other continuity issues.
As far as actual uses in games, I believe that subdivision surfaces are an ideal solution for
character modeling. Environments and other parts of a game generally don’t have the fine
detail or strange topology that would require subdivision surfaces, but characters can have joint
areas that are particularly hard to model with patches, and characters are in constant
animation, which makes maintaining continuity conditions very important.
The basics. Before we start discussing individual schemes, let’s look at the basic characteristics
of subdivision surfaces in general. This gives us a framework for classifying and comparing the
schemes as we come across them. Most of these characteristics carry notable implications with
them, whether they are implied computational costs or implied ease-of-use considerations, or
anything else. These will usually be the criteria on which you might choose one scheme above
another.
Continuity: the holy grail. The first characteristic of a scheme is its continuity. Schemes are
referred to as having Cn continuity, where n determines how many derivatives are continuous.
So if a surface is C0 continuous, it means that the surface itself is continuous (it has no open holes), but none of its derivatives are. If a surface is C1 continuous, it means that the surface is closed and that its tangents are continuous (so there aren't any sharp seams).
This probably won’t be a major selling point of one scheme above another, since just about
every scheme has C1 continuity everywhere. Some have C2 continuity in some places, but the
majority have areas where the best they can claim is C1. So most schemes are alike in this
regard.
However, continuity is most certainly worth mentioning because it’s one of the major reasons to
think about using subdivision surfaces in the first place. After all, Pixar could have modeled Geri
using as many polygons as they wanted, since they’re not running their movies in real time. But
no matter how many polygons they used, you could get close enough that Geri’s skin would
look faceted from the polygons. The point of using a subdivision model is that you have that ideal limit surface, at which you can always throw more and more polygons as you get closer and closer to it, no matter how high the display resolution or how close the model is to the screen.
screen. Only a very small portion of the real world is flat with sharp edges. For everything else,
there’s subdivision surfaces.
While the degree of continuity is generally the same for all subdivision schemes, there are a
number of characteristics that vary notably between schemes. One important aspect of a
scheme is whether it is an approximating scheme or an interpolating scheme. If it’s an
approximating scheme, it means that the vertices of the control net don’t lie on the surface
itself. So, at each step of subdivision, the existing vertices in the control net are moved closer
to the limit surface. The benefit of an approximating scheme is that the resulting surface tends
to be very fair, having few undulations and ripples. Even if the control net is of very high
frequency with sharp points, the scheme will tend to smooth it out, as the sharpest points move
the furthest onto the limit surface. On the other hand, this can be to the approximating
scheme’s detriment, too. It can be difficult to work with, as it’s harder to envision the end result
while building a control net, and it may be hard to craft more undulating, rippling surfaces as
the scheme fights to smooth them out.
If it’s an interpolating scheme, it means that the vertices of the control net actually lie on the
limit surface. This means that at each recursive step, the existing vertices of the control net are
not moved. The benefit of this is that it can be much more obvious from a control net what the
limit surface will look like, since the control net vertices are all on the surface. However, it can
sometimes be deceptively difficult to get an interpolating surface to look just the way you want,
as the surface can develop unsightly bulges in areas where it strains to interpolate the vertices
and still maintain its continuity. Nonetheless, this is usually not a tremendous problem.
Figure 1 shows examples of an approximating scheme (on the left) and an interpolating scheme
(on the right). The white outline is the control net, and the red wireframe is the resulting
surface after a few subdivision steps. You can see the difference quite clearly: the
approximating surface seems to pull away from the net, while the interpolating surface flows
through the vertices of the net.
Surfaces in Uniform
Another set of characteristics of a scheme brings in four more terms. A scheme can be either
uniform or nonuniform, and it can be either stationary or nonstationary. These terms describe
how the rules of the scheme are applied to the surface. If the scheme is uniform, it means that
all areas of a control net are subdivided using the same set of rules, whereas a nonuniform
scheme might subdivide one edge one way and another edge another way. If a scheme is
stationary, it means that the same set of rules is used to subdivide the net at each step. A
nonstationary scheme, on the other hand, might first subdivide the net one way, and then the
next time around use a different set of rules.
All the schemes we’ll talk about here are fundamentally both uniform and stationary. There are
some extensions to these schemes that make them nonstationary or nonuniform, but there
aren’t many subdivision schemes that are fundamentally nonstationary or nonuniform. One of
the main reasons for this is that most of the mathematical tools we have for analyzing schemes
are unable to deal with dynamically changing rule sets.
Subdivision Shape
Another characteristic of a scheme, albeit less significant than the prior ones, is whether it is
triangular or quadrilateral. As the names would imply, a triangular scheme operates on
triangular control nets, and a quadrilateral scheme operates on quadrilateral nets. Clearly, it
would be inconvenient if you had to restrict yourself to these primitives when building models.
Therefore, most quadrilateral schemes (including the one discussed here) have rules for
subdividing n-sided polygons. For triangular schemes, you generally need to split the polygons
into triangles before handing them over to be subdivided. This is easy enough to do, but one
downside is that for some schemes, the way you break your polygons into triangles can change
the limit surface. The changes are usually minor, though, so you simply need to be consistent:
if you randomly choose which diagonal of a quadrilateral to split on every frame, you’ll end up
with popping artifacts.
Extraordinary Vertices
The preferred vertex valence is another property of subdivision schemes. The valence of a
vertex is the number of edges coming out of it. Most every vertex a scheme produces during
subdivision has the same valence. Vertices of that valence are the regular vertices of a scheme.
Vertices of any other valence are known as extraordinary vertices. Their effect depends on the
subdivision scheme, but historically there have been problems analyzing the limit surface near
extraordinary vertices. As we look at various schemes, we’ll see the effect that extraordinary
vertices have on each one.
Most schemes don’t ever produce extraordinary vertices during subdivision, so the number of
extraordinary vertices is set by the original control net and never changes. Figure 3 is an
example of two steps of a triangular scheme with an extraordinary vertex in the center. Notice
how it remains the only extraordinary vertex after a step of subdivision. Also note that the
valence of the regular vertices is 6. This is common for triangular schemes, as they all tend to
split the triangles in the same way — by adding new vertices along the edges and breaking
each triangle into four smaller triangles.
Surface Evaluation
Surface evaluation is the process of taking a control net, adding vertices, and breaking faces
into more, smaller faces to find a better polygonal approximation of the limit surface. There are
a number of ways to evaluate a subdivision surface. All subdivision schemes can be evaluated
recursively. Furthermore, most (including all the ones discussed here) can be explicitly
evaluated at the vertex points of the control net. For interpolating schemes, this means that
you can explicitly calculate the surface normals at the vertices using what are called tangent
masks. For approximating schemes it means you can also explicitly calculate the vertex’s limit
position, using what are called evaluation masks. In this context, a mask isn’t the same kind of
mask that you might use during binary arithmetic. Our masks are more analogous to the masks
worn at a masquerade. They are like stencil cutouts, shapes that can be “placed” on the control
net, and their shape determines which of the surrounding vertices are taken into account (and
how much effect each has) in determining the end result, be it the vertex location or its tangent
vectors. Figure 4 shows a visual example of applying a mask to a surface at a vertex.
A Note on Notation
The one problem with a standard notation is that if you access some of the references at the
end of this article, they will very likely use their own, different notation. As long as the concepts
make sense, though, it shouldn’t be difficult to figure out someone else’s naming convention.
The polyhedral scheme is about the simplest subdivision scheme of all, which makes it a good
didactic tool but not the kind of scheme you’d ever actually want to use. It’s a triangular
scheme where you subdivide by adding new vertices along the midpoints of each edge, and
then break each existing triangle into four triangles using the new edge vertices. A simple
example is shown in Figure 5. The problem with this, of course, is that it doesn’t produce
smooth surfaces. It doesn’t even change the shape of the control net at all. But it serves to
demonstrate some concepts fairly well.
The scheme is clearly interpolating since it doesn’t move the vertices once they’re created. It’s
also triangular, since it operates on a triangular mesh. Furthermore, the scheme is uniform
since the edge’s location doesn’t affect the rules used to subdivide it, and stationary since the
same midpoint subdivision is used over and over. The surface is only C0 continuous, since along
the edges of polygons it doesn’t have a well-defined tangent plane. The regular vertices of this
scheme are of valence 6, as that’s the valence of new vertices created by the scheme.
However, this scheme is simple enough that it doesn’t suffer because of its extraordinary
vertices.
The evaluation of the scheme isn’t hard at all. You can evaluate
it recursively using the subdivision rules. As far as evaluation
and tangent masks go, it’s clear that we don’t need an
evaluation mask, since the points are already on the limit
surface. Tangent masks don’t really make any sense, since our
surface isn’t smooth and therefore doesn’t have well-defined
tangents everywhere.
The butterfly scheme has a fairly interesting history to it. In 1990, Dyn, Levin, and Gregory
published a paper titled “A Butterfly Subdivision Scheme for Surface Interpolation with Tension
Control” (see For Further Info at the end of this article). It described the first butterfly scheme.
The title is derived from the stencil, or map of neighbors used during evaluation, which is
shaped like a butterfly (Figure 7). The scheme is interpolating and triangular, so all it ever does
is add vertices along the edges of existing triangles. The rules for adding those vertices are
simple, and the support is compact. For each edge, sum up the vertices in the stencil-shaped
area around that edge, weighting each one by a predetermined weight. The result is the new
vertex. The weights used, corresponding to the vertex labelings in Figure 7, are these:
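The two endpoints of the edge are each weighted 1/2, the two vertices adjacent to both endpoints are each weighted 1/8 + 2w, and the four outer "wing" vertices are each weighted -1/16 - w.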
In this case, w is a tension parameter, which controls how “tightly” the limit surface is pulled
towards the control net — note that if w equals –1/16, the scheme simply linearly interpolates
the endpoints and the surface isn’t smooth.
One question that the scheme doesn’t answer, though, is what to do if the area around an edge
doesn’t look like that butterfly stencil. Specifically, if either of the edges’ endpoints is of a
valence less than 5, there isn’t sufficient information to use the scheme, leaving you with no
choice but to choose w = –1/16 near that area, resulting in a surface that isn’t smooth near
those extraordinary points. This means that while the surface is smooth almost everywhere,
there will be isolated jagged points that really stand out visually and make the surface harder
for an artist to craft.
Figure 8. The 10-point stencil from the
modified butterfly scheme.
In 1993, Dyn and his colleagues extended the butterfly scheme to use a ten-point stencil, so
that the default case was the one shown in Figure 8, similar to the eight-point case with the
rear vertices added in. The new weights are:
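The edge endpoints (the a points) are each weighted 1/2 - w, the two vertices adjacent to both endpoints are each weighted 1/8 + 2w, the four wing vertices are each weighted -1/16 - w, and the two rear vertices (the d points) are each weighted w.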
Note that by adding w to the d points and subtracting it from the a points, the stencil’s total
weighting still adds up to 1. Intuitively, this is important because it means that the new point
will be in the neighborhood of the ones used to generate it. If the weights summed to, say, 2,
then the point would be twice as far from the origin as the points used to generate it, which
would be undesirable.
This new scheme even reduces to the old scheme as a subset — choosing w = 0 results in the
same rule set as the eight-point butterfly stencil. However, this extension didn’t address the
smoothness problem at extraordinary vertices.
In 1996, Zorin, Schröder, and Sweldens published an extension of the butterfly scheme known
as the modified butterfly scheme. The primary intent of their extension was to develop rules to
use for extraordinary vertices, making the surface C1 continuous everywhere.
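For an edge with one regular endpoint and one extraordinary endpoint of valence N, the new vertex is computed from the extraordinary vertex and its N neighbors. For N of 5 or more, neighbor j (starting from the edge being subdivided and winding around the vertex) is weighted (1/4 + cos(2 pi j/N) + 1/2 cos(4 pi j/N)) / N; for N = 3 the neighbor weights are 5/12, -1/12, and -1/12, and for N = 4 they are 3/8, 0, -1/8, and 0. In every case the extraordinary vertex itself takes the remaining weight of 3/4.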
The full justification for these weights is available in Zorin's thesis (see For Further Info at the end of this article).
If both endpoints of the edge are extraordinary, the vertex is computed by averaging the
results produced by each of the endpoints. So, evaluate the vertex once for each endpoint using
the appropriate weights from above, and average the resulting two candidates.
Those, then, are the rules for recursively evaluating the surface.
Since the scheme is interpolating, you don’t need an evaluation
mask, but it would be nice to have a tangent mask to explicitly
find the tangents at vertices. Such a mask exists, although it’s
fairly lengthy to write out, and not particularly enlightening. It
can be found in Zorin’s thesis, and I’ll discuss it next month
when implementing this scheme.
Catmull-Clark Surfaces
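A Catmull-Clark subdivision step first computes a new face point for each face (the average of the face's vertices) and a new edge point for each edge (the average of the edge's two endpoints and the two adjacent new face points), and then repositions each original vertex as a weighted average of its old position, the midpoints of its incident edges, and the surrounding new face points.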
New edges are then formed by connecting each new face point to its adjacent new edge points
and connecting each new vertex point to its adjacent new edge points. This defines the faces as
well, and it brings up an interesting point: consider what happens when you subdivide a surface
with a polygon that is not a quadrilateral. The resulting new face vertex will be connected to k
new edge vertices, and k will not be equal to four. Therefore, the new face vertex is an
extraordinary vertex. This is the only one of the three schemes shown here where the scheme
can actually create an extraordinary vertex during subdivision.
This is not as bad as it may seem, though. After a single subdivision step, all the faces in the
control net are quadrilaterals. Therefore, the scheme can only introduce new extraordinary
vertices during the first subdivision step. After a single subdivision step, the number of
extraordinary vertices is set and will not change.
The scheme also has evaluation and tangent masks for evaluation at the vertices. The full
discussion and proof of the evaluation mask can be found in Halstead et al. and is fairly lengthy.
The mask itself is fairly simple, though. For a vertex of valence N, the mask is equal to:
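For a vertex v of valence N, with adjacent edge points e1 through eN and surrounding face points f1 through fN, the limit position is (N^2 v + 4(e1 + ... + eN) + (f1 + ... + fN)) / (N(N + 5)).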
It’s interesting to note that this mask requires that we’ve subdivided the net once, since it uses
the face and edge vertices of the same level as the corner vertices, and face and edge vertices
are not available in the original control net.
The tangent masks carry an equally lengthy discussion, but their resulting formula is also fairly
complicated. Because most of it can be precomputed for each valence and stored in a lookup
table, it’s not computationally expensive, it’s just a large formula:
The surface normal is then the normalized cross product of t0 and t1.
Figure 13 shows a tetrahedron control net in white with a red wireframe of the surface after a
few subdivision steps of the Catmull-Clark scheme.
Catmull-Clark Extended
Catmull-Clark surfaces hold the distinction of being the favored surfaces for use in high-end
rendering; they were the model employed by Pixar in Geri’s Game. Their mathematical
elegance and the amount of work devoted to them make them a fairly attractive choice. For
instance, work has been done on generating Catmull-Clark surfaces that interpolate a set of
points, which, as an approximating scheme, they do not usually do. Furthermore, Pixar
extended them for Geri’s Game to allow for sharp and semi-sharp creases in the surface.
Pixar’s scheme generating these creases is fairly straightforward. It allows an artist to specify
for an edge or vertex that subdivision near that edge or vertex should be done sharply (using
polyhedral subdivision) for some number of steps, from 0 to infinity. Intuitively, the more sharp
steps that are used, the more creased the surface will appear near that edge. If the number is
finite, then the surface will still be smooth, since eventually the surface will resume using the
normal Catmull-Clark subdivision rules. If the crease is infinitely sharp, it isn’t smooth at all.
Pixar put these to use on Geri’s skin features, adding creases to various locations across his
body like between his skin and fingernails.
It’s worth noting that while this greatly extends the application of the surfaces, it changes the
properties of the scheme. The scheme becomes both nonuniform, since different edges and
vertices can be of differing degrees of sharpness, and nonstationary, because a semi-sharp
crease is evaluated linearly for some number of steps and then smoothly for the rest. Near the
creases, the surface no longer reduces to the B-spline surface, and it also invalidates the
evaluation and tangent masks.
Geri’s Game clearly demonstrates the benefit of sharp and semi-sharp creases. However, for
use in games, the evaluation and tangent masks are fairly important, and so it’s difficult to say
whether the increased computational cost is worth the added functionality.
After this whirlwind tour of subdivision surfaces, you might be feeling a little light-headed or
dizzy. Hopefully though, you’ve picked up the concepts behind subdivision surfaces and maybe
even thought of some good applications for them in projects you’re working on or getting ready
to start. Since there’s nowhere near enough space to discuss implementation details for even
just these three schemes, next month we’ll bear down and focus on one of them, the modified
butterfly scheme. I’ll mention the reasons I think it’s a good choice for use in games, discuss
some of the benefits and detriments, and then present an example implementation.
Acknowledgements
Thanks to Pixar for graciously allowing us to use images from their short animation, Geri’s
Game. Thanks also to Denis Zorin for his suggestions and references, Jos Stam at
Alias|Wavefront for his help and suggestions, and to Alias|Wavefront for allowing him to release
his precomputed eigenstructures. Thanks to Chris Goodman of 3dfx for discussions, latté, and
those hard-to-find papers, and to Adrian Perez of Carnegie-Mellon University for suggesting the
subdivision scheme I eventually settled on.
For Further Info
• Catmull, E., and J. Clark. “Recursively Generated B-Spline Surfaces on Arbitrary Topological
Meshes.” Computer Aided Design, 1978.
• DeRose, T., M. Kass, and T. Truong. “Subdivision Surfaces in Character Animation.” Siggraph
‘98. pp. 85–94.
• Dyn, N., J. A. Gregory, and D. A. Levin. "A Butterfly Subdivision Scheme for Surface Interpolation with Tension Control." ACM Transactions on Graphics. Vol. 9, No. 2 (April 1990): pp. 160–169.
• Dyn, N., S. Hed, and D. Levin. "Subdivision Schemes for Surface Interpolation." Workshop in Computational Geometry (1993), World Scientific, pp. 97–118.
• Halstead, M., M. Kass, and T. DeRose. “Efficient, Fair Interpolation Using Catmull-Clark
Surfaces.” Siggraph ‘93. p. 35.
• Stollnitz, E., T. DeRose, and D. Salesin. Wavelets for Computer Graphics. San Francisco:
Morgan-Kaufman, 1996.
• Zorin, D., P. Schröder, and W. Sweldens. “Interpolating Subdivision for Meshes with Arbitrary
Topology.” Siggraph ‘96. pp. 189–192.
Geometric Modeling
http://muldoon.cipic.ucdavis.edu/CAGDNotes
The "Graphic Content" column in Game Developer follows the erratic path of a professional
computer graphics developer, namely me. Anyone who has ever been in a professional
production situation realizes that real-world coding these days requires a broad area of
expertise. When this expertise is lacking, developers need to be humble enough to look things
up and turn to people around them who are more experienced in that particular area.
My problem right now is facial animation. In particular, I need to know enough in order to
create a production pathway and technology to display real-time lip sync. My first step when
trying to develop new technology is to take a historic look at the problem and examine previous
solutions. The first people I could think of who had explored facial animation in depth were the
animators who created cartoons and feature animation in the early days of Disney and Max
Fleischer.
Facial animation in games has built on this tradition. Chiefly, this has been achieved through
cut-scene movies animated using many of the same methods. Games like Full Throttle and The
Curse of Monkey Island used facial animation for their 2D cartoon characters in the same way
that the Disney animators would have. More recently, games have begun to include some facial
animation in real-time 3D projects. Tomb Raider has had scenes in which the 3D characters
pantomime the dialog, but the face is not actually animated. Grim Fandango uses texture
animation and mesh animation for a basic level of facial animation. Even console titles like
Banjo Kazooie are experimenting with real-time “lip-flap” without even having a dialog track.
How do I leverage this tradition into my own project?
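Those early animators boiled dialog down to a reference chart of 12 standard mouth positions.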
Each mouth position or viseme represented one or more phonemes. This reference chart
became a standard method of creating animation. As a game developer, however, I am
concerned with the number of positions I need to support. What if my game only has room for
eight visemes? What if I could support 15 visemes? Would it look better?
Throughout my career, I have seen many facial animation guidelines with different numbers of
visemes and different organizations of phonemes. They all seem to be similar to the Disney 12,
but also seem like they involved animators talking to a mirror and doing some guessing.
I wanted to establish a method that would be optimal for whatever number of visemes I wanted
to support. Along with the animator’s eye for mouth positions, there are the more scientific
models that reduce sounds into visual components. For the deaf community, which does not
hear phonemes, spoken language recognition relies entirely on lip reading. Lip-reading systems base speech recognition on 18 speech postures. Some of these mouth postures show very
subtle differences that a hearing individual may not see.
So, the Disney 12 and the lip reading 18 are a good place to start. However, making sense of
the organization of these lists requires a look at what is physically going on when we speak. I
am fortunate to have a linguist right in the office. It’s times like this when it helps to know
people in all sorts of fields, no matter how obscure.
Science Break
The field of linguistics, specifically phonetics, compares phonemes according to their actual
physical attributes. The grouping does not really concentrate on the visual aspects, as sounds
rely on things going on in the throat and in the mouth, as well as on the lips. But, perhaps this
can help me organize the phonemes a bit.
Sounds can be categorized according to voicing, manner of articulation (airflow), and the places
of articulation. There are more, but these will get the job done. As speakers of English, we
automatically create sounds correctly without thinking about what is going on inside the mouth.
Yet, when we see a bad animation, we know it doesn’t look quite right although we may not
know why. With the information below, you will be equipped to know why things look wrong.
Now for some group participation. This is an interactive article. Go on, no one is looking. The
categories we want to examine are:
Voiced vs. Voiceless. Put your hand on your throat and say something. You can feel an
intermittent vibration. Now say, “p-at, b-at, p-at, b-at,” (emphasizing the initial consonant).
Looking at the face, there is no visual difference between voiced and voiceless sounds. In some sounds the vocal cords are vibrating together (b is voiced) and in some the vocal cords are apart (p is voiceless). This is an automatic no-brainer as far as reducing sounds into one viseme. Any
pair of sounds that is only different because of voicing can be reduced to the same viseme. In
English, that eliminates eight phonemes.
Nasal vs. oral. Put your fingers on your nose. Slowly say “momentary.” You can feel your nose
vibrating when you are saying the “m.” Some sounds are said through the nasal cavity, but
most are said through the oral cavity. These are also not visibly different. So again, we have an
automatic reduction in phonemes. All three nasal sounds in English can be included in the oral
viseme counterpart.
Manners of Speech. Sounds can also be differentiated by the amount of opening through the
oral tract. These also do not offer a visible clue, but are very important for categorizing
phonemes. Sounds that have complete closure of the airstream are called stops. Sounds that
have a partially obstructed closure and turbulent airflow are called fricatives. A sound that
combines a stop/fricative is called an affricate. Sounds that have a narrowing of the vocal tract,
but no turbulent airflow, are called approximates. And then there are sounds that have
relatively no obstruction of the airflow; these are the vowels.
According to the chart, there are three bilabials, which are sounds made with both lips. They
are [b], [p], and [m]. According to Figures 3a, 3b, and 3c, they have different attributes
inside the mouth. B and P only differ in that the B makes use of the vocal cords and P does not.
The M sound is nasal and voiced so it is similar to the B sound, but it is a nasal sound. The cool
thing about these sounds is that while there are differences inside the mouth, visually there is
no difference. If you look in a mirror and say “buy,” “pie,” and “my” they all look identical. We
have reduced three phonemes into one viseme.
While you’re working, remember that you are thinking with respect to sounds (phonemes), not
letters. In many cases a phoneme is made up of multiple letters. So, if we go through Chart 1,
we can continue to reduce the 35 phonemes into 13 visemes. For the most part, the visemes
are categorized along the lines of the Places of Articulation (with the exception of [r]).
Take a look at the following listing of visemes. It describes the look of each phoneme in
American English. The only phoneme not listed is [h]. “In English, ‘h’ acts like a consonant, but
from an articulatory point of view it is simply the voiceless counterpart of the following
vowel.” (Ladefoged, 1982:33-4). In other words, treat [h] like the vowel that comes after it.
Visemes
To see how helpful this information can be when animating a face, take a word like "hack." It
has four letters, three phonemes, and only two visemes (13 and 9 in the listing).
Say that you don’t have enough space to include 13 visemes and whatever emotions you want
expressed. Well, by using Chart 1 and the list of visemes in the listing, you can make logical
decisions of where to cut. For example, if you only have room for 12 visemes, you can combine
viseme 5 and 6 or 6 and 7 below. For 11 visemes, continue combining visemes by incorporating
viseme 7 and 9 below. For 10, combine visemes 2 and 3. For 9, combine 8 with the new viseme
7/9. For 8, combine 11 and 13.
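A sketch of how that budget-driven merging might be coded; the viseme numbers refer to the article's listing (not reproduced here), and the remap table below is illustrative:

// Collapse the full 13-viseme set to a smaller budget by remapping
// merged visemes onto the frame that replaces them. Index 0 is unused;
// visemes are numbered 1..13 as in the listing.
int visemeRemap[14];

void buildRemap(int budget)
{
    for (int v = 1; v <= 13; v++)
        visemeRemap[v] = v;                  // identity mapping to start
    if (budget <= 12) visemeRemap[6] = 5;    // combine visemes 5 and 6
    if (budget <= 11) visemeRemap[9] = 7;    // combine 7 and 9
    if (budget <= 10) visemeRemap[3] = 2;    // combine 2 and 3
    if (budget <= 9)  visemeRemap[8] = 7;    // fold 8 into the 7/9 group
    if (budget <= 8)  visemeRemap[13] = 11;  // combine 11 and 13
}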
If I were really pressed for space, I could keep combining and drop this list down further. Most
drastic would be three frames (Open, Closed, and Pursed as in boot) or even a simple two
frames of lip flap open and closed. In this case you would just alternate between opened and
closed once in a while. But that isn’t very fun or realistic, is it?
Art Issues
(Figure: side view of the sound [m], as in "my.")
However, even with the viseme count I am still not ready to set the artists loose creating my viseme frames. There are several basic engine decisions that I must make before modeling begins.
Unfortunately, I will have to wait until the next column to dig into that. Until then, think back on
my 3D morphing column (“Mighty Morphing Mesh Machine,” December 1998) as well as last
year’s skeletal deformation column (“Skin Them Bones,” Graphic Content, May 1998) and see if
you can get a jump on the rest of the class.
Acknowledgements
Special thanks go to my partner in crime, Margaret Pomeroy. She was able to explain to me
what was really going on when I made all those funny faces in the mirror. When she was
studying ancient languages in school I am sure she never imagined working on lip-synching
character dialog.
• Ladefoged, Peter. A Course in Phonetics. San Diego: Harcourt Brace Jovanovich, 1982.
• Maestri, George. [digital] Character Animation. Indianapolis: New Riders Publishing, 1996.
• Parke, Frederic I. and Keith Waters. Computer Facial Animation. Wellesley: A. K. Peters,
1996.
Jeff Lander often sounds like he knows what he’s talking about. Actually, he’s just lip-synched to someone who really knows what’s going on. Let him know you are on to
the scam at [email protected].
This article is dedicated to the memory of Seumas McNally. Please see the epilogue at the end
of this article.
Like many people, I find photographs of rolling hills or perilous canyons both calming and awe-
inspiring. It is unfortunate that as gamers, we are not able to revel in the natural beauty of the
outdoors. Only a few current and upcoming games give us this feast for the eyes (Tribes 1 & 2,
Tread Marks, Outcast, Myth 1 & 2, and HALO are a few examples). These games have taken 3D
action gaming to the next level with the inclusion of incredibly detailed worlds upon which the
story and action are played out.
In this article I will briefly examine the state of the art in hardware accelerated landscape
engines and the algorithms which power them. One algorithm in particular will be presented,
discussed, and finally implemented as a starting point for anyone looking to add landscapes to
their next project. I'll assume an intermediate level of C++ knowledge and at least general
knowledge of 3D rendering.
It seems you can't shake a stick in the world of terrain visualization without hitting a reference
to Level of Detail (LOD) Terrain Algorithms. Level of Detail algorithms use a set of heuristics to
determine which parts of a landscape need more detail to look correct. You'll no doubt find a
slew of references to Height Fields and GPS datasets too. All of this was driven by the military
SimNet applications; however, it is now finding its way into more trivial pursuits.
One of the many technical challenges to terrain rendering is how to store the features inherent
in a landscape. Height Fields are the de-facto standard solution. Simply put, they are two-
dimensional arrays which hold the height of the terrain at that point. Think of a piece of graph
paper where a surveyor has filled in each square using his altitude measuring tools. Height
Fields are sometimes called Height Maps; I will use the two terms interchangeably.
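To make that concrete, here is a minimal sketch; the MAP_SIZE constant and the one-byte
sample format are assumptions patterned on the demo described later:

const int MAP_SIZE = 1024;                       // assumed map dimension
unsigned char gHeightMap[MAP_SIZE * MAP_SIZE];   // one altitude sample per grid square

inline unsigned char HeightAt(int x, int y)
{
    return gHeightMap[y * MAP_SIZE + x];         // row-major lookup into the "graph paper"
}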
A good overview of LOD Terrain Algorithms can be represented by three papers [1 Hoppe][2
Lindstrom][3 Duchaineau]. In [1], Hugues Hoppe presents an algorithm based on Progressive
Meshes, a relatively new and spiffy technique for adding triangles to arbitrary meshes as you
need more detail. The paper is an excellent read but a bit complex and has high memory
requirements for our needs.
The second paper [2] is more our style: Lindstrom et al. present a structure called a Quad Tree
that is used to represent a patch of landscape. A Quad Tree recursively tessellates the
landscape creating an approximation of the Height Field. Quad Trees are very simple and
efficient, sharing many of the design principles of the next algorithm (such as recursion),
however the added bonuses of the next paper tilt the scale.
Finally, in [3] Duchaineau et al. present an algorithm (Real-time Optimally Adapting Meshes)
based on a Binary Triangle Tree structure. Here each patch is a simple isosceles right triangle.
Splitting the triangle from its apex to the middle of its hypotenuse produces two new isosceles
right triangles. The splitting is recursive and can be repeated on the children until the desired
level of detail is reached.
The ROAM algorithm caught my eye while researching due to its simplicity and extensibility.
Unfortunately the paper is extremely short and only minimal pseudocode is presented to hint at
implementations. However, it can be implemented from the most basic level up to the most
advanced optimizations in a nearly-continuous spectrum. This is helpful since each step can be
validated before continuing. Also, ROAM tessellates very rapidly and allows dynamic updates to
the Height Map.
The engine presented here was patterned after the engine in Tread Marks
(http://www.TreadMarks.com). The lead programmer, Seumas McNally, was instrumental from
its conception to completion. See the Acknowledgments at the end for more info.
The code in the archive is written for Visual C++ 6.0 and uses OpenGL to perform the
rendering. I am new to OpenGL, but I have used every available means to code this aspect of
the project correctly. Comments and suggestions on the engine's design or implementation are
welcome.
The project contains several files that are not covered in this explanation. These files consist of
utility routines and general application overhead needed to run an OpenGL/Win32 application.
Only "ROAMSimple.cpp" and associated header files are examined here.
Let me introduce the algorithm with a bird's-eye view and then we can focus on how the
individual pieces interact:
• Height Map files are loaded into memory and associated with an instance of a Landscape
class. Multiple Landscape objects may be linked to generate terrains of infinite size.
• A new Landscape object parcels out sections of the loaded Height Map to new Patch class
objects. The purpose for this step is two-fold:
1. The tree-based structures used for the rest of the algorithm expand RAM usage
exponentially with depth, so keeping the areas small limits their depths.
2. Dynamic updates of the Height Field need a complete recalculation of the variance
tree over the modified locations. Overly large Patches would be too slow to
recompute in a real-time application.
• Each Patch object is then called to create a mesh approximation (tessellation). The
Patches employ a structure called a Binary Triangle Tree which stores implicit coordinates
for the triangles that will be displayed onscreen (instead of explicit X,Y,Z coordinates). By
storing the vertices in a logical manner, ROAM saves upwards of 36 bytes of RAM per
triangle. Coordinates are calculated efficiently as part of the rendering step (below).
• After tessellation, the engine traverses the Binary Triangle Tree created in the previous
step. Leaf nodes in the tree represent triangles which need to be output to the graphics
pipeline. The triangle coordinates are calculated on the fly during the traversal.
Instead of storing a huge array of triangle coordinates to represent the landscape mesh, the
ROAM algorithm uses a structure called a Binary Triangle Tree. This structure can be viewed as
the result of a surveyor cutting the landscape into triangular plots. The owners of these plots
logically view each other in terms of neighbor-relationships (left/right neighbor, etc). Likewise,
when an owner gives land as an inheritance, it is split equally between the two children.
To extend this analogy further, the original owner of a plot is the root node of a Binary Triangle
Tree. Other original owners are root nodes of their own trees. The Landscape class acts like a
local land-registry, keeping track of all the original owners and which plot they owned. The
registry also keeps records of all inheritances from parents to children.
The more generations of children, the more heavily surveyed the land becomes. Any amount of
detail can be produced simply by expanding the 'population' in areas which need better
approximations. See Figure 1 for an example.
Binary Triangle Trees are represented by the TriTreeNode structure and keep track of the five
basic relationships needed for ROAM. Refer to Figure 2 for the standard view of these
relationships.
struct TriTreeNode {
    TriTreeNode *LeftChild;      // Our left child
    TriTreeNode *RightChild;     // Our right child
    TriTreeNode *BaseNeighbor;   // Adjacent node, below us
    TriTreeNode *LeftNeighbor;   // Adjacent node, to our left
    TriTreeNode *RightNeighbor;  // Adjacent node, to our right
};
Figure 2. Basic binary triangle with children and
neighbors.
When creating a mesh approximation for the Height Field, we will recursively add children to
the tree until the desired level of detail is reached. After this step is complete, the tree can be
traversed again, this time rendering the leaf nodes as actual triangles onscreen. This two-pass
system is the basic engine, and requires resetting for each frame. One nice feature of the
recursive method is that we are not storing any per-vertex data, freeing up huge amounts of
RAM for other goodies.
In fact, the TriTreeNode structures are created and destroyed so many times that the most
efficient method of allocation is mandated. Also, there may be tens of thousands of these
structures, so even one extra pointer would bloat the memory requirements tremendously. The
TriTreeNode structures are allocated from a static pool, bypassing the overhead of dynamic
memory allocation, which also gives us a rapid method for resetting the state.
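As a rough sketch (assuming POOL_SIZE is the pool's compile-time dimension), the allocator
can be as small as this:

TriTreeNode *Landscape::AllocateTri()
{
    if (m_NextTriNode >= POOL_SIZE)   // Pool exhausted: callers simply stop splitting.
        return NULL;

    TriTreeNode *pTri = &m_TriPool[m_NextTriNode++];
    pTri->LeftChild = pTri->RightChild = NULL;   // A fresh node has no children.
    return pTri;
}

Resetting the pool for the next frame is then a single assignment (m_NextTriNode = 0), which
is exactly the rapid reset mentioned above.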
The Landscape class acts as the high-level encapsulator for the dirty details of landscape
rendering. From the point of view of the application, the landscape should simply appear in the
screen buffer after a few simple setup calls. Here are the important bits of the Landscape class
definition:
class Landscape {
public:
    void Init(unsigned char *hMap);    // Initialize the whole process
    void Reset();                      // Reset for a new frame
    void Tessellate();                 // Create mesh approximation
    void Render();                     // Render current mesh
    static TriTreeNode *AllocateTri(); // Allocate a new node for the mesh

protected:
    static int m_NextTriNode;          // Index to the next free TriTreeNode
    static TriTreeNode m_TriPool[POOL_SIZE];     // Pool of nodes for tessellation
    Patch m_aPatches[NUM_PATCHES][NUM_PATCHES];  // Array of patches to be rendered
    unsigned char *m_HeightMap;        // Pointer to Height Field data
};
// (POOL_SIZE and NUM_PATCHES are assumed compile-time constants.)
The Landscape class manages large square plots and can work together with other Landscape
objects each with their own plots. This design comes into play later when you'll want to page-in
larger terrain sets. During initialization, the Height Map is cut into more manageable pieces and
given to new Patch objects. It is the Patch class and associated methods that we will spend the
most time on.
Note the simplicity of the functions. The Landscape class is designed to be easily dropped into a
rendering pipeline -- especially given the gratuitous hardware z-buffering available these days.
Several globals are used to further simplify this demo.
The Patch class is the meat & potatoes of the engine. It is roughly broken into two halves, the
stub half and the recursive half. Here's the data declaration and stub half of the Patch class:
class Patch {
public:
    void Init( int heightX, int heightY,
               int worldX, int worldY,
               unsigned char *hMap ); // Initialize the patch
    void Reset();                     // Reset for next frame
    void Tessellate();                // Create mesh
    void Render();                    // Render mesh
    void ComputeVariance();           // Update for Height Map changes
    ...
protected:
    unsigned char *m_HeightMap;       // Adjusted pointer into Height Field
    int m_WorldX, m_WorldY;           // World coordinate offset for patch
    TriTreeNode m_BaseLeft;           // Root node for left triangle tree
    TriTreeNode m_BaseRight;          // Root node for right triangle tree
    ...
In the flow of code, the stub functions explained below are called for each Patch object held by
the parent Landscape. The Patch class method names are equivalent to the Landscape methods
which call them. These methods are rather simplistic so there is no need for a detailed analysis:
Init() requires the offsets into the Height Field array and World. These are used for scaling the
patch over different sizes of terrain. The pointer to the Height Field is adjusted to point to the
first byte of this patch's data and stored internally.
Reset() erases any references to invalid TriTreeNodes, followed by relinking the two Binary
Triangle Trees that make up each patch. It hasn't been mentioned until now, but each patch is
actually made up of two discrete Binary Triangle Trees fitted together into a square (called a
'Diamond' in the ROAM paper). Take a look at Figure 2 again if you're confused. Much more
detail on this in the next section.
Tessellate() is the first of our stub functions. It simply passes the proper parameters for the
highest level triangles (the two root nodes from each patch) on to the recursive version of the
function. Same goes for Render() and ComputeVariance().
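For illustration, the Tessellate() stub might look something like this sketch; PATCH_SIZE and
the exact recursive signature are assumptions (the recursion itself is sketched under "ROAM
Guts" below):

void Patch::Tessellate()
{
    // Seed the recursion with the corner coordinates of the two root triangles
    // that make up this patch's diamond. Node index 1 is the variance tree root.
    RecursTessellate(&m_BaseLeft,
                     m_WorldX,              m_WorldY + PATCH_SIZE,   // left corner
                     m_WorldX + PATCH_SIZE, m_WorldY,                // right corner
                     m_WorldX,              m_WorldY,                // apex
                     1);
    RecursTessellate(&m_BaseRight,
                     m_WorldX + PATCH_SIZE, m_WorldY,
                     m_WorldX,              m_WorldY + PATCH_SIZE,
                     m_WorldX + PATCH_SIZE, m_WorldY + PATCH_SIZE,
                     1);
}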
ROAM Guts
So far we've only discussed the support structure for the actual algorithm. Now it's time to get
to the goods. It might be handy to have a copy of the ROAM paper at this point, but I'll explain
it as we go. Refer back to Figure 2 with the triangle relationships, and steel yourself for the next
phase.
First we must define a metric for visible error in a mesh approximation. The metric I use is a
clone of the one in the Tread Marks engine, called 'Variance'. We will need such a metric for
deciding when a node should be split (to add detail), and how deeply to split it. The ROAM paper
uses a metric based on nested world-space bounds. While that metric is more accurate, it is also
vastly slower.
Variance is the difference in height of the interpolated hypotenuse midpoint for a binary triangle
node and the actual Height Field sample at that point. Simply put, how far off is the current
binary triangle node from the actual Height Field area it covers. This calculation is relatively
quick and only requires one memory hit for the Height Field lookup:
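In code, the calculation looks something like this hedged fragment (leftZ and rightZ are the
heights at the hypotenuse endpoints; gHeightMap and MAP_SIZE are as in the earlier sketch):

int centerX = (leftX + rightX) >> 1;                     // hypotenuse midpoint
int centerY = (leftY + rightY) >> 1;
int actual  = gHeightMap[centerY * MAP_SIZE + centerX];  // the one memory hit
int interpolated = (leftZ + rightZ) >> 1;                // average of the two endpoints
int variance = abs(actual - interpolated);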
But wait, there's more! We can't just calculate the variance for the two root Binary Triangle
Trees of each Patch because the error associated with this calculation is too high. It has to be
calculated deeper into the tree, then averaged back up to get a better estimate. The depth of
this calculation for the demo can be specified at compile time.
Normally, the variance calculation would be required for each frame, however it won't change
unless the underlying Height Field changes. Therefore we introduce a Variance Tree which
works alongside the Binary Triangle Tree.
A Variance Tree is a full-height binary tree written into a sequential array. A few simple macros
allow us to navigate the tree efficiently, and the data we fill it with is a single byte value of
difference per node. Refer to Figure 4 if you've not encountered this structure before. Two
variance trees are stored in the Patch class, one each for the Left & Right Binary Triangles.
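The navigation macros can be as simple as the following sketch; VARIANCE_DEPTH is an
assumed compile-time constant:

// Node 1 is the root; the children of node i live at 2i and 2i+1,
// so the tree needs no pointers at all.
#define ROOT_NODE     1
#define LEFT_OF(i)    ((i) << 1)
#define RIGHT_OF(i)   (((i) << 1) + 1)

unsigned char m_VarianceLeft [1 << VARIANCE_DEPTH];   // one byte per node, left tree
unsigned char m_VarianceRight[1 << VARIANCE_DEPTH];   // and one for the right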
Now we can get back to the job of creating an approximate mesh. Given our error metric
(variance), we will decide to split the Binary Triangle node over a particular spot if its variance
is too high. That is, if the terrain under the current triangle is bumpy, then we should split it to
give a better approximation. Splitting entails creating two child nodes that exactly fill the parent
triangle's area (see Figure 1 for an example).
After moving down to the children, we repeat the process. The variance roughly drops in half
each iteration. At some point we either find smooth enough terrain to approximate with a single
triangle, or we run out of 'steps'; after all, we can only create meshes down to the resolution of
the Height Field, no more.
[Figure 5]
There's still one more complication. When splitting Binary Triangle Trees that are adjacent on
the landscape, cracks will often appear in the mesh. These cracks are due to uneven splitting of
the trees across patch boundaries. This problem is illustrated in Figure 6.
It comes down to a simple rule: Only split if the current node and its Base Neighbor both point
to each other (see Figure 7). This relationship is referred to as a Diamond. It is special because
a split of one node in a Diamond can be mirrored by the other without causing cracks in the
mesh. When we want to split a node, there are three cases to handle:
1. The Node is part of a Diamond - Split the node and its Base Neighbor.
2. The Node is on the edge of the mesh - Trivial, only split the node.
3. The Node is not part of a Diamond - Force Split the Base Neighbor.
A Forced Split is a recursive traversal of the mesh which ends when it finds a Diamond or an
edge triangle. Here's how it works: When splitting a node, check to see that it is part of a
Diamond first. If not, call a second split operation on the Base Neighbor to create a Diamond,
then continue with the original split.
The second call to split will do the same check, and recurse again if need be. Once a
node is found that can be split legally the recursion unwinds, splitting nodes along the way.
Figure 8 illustrates this.
Figure 8. Forced split operation
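Pulling the three cases together, the split routine might look like the following abridged sketch;
the full version must also relink the Left and Right Neighbors' back-pointers at the new children:

void Patch::Split(TriTreeNode *tri)
{
    if (tri->LeftChild)
        return;                                   // Already split.

    // Rule 3: not part of a Diamond? Force-split our Base Neighbor first.
    if (tri->BaseNeighbor && (tri->BaseNeighbor->BaseNeighbor != tri))
        Split(tri->BaseNeighbor);

    // Create the two children from the static pool.
    tri->LeftChild  = Landscape::AllocateTri();
    tri->RightChild = Landscape::AllocateTri();
    if (!tri->LeftChild || !tri->RightChild) {
        tri->LeftChild = tri->RightChild = NULL;  // Pool empty: stop refining here.
        return;
    }

    // The children inherit what the parent knows about the outside world.
    tri->LeftChild->BaseNeighbor   = tri->LeftNeighbor;
    tri->LeftChild->LeftNeighbor   = tri->RightChild;
    tri->RightChild->BaseNeighbor  = tri->RightNeighbor;
    tri->RightChild->RightNeighbor = tri->LeftChild;

    if (tri->BaseNeighbor) {
        if (tri->BaseNeighbor->LeftChild) {
            // Rule 1: partner already split -- cross-link the Diamond's children.
            tri->BaseNeighbor->LeftChild->RightNeighbor = tri->RightChild;
            tri->BaseNeighbor->RightChild->LeftNeighbor = tri->LeftChild;
            tri->LeftChild->RightNeighbor = tri->BaseNeighbor->RightChild;
            tri->RightChild->LeftNeighbor = tri->BaseNeighbor->LeftChild;
        } else {
            // Mirror our split across the Diamond (its Split() will cross-link us).
            Split(tri->BaseNeighbor);
        }
    } else {
        // Rule 2: we're on the edge of the mesh -- trivial case.
        tri->LeftChild->RightNeighbor = NULL;
        tri->RightChild->LeftNeighbor = NULL;
    }
}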
So let's review. Given a patch made up of two Binary Triangle Trees covering a particular area
of the Height Field, we will perform the following operations:
1. Compute Variance Tree - Fill out a full-height binary tree with variance data for each
Binary Triangle Tree. 'Variance' is the metric we are using to determine if our
approximation is good enough. It is the difference between the height sample at the
middle of the hypotenuse versus the interpolated height from the two points which border
the hypotenuse.
2. Tessellate the Landscape - Using the variance tree we will split our Binary Triangle Trees
by adding children if the variance of the top level is undesirably high.
3. Forced Splits - If the node we are attempting to split is not part of a Diamond, then call a
Forced Split on the offending node. This will give us a Diamond to complete the original
split operation.
4. Repeat - The tessellation step is repeated on the children until all the triangles in the
Binary Triangle Tree are under the variance limit for the current frame - or until we run
out of nodes in our allocation pool. (The recursion is sketched in code below.)
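Sketched in code (gFrameVariance, VARIANCE_DEPTH, and m_CurrentVariance -- a pointer to
whichever of the two variance trees is active -- are assumed names), the tessellation recursion
looks roughly like this:

void Patch::RecursTessellate(TriTreeNode *tri,
                             int leftX,  int leftY,
                             int rightX, int rightY,
                             int apexX,  int apexY,
                             int node)
{
    // The hypotenuse midpoint: the vertex a split would introduce.
    int centerX = (leftX + rightX) >> 1;
    int centerY = (leftY + rightY) >> 1;

    // Split while this node's precomputed variance exceeds the frame's limit.
    // (A production version would also scale the test by distance to the eye.)
    if (node < (1 << VARIANCE_DEPTH) && m_CurrentVariance[node] > gFrameVariance)
        Split(tri);

    // If the split happened (pool permitting), refine both children.
    if (tri->LeftChild) {
        RecursTessellate(tri->LeftChild,  apexX,  apexY,  leftX, leftY,
                         centerX, centerY, node << 1);
        RecursTessellate(tri->RightChild, rightX, rightY, apexX, apexY,
                         centerX, centerY, (node << 1) + 1);
    }
}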
Now that we have all the details of the ROAM algorithm, let's finish up the Patch class
implementation. All of the recursive functions (except Split) take coordinates for the triangles
they represent. These coordinates are calculated on the stack and passed down to the next
level, or given to OpenGL for rendering. Even at the deepest level of the Binary Triangle Tree,
there are rarely more than thirteen triangles on the stack.
This is the basic algorithm for recursion that the following functions use:
RecurseComputeVariance() takes the full set of coordinates for the current triangle and a few
extra bits of info to keep track of where we are. Variance for the triangle is calculated and
combined with that of its children. I chose to pass in the height for each point as well as its X &
Y coordinates in order to reduce the memory hits on the Height Field array.
RecurseRender() is remarkably simple, but take a look at the Triangle Fanning optimization
under Advanced Topics for the next step up from here. Basically, if the current triangle is not a
leaf node, recurse into the children. Otherwise, output a single triangle to OpenGL. Note that the
OpenGL rendering is not optimized, but rather designed for maximum readability. That's all
folks! We've covered everything you'll need to understand the code. The rest is icing for those
who want to take the next step. But first, I'll give some engine qualifiers, and a note on the
variance calculations.
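Before those qualifiers, here is a hedged sketch of the render recursion just described; it
assumes the caller has opened a glBegin(GL_TRIANGLES) block, and that gHeightMap and
MAP_SIZE are as before:

void Patch::RecursRender(TriTreeNode *tri,
                         int leftX,  int leftY,
                         int rightX, int rightY,
                         int apexX,  int apexY)
{
    if (tri->LeftChild) {                          // Interior node: descend.
        int centerX = (leftX + rightX) >> 1;
        int centerY = (leftY + rightY) >> 1;
        RecursRender(tri->LeftChild,  apexX,  apexY,  leftX, leftY, centerX, centerY);
        RecursRender(tri->RightChild, rightX, rightY, apexX, apexY, centerX, centerY);
    } else {                                       // Leaf node: emit one triangle.
        glVertex3f((GLfloat)leftX,
                   (GLfloat)gHeightMap[leftY  * MAP_SIZE + leftX],  (GLfloat)leftY);
        glVertex3f((GLfloat)rightX,
                   (GLfloat)gHeightMap[rightY * MAP_SIZE + rightX], (GLfloat)rightY);
        glVertex3f((GLfloat)apexX,
                   (GLfloat)gHeightMap[apexY  * MAP_SIZE + apexX],  (GLfloat)apexY);
    }
}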
Platform: Win98, AMD K6-2 450MHz, 96MB RAM, NVIDIA GeForce 256 DDR video.
Resolution: 640x480, 32-bit color.
Note on Variance: Variance is the single most important variable in this engine. It has been
glossed over in this article to present the overall framework. Try modifying the calculation used
for the variance tree, or the tree's depth. Specifically, set depth to an extremely small value like
3, then try a large one like 13, and note the difference in rendering quality.
The variance is also used during tessellation. Adjust the calculation for the current node to
ignore distances entirely. Then try it based purely on distance, ignoring the value from the
variance tree. It will be up to you to find calculations which work best for your application.
In the process of writing this article, I contacted several developers in the industry for comments
and posed a few questions about their engines.
Starsiege Tribes is an online-only game of fast-paced squad warfare. The game is played
seamlessly between indoor and outdoor environments where terrain features are extremely
strategic to the success of a mission. Long distance kills are commonplace and enemies may
hide behind hills to avoid detection. Mark Frohnmayer responded to questions about Starsiege
Tribes for this article.
On Height Maps: [responses edited for length] Mark Frohnmayer: "The heights in Tribes 1 are
stored on a regular 8m square grid. Tribes 2 grid size is selectable by mission."
On the Engine: Mark Frohnmayer: "The Tribes 1 terrain engine used a distance-based quad
tree algorithm created by Tim Gift. For Tribes 2 we came up with a new approach to screen-
error based on edge traversal. Our approach makes up for the limitations of current published
algorithms, including texturing with a bin-tree approach and seaming up edges between
squares in quadtree algorithms."
They chose a Quad Tree algorithm; see [2] for the basics of this system. I'm puzzled by the new
screen-error metric; however, judging from the screenshots of Tribes 2, it works amazingly well!
Perhaps in the coming months they will enlighten us with more details.
Other Comments: Mark Frohnmayer: "Texturing has been the single biggest headache with
landscapes in the game - allowing the mission editor to select a texture for every square and
dynamically generate the combination textures for squares that are at a lower detail level, as
well as automatically texturing the terrain based on vertex material. The most interesting
problem was coming up with the algorithm to detail the terrain mesh that solved all of our
requirements."
I mentioned before the sheer depth of the landscape visualization field, and Mark's comment
hints at this. The included engine is only a drop in the bucket. Be sure to check out a good
technical library or the many online sites devoted to this topic.
Mark goes on to say that their engine is not frame-coherent, requiring a rebuild of the Quad
Tree each frame. This is required for their view metric and clipping code.
Outcast (www.Outcast-thegame.com)
Outcast is a deeply designed action/adventure game with exquisite role-play elements. The
unique look and feel of Outcast plays beautifully into the alien landscapes and immersive
storyline. Christophe Chaudy responded to questions about Outcast for this article.
On Height Maps: [responses edited for clarity & length] Christophe Chaudy: "We are using
tiled heightmaps. Each heightmap's instance contains its specific scale, offset and some color
modifiers. With only 150 or 200 unique heightmaps we can build a huge world with more than
3600 tiles of 10x10 meters. It was difficult to deal with map continuity at tile edges but that's
the price to pay - and our graphics people rock!"
On the Engine: Christophe Chaudy: "We started with the software voxel engine. During the
production of Outcast, seeing the explosion of the 3D HW market, we looked at polygon terrain
rendering algorithms but:
• We were not able to recreate the complexity, diversity and density of voxel terrain with
polygons
• We needed to output A LOT of polygons to achieve a valid terrain quality (too many for
1998-99 3D HW)
• In Outcast, we used a lot of post process algorithms which operate directly on the
frame/Z buffer to render special effects like fog, water, & shadows. These techniques
could not be easily implemented on 1999 3D HW.
• We didn't have time to rebuild a completely different render system
So, we stayed with the software rendering approach. There were a lot of drawbacks but finally,
even if the terrain renderer is not perfect, it's something that looks different. And that was a
very important criterion in Outcast's production."
Other Comments: Christophe Chaudy: "Voxel rendering is very constraining; very CPU
intensive, strong modeling constraints, low resolution, poor quality on low-end computers, etc.
But the landscape in Outcast is unique. Even today, I don't see a polygon engine that can
reproduce such geometric complexity. But, under the market pressure, we are designing a
totally new system for outdoor visualization. It will of course use today's 3D hardware rendering.
In the future, when the CPU is faster, the ray-casting algorithm on voxel heightmaps could
come back."
Outcast is by far the most unique looking game on the market. If you have not seen or
experienced the world of Outcast, make sure to swing by the homepage. Also take a look at the
presentation on the rendering technology of Outcast from last year's Game Developer's
Conference (see the Bibliography for a link).
Advanced Topics
As promised, here are a few hints and tips for advanced optimizations and features. Each could
be its own article, so I've attempted to distill the most important aspects into a few short
paragraphs on each topic.
Triangle Fanning: Triangle Fanning is an optimization you can use when triangles all share a
central vertex. It allows you to specify fewer vertexes for the same number of triangles, giving
an improvement in overall speed. Triangle Fans in OpenGL flow clockwise, as do the points in
each of the triangles. You will have to switch which side is the triangle face or OpenGL will cull
out all your triangles!
In order to get the correct output of triangles, it helps to switch the order in which child nodes
are visited at each level of rendering. Thus if we visit the Left Child first at level 1, we visit the
Right Child first at level 2, then go back to the Left Child at level 3.
The order of the vertexes is important too. The first vertex specified must be the central point
around which the other triangles 'fan' out. This is done by passing down a reference to one of
the triangle's vertexes as the "best center vertex". At each level, this value is switched to point
to a new best vertex. When a leaf is found it is added to a small buffer of vertexes with the
"best vertex" first and the others in clockwise order.
At the next leaf node, we need only compare the "best vertex" to the first vertex in the buffer.
If they differ, output the fan to OpenGL and start over. However, if the two vertexes are equal,
then test whether the last vertex in the buffer equals the next clockwise vertex in the triangle. Again, if
they differ, output the fan to OpenGL and start over. Otherwise append the last vertex of the
triangle to the end of the vertex buffer.
Fan lengths cannot be more than 8 triangles using this method, however average lengths are
more commonly 3-4 triangles per fan.
GeoMorphing: An unfortunate side effect of rendering with dynamic levels of detail is the
sudden visual 'pop' that occurs when triangles are inserted or removed from the mesh. This
distortion can be reduced to nearly unnoticeable amounts by Vertex Morphing, also called
GeoMorphing. GeoMorphing is the gradual rise or fall of a vertex's height from the un-split
position to its new split position over the course of several frames.
GeoMorphing is not difficult, but has a lot of tricky aspects. Essentially, a value may be stored
in the TriTreeNode during tessellation which contains the amount of 'morph' this triangle has.
This morph value should be in the range 0.0 - 1.0. Then during rendering, blend from the
interpolated height value to the actual Height Field value using the following function:
MorphedZ = (fMorph * actualZ) + ((1-fMorph) * interpolatedZ);
Frame Coherence: Frame Coherence is the most advanced optimization under ROAM. With
frame coherence, the mesh which was created last frame can be used again. This feature also
enables dynamic frame timing, allowing you to continue to improve the mesh for the current
frame right up to the frame's deadline.
In a fast-action game, this means you don't have to spend all the overhead to tessellate the
landscape. Instead, deal with the most important fast-action components first, then tessellate
the landscape for the rest of the frame time, and render what you have at the end. So if a
player is in the middle of a firefight, the landscape will dynamically render at lower detail to
save time.
Explaining the implementation of Frame Coherence is beyond the scope of this article.
However, a few tips for the traveler: Add a 'Parent' pointer to TriTreeNode. Create a Merge()
function which undoes one Split() operation. Use a priority queue or other priority structure
which contains all leaf nodes in the entire mesh. During the tessellation, merge any nodes
which are too detailed for this frame followed by splitting all the nodes which are too coarse for
the frame (or until time runs out).
Supporting Larger Topologies: The included engine is structured to simplify the creation of
very large worlds. By loading a separate height map for each Landscape object and then
rendering each Landscape, there is no limit to the world's size! There are other limits however, like RAM
and computational power.
The Landscape class was designed to hold a paged-in piece of the world, along with other
Landscape classes holding other blocks. Each Landscape must link its patches to those in the
other Landscapes nearby. This is done in Patch::Reset(): instead of setting the Neighbor
pointers for edge nodes to NULL, look up the correct patch in the Landscape which borders that
side.
The future of landscape rendering is wide open. No doubt the polygon count will continue to rise
as will the detail of environments and the distances to be viewed. Also, the current LOD
algorithms are not designed to take advantage of the new graphics cards which offload triangle
setup calculations. This reduces the algorithm's gains for certain applications.
Additionally, OpenGL display lists might be used to render an entire landscape, then sent to the
graphics card in one fell swoop each frame. This is feasible for small terrains like this demo's,
especially given the faster memory buses of the future. We may even see a re-emergence of software-rendered
voxel landscapes, given the availability of fast CPUs and the inherent advantages of voxel
displays.
A demo of some of the concepts discussed in this article can be found here.
Acknowledgments
This article was inspired by many people and many projects I have seen. First and foremost are
Seumas McNally and the Tread Marks engine, after which this project was modeled. Visit
http://www.longbowdigitalarts.com/ to join in the lively programming forum or learn more
about Tread Marks.
I would also like to thank the many terrain visualization projects in the public domain, including
ROAM.C by C. Cookson. Also, the many great programming articles in the Gamasutra Features
archive, and the super-programmers of the Gamasutra Connections board (I didn't even have
to ask questions, the answers were already there!).
And to the industry gurus who reviewed this article and made suggestions, thanks again for the
input. It is my sincerest desire to see more outdoor games and epic journeys in the coming
years. I hope this article may inspire new projects for the genre.
1. Hoppe, H., "Smooth View-Dependent Level-of-Detail Control and its Application to Terrain
Rendering" (http://research.microsoft.com/~hoppe)
2. Lindstrom, P., Koller, D., Ribarsky, W., Hodges, L., Faust, N., Turner, G., "Real-Time
Continuous Level of Detail Rendering of Height
Fields" (http://www.cc.gatech.edu/gvu/people/peter.lindstrom/papers/siggraph96)
3. Duchaineau, M., Wolinski, M., Sigeti, D., Miller, M., Aldrich, C., and Mineev-Weinstein, M.
"ROAMing Terrain: Real-time Optimally Adapting Meshes" (http://www.llnl.gov/graphics/ROAM)
Epilogue
On Tuesday, March 21st, 2000, Seumas McNally lost his battle with Hodgkin's Lymphoma.
My sincerest condolences go out to his surviving family; Jim, Wendy and Philippe. I never had
the chance to meet Seumas personally, nor thank him for the encouragement and free
exchange of ideas he gave. His passing is a great loss to our community of developers. May his
commitment, determination, and unfailing humanity live on as an example for us all. Goodbye,
Seumas - and Thank You.
Since the advent of computer games, programmers have continually devised ways to simulate
the world more precisely. Pong, for instance, featured a moving square (a ball) and two
paddles. Players had to move the paddles to an appropriate position at an appropriate time,
thus rebounding the ball toward the opponent and away from the player. The root of this basic
operation is primitive (by today’s standards) collision detection. Today’s games are much more
advanced than Pong, and most are based in 3D. Collision detection in 3D is orders of magnitude
more difficult to implement than a simple 2D Pong game. The experience of playing some of the
early flight simulators illustrated how bad collision detection can ruin a game. Flying through a
mountain peak and surviving isn’t very realistic. Even some recent games have exhibited
collision problems. Many game players have been disappointed by the sight of their favorite
heroes or heroines with parts of their bodies inside rigid walls. Even worse, many players have
had the experience of being hit by a rocket or bullet that was “not even close” to them. Because
today’s players demand increasing levels of realism, we developers will have to do some hard
thinking in order to approximate the real world in our game worlds as closely as possible.
This article will assume a basic understanding of the geometry and math involved in collision
detection. At the end of the article, I’ll provide some references in case you feel a bit rusty in
this area. I’ll also assume that you’ve read Jeff Lander’s Graphic Content columns on collision
detection (“Crashing into the New Year,” “When Two Hearts Collide,” and “Collision
Response: Bouncy, Trouncy, Fun”). I’ll take a top-down approach to collision detection by first
looking at the whole picture and then quickly inspecting the core routines. I’ll discuss collision
detection for two types of graphics engines: portal-based and BSP-based engines. Because the
geometry in each engine is organized very differently from the other, the techniques for world-
object collision detection are very different. The object-object collision detection, for the most
part, will be the same for both types of engines, depending upon your current implementation.
After we cover polygonal collision detection, we’ll examine how to extend what we’ve learned to
curved objects.
To create an optimal collision detection routine, we have to start planning and creating its basic
framework at the same time that we’re developing a game’s graphics pipeline. Adding collision
detection near the end of a project is very difficult. Building a quick collision detection hack near
the end of a development cycle will probably ruin the whole game because it’ll be impossible to
make it efficient. In a perfect game engine, collision detection should be precise, efficient, and
very fast. These requirements mean that collision detection has to be tied closely to the scene
geometry management pipeline. Brute force methods won’t work — the amount of data that
today’s 3D games handle per frame can be mind-boggling. Gone are the times when you could
check each polygon of an object against every other polygon in the scene.
Let’s begin by taking a look at a basic game engine loop (Listing 1). A quick scan of this code
reveals our strategy for collision detection. We assume that collision has not occurred and
update the object’s position. If we find that a collision has occurred, we move the object back
and do not allow it to pass the boundary (or destroy it or take some other preventative
measure). However, this assumption is too simplistic because we don’t know if the object’s
previous position is still available. You’ll have to devise a scheme for what to do in this case
(otherwise, you’ll probably experience a crash or you’ll be stuck). If you’re an avid game player,
you’ve probably noticed that in some games, the view starts to shake when you approach a wall
and try to go through it. What you’re experiencing is the effect of moving the player back.
Shaking is the result of a coarse time gradient (time slice).
Another approach, which is easier to implement but less accurate, is to subdivide the given time
interval in half and test for intersection at the midpoint. This calculation can be done recursively
for each resulting half, too. This approach will be faster than the previous methods, but it’s not
guaranteed to catch all of the collisions.
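As a sketch of the idea (Object, PositionAt(), and Intersect() are placeholders, not part of any
listing here):

bool CollidedDuring(const Object &a, const Object &b,
                    float t0, float t1, int depth)
{
    float tMid = 0.5f * (t0 + t1);
    if (Intersect(a.PositionAt(tMid), b.PositionAt(tMid)))
        return true;                 // Hit at the midpoint of the interval.
    if (depth <= 0)
        return false;                // Give up -- we may miss a fast, thin collision.
    return CollidedDuring(a, b, t0, tMid, depth - 1) ||
           CollidedDuring(a, b, tMid, t1, depth - 1);
}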
If our world contains N moving objects, then a brute-force approach must test each object
against every other, requiring (N choose 2) comparisons. Thus, the number of comparisons that
we’ll need to perform is of order N² (or O(N²)). But we can avoid performing O(N²) pair-wise
comparisons in one of several ways. For instance, we can divide our world into objects that are
stationary (collidees) and objects that move (colliders), even if their current velocity is zero. For
example, a rigid wall in a room is a
collidee and a tennis ball thrown at the wall is a collider. We can build two spatial trees (one for
each group) out of these objects, and then check which objects really have a chance of
colliding. We can even restrict our environment further so that some colliders won’t collide with
each other — we don’t have to compute collisions between two bullets, for example. This
procedure will become clearer as we move on; for now, let’s just say that it’s possible.
(Another method for reducing the number of pair-wise comparisons in a scene is to build an
octree. This is beyond the scope of this article, but you can read more about octrees in Spatial
Data Structures: Quadtree, Octrees and Other Hierarchical Methods, mentioned in the “For
Further Info” section at the end of this article.) Now let’s take a look at portal-based engines
and see why they can be a pain in the neck when it comes to collision detection.
But what if we use this imprecise method as simply a first step? We represent a whole character
as one big sphere, and then check whether that sphere intersects with any other object in the
scene. If we detect a collision and would like to increase the precision, we can subdivide the big
sphere into a set of smaller spheres and check each one for collision (Figure 4). We continue to
subdivide and check until we are satisfied with the approximation. This basic idea of hierarchy
and subdivision is what we’ll try to perfect to suit our needs.
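The test at the heart of this scheme is cheap; assuming a small Vec3 type with operator- and a
Dot() helper, it is just:

bool SpheresOverlap(const Vec3 &c1, float r1, const Vec3 &c2, float r2)
{
    Vec3  d = c2 - c1;
    float r = r1 + r2;
    return Dot(d, d) <= r * r;   // Compare squared lengths: no sqrt needed.
}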
“Axis-aligned” refers to the fact that either the box is aligned with the world axes or each face
of the box is perpendicular to one coordinate axis. This basic piece of information can cut down
the number of operations needed to transform such a box. AABBs are used in many of today’s
games; developers often refer to them as the model’s bounding box. Again, the tradeoff for
speed is precision. Because AABBs always have to be axis-aligned, we can’t just rotate them
when the object rotates — they have to be recomputed for each frame. Still, this computation
isn’t difficult and doesn’t slow us down much if we know the extents of each character model.
However, we still face precision issues. For example, let’s assume that we’re spinning a thin,
rigid rod in 3D, and we’d like to construct an AABB for each frame of the animation. As we can
see, the box approximates each frame differently and the precision varies (Figure 6).
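The payoff for keeping the boxes axis-aligned is that the overlap test degenerates to a few
comparisons; here is a sketch, assuming the AABB is stored as min/max corners:

struct AABB { Vec3 min, max; };

bool AABBsOverlap(const AABB &a, const AABB &b)
{
    // Axis-aligned boxes overlap only if their extents overlap on all three axes.
    return a.min.x <= b.max.x && b.min.x <= a.max.x &&
           a.min.y <= b.max.y && b.min.y <= a.max.y &&
           a.min.z <= b.max.z && b.min.z <= a.max.z;
}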
So, rather than use AABBs, why can’t we use boxes that are arbitrarily oriented and minimize
the empty space, or error, of the box approximation? This technique is based on what are called
oriented bounding boxes (OBBs) and has been used for ray tracing and interference detection
for quite some time. This technique is not only more accurate, but also more robust than the
AABB technique, as we shall see. However, OBBs are a lot more difficult to implement, slower,
and inappropriate for dynamic or procedural models (an object that morphs, for instance). It’s
important to note that when we subdivide an object into more and more pieces, or volumes,
we’re actually creating a hierarchical tree of that starting volume.
Our choice between AABBs and OBBs should be based upon the level of accuracy that we need.
For a fast-action 3D shooter, we’re probably better off implementing AABB collision detection —
we can spare a little accuracy for the ease of implementation and speed. The source code that
accompanies this article is available from the Game Developer web site. It should get you
started with AABBs, as well as providing some examples of source code from several collision
detection packages that also implement OBBs. Now that we have a basic idea of how everything
works, let’s look at the details of the implementation.
Building Trees
Creating OBB trees from an arbitrary mesh is probably the most difficult part of the algorithm,
and it has to be tweaked and adjusted to suit the engine or game type. Figure 7 shows the
creation of successive OBBs from a starting model. As we can see, we have to find the tightest
box (or volume, in the case of 3D) around a given model (or set of vertices).
There are several ways to precompute OBBs, and they all involve a lot of math. The basic
method is to calculate the mean of the distribution of vertices as the center of the box and then
calculate the covariance matrix. We then use two of the three eigenvectors of the covariance
matrix to align the box with the geometry. We can also use a convex hull routine to further
speed up and optimize tree creation. You can find the complete derivation in the Gottschalk,
Lin, and Manocha paper cited in the “For Further Info” section.
Building AABB trees is much easier because we don’t have to find the minimum bounding
volume and its axis. We just have to decide where to split the model and we get the box
construction for free (because it’s a box parallel with the coordinate axes and it contains all of
the vertices from one side of the separating plane).
So, now that we have all of the boxes, we have to construct a tree. We could use a top-down
approach whereby we begin with the starting volume and recursively subdivide it. Alternatively,
we could use a bottom-up approach, merging smaller volumes to get the largest volume. To
subdivide the largest volume into smaller ones, we should follow several suggested rules. We
split the volume along the longest axis of the box with a plane (a plane orthogonal to one of its
axes) and then partition the polygons based upon which side of the partitioning plane they fall on
(Figure 7). If we can’t subdivide along the longest axis, we subdivide along the second longest.
We continue until we can’t split the volume any more, and we’re left with a triangle or a planar
polygon. Depending on how much accuracy we really need (for instance, do we really need to
detect when a single triangle is collided?), we can stop subdividing based on some arbitrary rule
that we propose (the depth of a tree, the number of triangles in a volume, and so on).
As you can see, the building phase is quite complex and involves a considerable amount of
computation. You definitely can’t build your trees during the run time — they must be
computed ahead of time. Precomputing trees eliminates the possibility of changing geometry
during the run time. Another drawback is that OBBs require a large amount of matrix
computations. We have to position them in space, and each subtree has to be multiplied by a
matrix.
Now, let’s assume that we have either our OBB or AABB trees. How do we actually perform
collision detection? We’ll take two trees and check whether two initial boxes overlap. If they do,
they might intersect, and we’ll have to recursively process them further (recursive descent). If,
along the descent, we find that the subtrees do not intersect, we can stop and conclude that no
intersection has occurred. If we find that the subtrees do intersect, we’ll have to process the
tree until we hit its leaf nodes to find out which parts overlap. So, the only thing we have to
figure out is how to check whether two boxes overlap. One of the tests that we could perform
would be to project the boxes on some axis in space and check whether the intervals overlap. If
they don’t, the given axis is called a separating axis (Figure 8).
Figure 8. Separating axis (intervals
A and B don’t overlap).
To check quickly for overlap, we’ll use something called the Separating Axis Theorem. For two
boxes, this theorem tells us that there are only 15 potential separating axes to test. If the
projection intervals overlap on every single one of these axes, the boxes intersect; if they fail to
overlap on even one axis, they don’t. Thus, it’s very easy to determine whether or not two
boxes intersect.
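A single axis test might be sketched as follows, with an OBB assumed to store a center, three
unit axes, and half-extents (the 15 candidate axes are the three axes of each box plus the nine
pairwise cross products):

struct OBB { Vec3 center; Vec3 axis[3]; Vec3 ext; };   // ext = half-extents

bool OverlapOnAxis(const OBB &a, const OBB &b, const Vec3 &L)
{
    // Radius of each box's projection interval onto L...
    float ra = a.ext.x * fabsf(Dot(a.axis[0], L)) +
               a.ext.y * fabsf(Dot(a.axis[1], L)) +
               a.ext.z * fabsf(Dot(a.axis[2], L));
    float rb = b.ext.x * fabsf(Dot(b.axis[0], L)) +
               b.ext.y * fabsf(Dot(b.axis[1], L)) +
               b.ext.z * fabsf(Dot(b.axis[2], L));
    // ...versus the distance between the centers along L. If the distance is
    // larger, L is a separating axis and the boxes cannot intersect.
    return fabsf(Dot(b.center - a.center, L)) <= ra + rb;
}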
Interestingly, the time gradient problem mentioned earlier could easily be solved by the
separating axis technique. Remember that the problem involved determining whether a collision
has occurred in between any two given times. If we add velocities to the box projection
intervals and they overlap on all 15 axes, then a collision has occurred. We could also use a
structure that resembles an AABB tree to separate colliders and collidees and check whether
they have a possibility of collision. This calculation can quickly reject the majority of the cases
in a scene and will perform in O(N log N) time, which is close to optimal.
BSP (Binary Space Partitioning) trees are another type of space subdivision technique that’s
been in use for many years in the game industry (Doom was the first commercial game that
used BSP trees). Even though BSP trees aren’t as popular today as they have been over the
past couple of years, the three most licensed game engines today — Quake II, Unreal, and
Lithtech — still use them quite extensively. The beauty and extreme efficiency of BSP trees
comes to light when we take a look at collision detection. Not only are BSP trees efficient for
geometry culling, we also get very efficient world-object collision almost for free.
The BSP tree traversal is the fundamental technique used with BSPs. Collision detection
basically is reduced to this tree traversal, or search. This approach is powerful because it rejects
a lot of geometry early, so in the end, we only test the collision detection against a small
number of planes. As we’ve seen before, finding a separating plane between two objects is
sufficient for determining that those two objects don’t intersect. If a separating plane exists, no
collision has occurred. So, we can recursively traverse a world’s tree and check whether
separating planes intersect the bounding sphere or bounding box. We can increase the accuracy
of this approach by checking for every one of the object’s polygons. The easiest way to perform
this check is to test whether all parts of the object are on the same side of the plane. This
calculation is extremely simple. We can use the Cartesian plane equation, ax + by + cz + d =
0, to determine the side of the plane upon which the point lies. If the equation is satisfied, then
our point lies on the plane. If ax + by + cz + d > 0, then the point is on the positive side of the
plane. If ax + by + cz + d < 0, then the point is on the negative side of the plane.
The only important thing to note is that for a collision not to occur, all of the points of an object
(or a bounding box) have to be on either the positive or the negative side of a given plane. If
we have points on both the positive and negative side of the plane, a collision has occurred and
the plane intersects the given object.
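A sketch of that classification (the Plane type and the result constants are assumptions):

struct Plane { float a, b, c, d; };                       // ax + by + cz + d = 0
enum { NEGATIVE_SIDE = -1, STRADDLING = 0, POSITIVE_SIDE = 1 };

int ClassifyPoints(const Plane &p, const Vec3 *pts, int n)
{
    int pos = 0, neg = 0;
    for (int i = 0; i < n; i++) {
        float s = p.a * pts[i].x + p.b * pts[i].y + p.c * pts[i].z + p.d;
        if (s > 0.0f)      pos++;
        else if (s < 0.0f) neg++;     // Points exactly on the plane count as neither.
    }
    if (pos && neg) return STRADDLING;   // The plane cuts the object: possible collision.
    return pos ? POSITIVE_SIDE : NEGATIVE_SIDE;
}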
Unfortunately, we have no elegant way of checking whether a collision has occurred in between
any two given times (although the techniques discussed at the beginning of this article still apply).
However, I have yet to see another structure that has as many uses as a BSP tree.
Now that we’ve seen two approaches to collision detection for polygonal objects, let’s see how
we can compute the collision of curved objects. Several games will be coming out in 1999 that
use curved surfaces quite extensively, so the efficient collision detection of curved surfaces will
be very important in the coming year. The collision detection (which involves exact surface
evaluation at a given point) of curved surfaces is extremely computationally intensive, so we’ll
try to avoid it. We’ve already discussed several methods that we could use in this case, as well.
The most obvious approach is to approximate the curved surface with a lowest-tessellation
representation and use this polytope for collision detection. An even easier, but less accurate,
method is to construct a convex hull out of the control vertices of the curved surface and use it
for the collision detection. In any case, curved surface collision approximation is very similar to
general polytope collision detection. Figure 9 shows the curved surface and the convex hull
formed from the control vertices.
If we combined both techniques into a sort of hybrid approach, we could first test the collision
against the hull and then recursively subdivide the patch to which the hull belongs, thus
increasing the accuracy tremendously.
Now that we’ve gone over some of the more advanced collision detection schemes (and some
basic ones, too), you should be able to decide what type of system would best suit your own
game. The main thing you’ll have to decide is how much accuracy you’re willing to sacrifice for
speed, simplicity of implementation (shorter development time), and flexibility.
• H. Samet. Spatial Data Structures: Quadtree, Octrees and Other Hierarchical Methods.
Addison Wesley, 1989.
• For more information about AABBs take a look at J. Arvo and D. Kirk. “A survey of ray tracing
acceleration techniques,” An Introduction to Ray Tracing. Academic Press, 1989.
• For a transformation speedup, check out James Arvo’s paper in Andrew S. Glassner, ed.
Graphics Gems. Academic Press, 1990.
Nick Bobic is trying not to work 14 hours a day with very little success. Any new
collision tips and tricks should be sent to [email protected].
Listing 1. A basic game engine loop (pseudocode).

while (1) {
    process_input();
    update_objects();
    render_world();
}

update_objects() {
    for (each object) {
        save_old_position();
        calc_new_position();    // based on velocity, acceleration, etc.
        if (collide_with_other_objects())
            new_position = old_position();   // or, if the object was destroyed, remove it, etc.
    }
}
I’ve been describing methods of dynamic simulation using mass and spring systems for the past
couple of months. These techniques dramatically increase the realism in your real-time graphic
simulation. One of dynamic simulation’s key benefits is that it creates a scaleable game
experience.
Users with more powerful systems get a more realistic experience, while users with less
powerful systems are still provided with a complete experience. It’s a situation analogous to the
use of levels of detail in your 3D models. Particularly in the PC market, where target systems
can vary widely, these techniques have become a crucial weapon in the developer’s arsenal.
For a current project, I decided to maximize the use of dynamics to increase realism wherever
possible. The project focuses on characters in moody interior environments. It occurred to me
that the use of cloth animation in my scenes would be crucial to creating the mood I was trying
to establish.
Most loose clothing doesn’t look natural in digital art because it’s static. It doesn’t move along
with the body. It’s possible to morph the shape of the skirt to match the motion of the
character, but this requires quite a bit of detailed animation work. Likewise, deforming the skirt
with a bone system can be effective, but not necessarily realistic.
[Figure: The devil wears an animated-cloth blue dress.]
For my work, I wanted to create realistic cloth in the environments and on the characters. My
hardware-accelerated graphics rasterization freed the processor power necessary to make this
possible. So, I set about creating a real-time cloth simulation.
The Latest Springy Fashions
The mass and spring dynamics simulation I developed in a previous column (“Collision
Response: Bouncy, Trouncy, Fun”) proved effective for simulating soft body objects in real time.
I thought it should be possible to use these techniques to create a cloth simulation. In fact,
several of the commercial cloth animation systems for 3D animation programs such as 3D
Studio Max, Softimage, and Maya use similar techniques. So how do I go about creating a piece
of cloth?
I am going to be using the same spring force formulas for the cloth simulation as the ones I
used in the March column. If you are unfamiliar with the dynamic forces generated by springs,
you should go back and read the March column or at least take a look at the March source code
on the Game Developer web site (http://www.gdmag.com).
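For reference, the damped spring force from that column boils down to something like this
sketch (ks is the spring constant, kd the damping constant; the Vec3 helpers are assumed):

Vec3 SpringForce(const Vec3 &p1, const Vec3 &v1,
                 const Vec3 &p2, const Vec3 &v2,
                 float restLen, float ks, float kd)
{
    Vec3  L   = p1 - p2;
    float len = sqrtf(Dot(L, L));
    Vec3  dir = L * (1.0f / len);           // Unit vector from p2 toward p1.
    float mag = ks * (len - restLen)        // Stretch term...
              + kd * Dot(v1 - v2, dir);     // ...plus damping along the spring.
    return dir * -mag;                      // Force on p1; p2 receives the negation.
}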
This new form of cloth works pretty well hanging from hooks on the wall. However, if you drop
the cloth on the floor, it wads up into a big mass of springy spaghetti. The reason for this failure
is that the model is still incomplete. If you look at the structure in Figure 2A, you may see that
there is nothing to keep the model from folding along the edges of the structural springs, much
as you fold a handkerchief. The fibers that comprise actual cloth run the length of the fabric and
generally resist folding and bending. In order to simulate this effect adequately, I need to do a
little more work.
My research uncovered two methods for dealing with this problem. The first minimized the bend
between two adjacent cells by using the dot product to determine the angle of bend. The
second method simply added an extra set of springs called flexion or bend springs to apply the
bend force. I created the bend springs by stretching a spring across two cells alongside the
structural springs. These springs end up connecting every other cell in the cloth mesh.
Stiff springs can make a numerical simulation unstable. To combat this, it’s important to use a
good numerical integrator. The midpoint method and Runge-Kutta integrators developed last
month seem to do the trick nicely.
Making It Move
I already have a simulator from the March column that is capable of handling a cloth patch. I
can even apply gravity to it and lock the position of individual vertices. That’s pretty interesting,
but it needs some improvement to come alive. In March, I also discussed the use of planes for
collision. With this same method, I can create collision boxes that enable me to simulate a
tablecloth draped over a table, as you see in Figure 4.
This model is interesting and realistic looking but not terribly animated. In fact, in this case it’s
probably better to freeze the simulation and avoid the constant recalculation. Unless, of course,
the wind kicks up or someone pulls on the corner.
Another problem is that each spring acts independently. This means that each spring can be
stretched to a great extent. In many cases, the amount of stretch can exceed 100 percent. This
is not very realistic. Actual fabric will not stretch in this manner. The problem I have is that I
am using linear springs when fabric actually displays a nonlinear spring behavior. As the
amount of stretch increases, the strength of the spring increases also. The fabric will also
stretch to some limit and then if the force continues, it will rip. This is not what I want (at least
for now). This issue, which Xavier Provot (see For Further Info) calls “the Super-Elastic Effect,”
is difficult to handle. Increasing the spring strength dynamically can lead to instability problems
just like any other stiff spring problem. Provot suggests checking the amount of stretch in each
spring, and if it exceeds a set deformation limit, the springs are adjusted to achieve this limit.
While I agree this solves a definite problem, a second pass through the springs is costly. For the
effects I have attempted to achieve, I can live with super-elastic cloth.
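Provot's correction pass might be sketched like this (Spring, Particle, and the Vec3 operators
are assumptions):

struct Particle { Vec3 pos; };
struct Spring   { Particle *p1, *p2; float restLen; };

void EnforceStretchLimit(Spring &s, float limit)   // e.g. limit = 0.1f for 10%
{
    Vec3  L      = s.p1->pos - s.p2->pos;
    float len    = sqrtf(Dot(L, L));
    float maxLen = s.restLen * (1.0f + limit);
    if (len > maxLen) {
        // Pull both endpoints together just enough to respect the limit.
        Vec3 correction = L * (0.5f * (len - maxLen) / len);
        s.p1->pos -= correction;
        s.p2->pos += correction;
    }
}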
My collision system is pretty primitive. To make things easy, I only collide the vertices of the
mesh with the objects. As it stands, if a sphere is small or the fabric stretches too much, the
sphere will pass right through it. I also don’t handle self-collisions. That is, the fabric can pass
through itself without penalty. This could be corrected by placing bounding spheres at each
vertex. However, applying the sphere collision test between each vertex gets expensive. So, I
just limit the situation so that either the cloth doesn’t pass through itself, or so the effect isn’t
too noticeable.
Once the system is working, it’s fun to see how it can be extended. I mentioned the issue of
tearing and ripping after the fabric stretches too far. I can monitor the spring lengths. If they
exceed a limit, the spring can be removed from the system, effectively tearing the fabric. I
think this would be a great way to simulate a cannonball tearing through the mainsail of a tall
ship. This same method of breaking a spring would work for a simulation of a rope as well. After
all, a rope is really just a one-dimensional version of the cloth patch.
Another dynamic effect can be achieved by manipulating the flexion springs. With these springs
in place, the fabric will resist folding. However, if I selectively delete one of these springs, the
fabric will be able to fold nicely where the springs are missing. I don’t know where I can use
that yet, but I’m sure I can find a way.
The Application
The application this month was actually pretty easy to build. It’s essentially the same as last
month’s application, but with a few additions. There’s a function that creates the cloth patch in
a sort of macro fashion. You can set the spring settings for the three types of springs. You can
also drop some collision objects around and watch them interact. Find the application and the
source at http://www.gdmag.com.
• Baraff, David, and Andrew Witkin. "Large Steps in Cloth Simulation," Proceedings of SIGGRAPH 1998, ACM SIGGRAPH, pp. 43-54.
There are also fabric simulations for many professional 3D animation packages, available either as plug-ins or integrated into the software. I do not know what techniques these products use, with one exception: Colin Withers of Topix created a fabric simulation for Softimage based on the Provot paper, and Topix graciously released the source code for this plug-in to the public. See http://www.topix.com/ for more info.
Jeff Lander prefers to wear comfortable loungewear when hanging out writing code
at Darwin 3D. Drop him a note and let him know what the fashion conscious are
wearing this spring at [email protected].
I had the occasion a few months ago to climb up onto one of my favorite soapboxes and do a little preaching. The topic centered on code reviews. I was not surprised to find that few had ever heard of this religion, but my listeners were filled with questions and wonderment, and most anxious to hear the word and see revelations unfold before them.
Not to worry, this is not Dianetics for the software guru but rather one small development
technique, a process that can have immense impact on the quality of software you produce as
well as offer broad, positive, and measurable effects for an entire development team -- and
across numerous products. So this article is aimed not just at programmers but at their
managers and their manager’s managers.
I attend a game developer’s user group known as Sputnik which meets once a month in
Seattle. It is a casual and friendly gathering of game making enthusiasts composed of
professionals and beginners, programmers, artists, sound junkies, and the occasional suit. The
forum is relaxed and open to spontaneous tangents, and hence it generally covers any and all
aspects of the game industry (at least once a meeting!). It’s a great learning experience with a
tight finger on the pulse of game development.
This one particular evening we were sharing the secrets and ills and complaints and joys of past or present projects -- we were asked to, and happily volunteered to, air dirty laundry in public. It was great fun and quite educational. But what struck me, and it often does, is that this is an incredibly young industry rife with inexperience. It seems slow, or altogether unwilling, to adopt practices that have proven nearly essential in other closely related industries.
How many post-mortems have we read touting the god-send of RCS (or Source Safe) like it’s
the newest thing? How many times has the Design Document been referred to as the golden
book of guidance and everyone should have one? How often do we hear that a Bug Tracking
Database was essential and something we should all try? Ever heard of MS Project, Gantt
charts, the critical path, or the black art of scheduling? How about meta-meetings or meta-
designs? Code reviews? Anyone, anyone? Bueller?
Okay, I don't mean to be insulting, but I am plainly one of those who say the game industry needs to grow up a little, at least in its processes and methodologies. These are all staples used throughout the software world. And yet, here I sat amongst some of the best game developers in the world, and though many seemed to comprehend the gains and benefits of employing design and code reviews, it came across that virtually none had ever experienced these processes first hand.
The Questions
During the course of sharing secrets and dirty laundry of past and present projects, many real
problems were discussed. Common things like what do you do when you lose a programmer or
the lead programmer? Do you just throw their code away? Is a team better off tossing it or
trying to decipher megabytes of source code? How do you integrate a replacement team
member? What can a company do to prepare for (what mistakenly seems to be) the natural
migration of game programmers? How can we stop writing and re-writing the same piece of
technology? How do we thwart the programmer egos that bury projects? How do we train
beginners and hackers? Is there hope that a software team can actually survive two projects?
How can I, as a programmer, grow?
The Answer
Code review.
Actually, let me broaden the answer somewhat. Anything said here about code review can easily be applied to the design review process; the two really work hand in hand. In fact, it's difficult for me to conceive of one without the other.
Software development, ideally realized, is a cyclic process of refinement and risk reduction.
Most projects tend to revisit the design phase during code implementation for various reasons
(I view this as actually a good thing). With the reality of milestones, deliverables, demos, and
skittish management, as well as a lack of complete insight and all-encompassing forethought, it is the nature of most software projects to have design and implementation occurring in tandem
(not to forget occasional visits to analysis-land and bouts of incremental testing).
But for our purposes, I’ll focus on what we seem to love most, and that’s code.
So what happens when a programmer leaves a project? Why is it so painful? And is code review
really the answer?
Yes! It dices, it slices, it gets the stain out, comes in various scents and flavors, and it even does Windows. Code review is just the product you've been looking for! So pick up a box today and let your programmer exits cause no delay!
So why is it so painful when a programmer leaves a project, especially one who is a major contributor? Consider the typical game developer modus operandi: design a little, avoid meetings, do good work, do it individually, and don't waste time. An obvious recipe for success -- that is, if what you're looking for is really stressful days when even one individual leaves your team.
What can a code review process do to help? First off, the process breaks down barriers and
encourages sharing. In fact, that’s the essence of code review: open the box, share the
knowledge, gain from the experience. No more code ownership. That’s a key element and it
deserves more explanation below. But, whoa shit! No more code ownership!! I just lost half the
Gamasutra.com readership, didn’t I?
What do you mean, show people my code? What do you mean, no more code ownership? How am I going to know whom to blame for all these bugs?
That's a scary concept for a lot of programmers, the one about showing people your software. I mean, it's your software. You probably never had anyone really look at your code except for a few teachers in high school or college. And I mean really look at your code. When was the last time you sat in a group of people who were critiquing your work, right in front of you, down to the most minute detail, and you were happy and thankful about it? If you work in the game industry, you've probably never heard of such a thing and think it's totally absurd.
Ah, but think about it for a second. If all these people are looking at, inspecting, and digesting
my code, then when it doesn’t work (or has a few minor deficiencies, a-hem), who do we get to
blame? Well crap, it’s not just my fault, by myself, this time. It’s everyone’s fault -- at least
everyone that reviewed and approved the code. Maybe there is something to giving up code
ownership.
Software development really isn't about whom to blame when things don't work. But that's one concrete example that actually gets people to consider giving up sole ownership of code.
Whatever works.
Exposure
The idea behind a good code review process is that the software is exposed. It is examined at various levels, dissected for various objectives, and scrutinized for all manner of quality (or lack thereof). The purpose is to recognize defects through inspection. The process is designed to bring the cumulative skill and experience of the entire programming team (over time) to bear on any and every piece of the software. It is the means to harness the strength of the collective and wash away the weaknesses of the individual. A little Trekkie philosophy can go a long way. The Borg have it figured out.
The code review forum also provides an ideal learning environment. Where do programmers today go to learn new tricks, refine their methods, test their prowess, and grow their skill? Primarily it's through individual effort, i.e. time-consuming (even career-consuming) trial-and-error software exploits. Programmers constantly re-write the same technology so they can learn from the experience. Through code review, experience can be passed on much faster and with much greater variety. And it's a repetitive process, one that reinforces good programming practices, one that allows you to teach as well as learn, and it can actually be fun! Yes, it's true: code review, a meeting (for god's sake), can actually be fun. I've seen it, I've lived it, it really works.
The Mechanics
I was not terribly amazed when I was asked (and later pressed to elaborate on) how to do a code review. As I alluded to earlier, this is a process well understood elsewhere in the software world that almost no one in games has actually done. It took a lot of brave effort, in the friendly gathering of Sputnik, for someone to actually admit they didn't really know how code reviews work.
I have worked in a variety of software industries, ranging from games to military flight simulators to aircraft navigation systems to boring stuff like bank statement report generators and configuration management tools. In each case where code review was part of the development life-cycle, there was success -- of varying degrees, to be sure, but each project reaped benefits. And notice, I did say "success": each job gained through the use of code review and none suffered. More importantly, however, each team member gained in skill, in capacity, in proficiency, in creativity, in interactive skills, and in modesty.
For the description of the mechanics, I’m going to draw on my most successful work experience
where a large group of young engineers incorporated code (and design) review as part of the
daily diet. Yep, that's daily. Imagine a process where you meet (eek!) nearly every day and it
saves time!
Here’s how it works in summary: someone (in our case often two someones) writes a small pile
of software according to some documented set of requirements (the output of a previous design
review), compiles it and performs some initial testing, then prints out onto paper (this is very
important) copies of the source code for each reviewer and gives the reviewers a day or two to
review it. At a scheduled meeting, a small group of multi-talented individuals gather and point
out problems, list things to be changed, ask and answer questions, and ultimately (by iteration
of code-test-and-review) the implementation is approved according to a well-defined set of
criteria. Sounds simple enough. It's repetitive and it gets very mechanical (and that's a good
thing).
The Players
The Implementor: This is the individual whose software is being reviewed. I'll belay the obvious code ownership question for the moment.
The Reviewer(s): Typically this is more than one person, the number varying depending on
the software’s content and its overall importance. The number of reviewers also varies
depending on how many iterations a piece of code has been through, generally the number
drops as the software matures. Reviewers are programmers, designers, managers, senior and
junior people alike.
The Software Manager: In game parlance, that would be either the Lead Programmer or the
Technical Director whose responsibility it is to track and schedule code development and the
review process. Note that the software manager is not the sole reviewer though they tend to be
involved in most reviews.
The Materials
The Source Code: This is the source code, batch files, grammars, external references (books,
technical papers), test cases, anything relevant to the application’s functioning, creation, or
validation. And it needs to be distributed on paper. Yes, kill a few trees, save your project
(that’s from a tree-loving Oregonian who knows a renewable resource when he sees one).
A Red Pen: Each reviewer must use a red pen to mark corrections on paper. Legible, pertinent
comments are the goal since a meeting (ha!) is not always required for an effective code
review, but good ink is.
The Approval Criteria: This consists of coding standards, list of abbreviations, documented
coding idioms, validation tests, and possibly a Software Quality Assurance Manual. As I’ll point
out below, how pedantic you want to be is up to each group, but regardless, there must be a well-understood set of criteria against which the reviewers can test the software.
Anyone can schedule a code review. Generally it's at the behest of the Software Manager. But
when is that?
Typically, we found that 500, 1,000, or as much as 2,000 lines of new code
marks a good time for a review. The initial review needs to be meaty enough to represent all
the complexities of the software component under development. At the same time, the team
needs to be careful that the approach used in the code is scrutinized before it becomes too
large and difficult to change. This is an iterative process, so take it in small bites. How much
code to review during each iteration will vary by group, by subject, and the skill level of the
implementor.
It was common during my most successful project for each programmer to have some subset of
their software under review at least once a week. Two weeks between reviews wasn’t
uncommon for any particular software unit. But at three weeks, the implementor themselves
would be screaming for a review. Getting good, timely feedback is essential to the entire
process and coding in isolation for too long is dangerous.
If an individual seems to have a particularly sticky algorithm, can’t solve a problem, can’t
decide an implementation approach, or has a history of defect creation -- schedule them more
often. The review can be used to solve problems as well as to identify them.
As a software unit matures, meaning that its functionality is nearing completion, test suites are
firming up, and the software has been reviewed a number of times, we found that reviews
occurred more often with the benefit that the review time dropped dramatically. Also, with each
iteration, the number of reviewers necessary to get good coverage declined. A final review
might consist of the implementor and the Software Manager putting a checkmark on a schedule
sheet.
Certainly the implementor and often the Lead Programmer (a.k.a. Software Manager) should
participate in the review, but also a selection of other programmers. Seldom does any piece of
software require the entire programming team. It depends on the scope and importance of the
unit being created.
The Software Manager should attempt to get enough skill and experience coverage to recognize
and discover software defects. He is also trying to schedule junior programmers as reviewers so
they learn during the process. Senior and skilled individuals are encouraged to expand their
knowledge and to pick up new methods and to teach.
The designer or author of the requirements is an excellent participant in the earliest stages of
review as the initial implementation often ferrets out design flaws or deficiencies. It is not
uncommon that a design is rejected and sent back for refinement in the first review.
One of our underlying goals in scheduling review participants was to provide insurance, not just
in software quality, but in the case where a programmer leaves the team. Hence, we often
assigned a software unit to a primary implementor and a second programmer. The primary was
responsible for the software and for partitioning the unit so two people could write components
of the same unit. Thus with constant review, there is no piece of software that only one person
understands. If not a second programmer, then we would assign a primary reviewer who would
participate in the unit’s development until its completion. Thus the second programmer or the
primary reviewer could pick up where the dearly departed programmer left off and with little
delay.
Schedule the code review meeting for one or two days after handing out the source code to
each reviewer. Make sure each reviewer has a recent copy of the design requirements of the
software unit being reviewed. They will need ample time to comprehend and inspect the work.
And, as always, start the meeting on time.
Our meetings were always driven by one of the reviewers, the primary reviewer, and not the
implementor. The primary reviewer would then be tasked with keeping the review focused,
making sure that all issues were sufficiently covered, that each reviewer and the implementor
had opportunity to state opinions, facts, observations, what-have-you. The primary reviewer
was charged with the authority to resolve any disputes, should there be any.
The meeting would start with the primary reviewer asking the implementor if there were
anything they wanted to point out or highlight about the unit under review. Then there was
usually a request for general comments by any of the reviewers. Often times there would be a
change that would need to be made throughout the unit, so it would get dealt with up front and
only one time.
Eventually, the review participants would go through the software, page by page, routine by
routine, and note any deficiencies or defects or concerns. The implementor would, on a clean
copy of the source code, mark down any changes or comments in red. This was the master
copy of the reviewed unit which served as the guide for any repairs.
Often times only the first pass required a routine by routine inspection. In later iterations, the
master copy of the source code would be used as a guide to check off and verify that all
changes were successfully completed by the implementor. Verifying (and just recalling) the
changes is extremely important. For each meeting, a new master copy listing required changes
would be generated.
We seldom took meeting minutes and let the master copy serve that purpose. Be sure to date
the master copy and note who the reviewers were. We kept the master copies in software
folders (an Uncle Sam requirement) and used them as reference. All the reviewer’s copies were
shredded, but the recycle bin is probably just as good. Disposing of out-of-date material is
simple but extremely important house cleaning.
If there were action items as a result of a review, either the implementor or the primary
reviewer would forward them to the Software Manager. These might include changes needed to
the Design Document, changes needed to other software units, or just about anything.
The meeting would typically conclude with the implementor suggesting how long it would take
to make changes, whether new components would be considered in the next review, and
whether any further reviews were necessary at all.
During my soapbox moments, I have been asked if junior programmers should review the code of a senior programmer. Yes. And the opposite? Yes. Should managers be involved? Yes. Besides being focused on bug hunting, the review process is also a concentrated learning and teaching session which has longer-lasting effects than a seminar or classroom course. Managers and senior people need training and reinforcement as well, so open the box and expose the software. Egos will fade, modesty will reign, and everyone will grow and benefit.
Avoid having the same people review the same person. Mix it up, use a round robin process,
and crossbreed the skill sets.
As time goes by, the strengths and weaknesses of the reviewer (and the programmer) will
become apparent. Some reviewers are good at picking on identifier names, others on initial-case problems, missing test cases, missing or misspelled words in comment blocks, what-have-you. Recognize these talents and blind spots and employ them or address them by crafty
scheduling.
So how picky do reviews get? You might have noticed I mentioned “identifier names” and
misspelled comments. That sounds scary. This sounds like a lot of work. How could this possibly
save time? Egads!
That’s where it becomes quite obvious that each development team or company must decide
what is important to scrutinize in the review. It is utterly necessary that each programming
team grow and apply their own standards. If you don’t care about function names, don’t. If you
don’t care about copyright headers in source files, then don’t. If you do care about a “default”
clause for every “switch” statement, then make that part of your coding standard. The point is
that as a group, you define your own expectations, make them clear, document them, and then
live up to them. As a group you can change and alter your level of quality and how much you
can ask of your team. Tailor the expectations to the talent you have and conscientiously decide
what your approval criteria will consist of.
My recommendation (and druthers) is that everything in the source code is open to scrutiny. If you can't justify a 3,000-line function, its name, or the "goto" you used, then those problems need
to be fixed. If your reviewers can’t understand an algorithm, that’s a good indicator a different
approach is needed, even if it seems to work.
One of our touchstone gauges was “Will a maintenance programmer, 5 years from now,
someone you will never meet, be able to fix or enhance this module based on the source code
and design documents?” If not, the source code is “broken” and needs to be inked in red and
reviewed again during the next iteration.
Another utterly important aspect of code review is the responsibility reviewers have toward the process. It requires commitment, but it also requires flexibility. The intent is to prevent defects from being created and to recognize those that are, not to set up roadblocks.
Another adage we followed was "If you can't offer a better alternative, then don't offer one." In other words, be selective about what you can't live with -- just because you don't like the approach used doesn't mean it's not valid. Though you are responsible as a reviewer to note problems, if the software functions sufficiently, that's good enough. Don't waste time making every piece of software perfect. We're writing games, not landing airplanes.
Here's another healthy realization: keep in mind that with five programmers, for example, there will be five or more excellent solutions to any given problem, none of them alike. In this case, 80 percent of the software in your game will not be written the way you would do it. Lose 4 out of 5 arguments and have faith in your fellow programmers' ability to meet the design requirements. The code review is meant to guarantee successful completion of the coding task, not to define or write the code; that's the responsibility (and authority) of the implementor.
With time, every group I've been involved with has developed its own methods and standards of quality. This happens quickly. Each group also eventually tunes itself. After reviewing Programmer B's work for 6 months, you'll understand their coding method quite naturally, and your comprehension of their work will increase with reduced effort.
After more time, a group tends to narrow in on a “team method” or style of implementation. In
other words, a solution written by one programmer will use the same style and structure any
other programmer of the team would have used -- even down to using the same identifier
names. It is a bit eerie when the team consistently uses the same names for similar purposes in different software units, but that is a mark of success: members of the collective are using similar schemes. The defect count will drop dramatically.
Who owns the code? Is it the person who wrote the first rendition? Did the second programmer
not write part of the unit? Did the reviewers not approve the implementation? Could the
primary reviewer complete a module? Is there any reason a unit couldn’t be started by one
individual and completed by another? Don’t we re-use software anyway?
As I have noted over several projects, the code review process intentionally blurs the code
ownership issue. In the best case, we had a team that couldn’t easily identify who had written
which lines of code just by looking at them. (Is that true in your team?) It could have been any
member of the team. That is a strength and a mark of success. This particular team had refined its techniques to incorporate all the skills of the team as a whole, and replaced variety and confusion with an easily comprehended, consistent style.
I guess I would redefine code ownership this way: every piece of code is owned and
comprehended by several people. Beyond that, who cares?
Ah, but there is a good reason to note who the owners are and to note that the “owner”
changes over time. The initial implementor is the person most likely to fix a problem the
fastest. That’s kind of important to track. So the initiator does tend to “own” the software for a
long time. However, they also happen to be the least likely person to recognize a defect in the
software (or they wouldn’t have put the defect there in the first place, right?).
Any of the reviewers should make excellent maintenance programmers for units they reviewed,
at least for the duration of the defect they are repairing. Hence the reviewers become owners of
the software for a time and subsequently they become the best person to fix their own repairs.
Another common practice is for junior programmers to extend the functionality of an existing
unit. So they also become owners, too, as do the reviewers of the enhancement work.
See how the ownership gets rather blurred? There are many owners over time. So that also
means that the responsibility for successful, continued functioning of a software unit gets
shared over time. Not one person is solely responsible for any one piece of code over the life of
that unit.
Thus when a single programmer departs for another competing game company, there will
always be another "owner" capable of ensuring the "abandoned" software remains current and
valid for the project. Nothing ever has to be thrown away because the software has been
reviewed, approved, and maintained up to the established project standards. The programmer
departed, not the software.
In the case of a meta-design, this is the description of what a design should look like. It details
what is sufficient in a design document such that a programmer can build the desired piece of
software. A meta-design can be a set of boilerplates.
A meta-meeting is the meeting to decide how meetings are run. Will there be a meeting notice?
Who needs to attend? Is there a time-limit? Is an agenda required? Is a note taker part of
every meeting? Who steers the meetings, ends arguments, and how? What about publishing
meeting minutes?
The point here is that it is essential to have a meta-review. In other words, do not underestimate the importance of deciding exactly how your group wants to perform reviews, the
entire process, over the entire project. You must train and educate everyone in the process. No
one can be exempt. Review the review process and refine it.
This also includes defining exactly how nit-picky reviews need to be on a per project basis. This
of course means that a group must define the level of software quality they wish to attain. It
takes a concerted effort to establish the ground rules. Be sure to include a method to refine the
rules.
If the team employs idioms consistently, the form becomes recognizable without effort. It also
reduces the maintenance issues considerably. Avoid doing surprising things in your software
and use copious comments if you have to deviate from the norm. Use of idioms can reduce the
comment overhead.
And then, secondly, there’s consistency. Having been employed to develop coding standards
and style guides, I can say that consistency is the overall rule of thumb to apply throughout any
software project. If you spell a word incorrectly (like “Macintosh”), spell it wrong everywhere. If
you employ a flawed idiom, employ it consistently or fix it everywhere. If you use an open
brace “{“ with every “if” statement, your teammates should be doing the same thing as should
your maintenance programmers.
So when it comes to the code review process, be consistent in applying approval criteria. There
should be no mysteries and meeting the criteria should be easy. Be consistent in applying
standards, the same standards, to every piece of software and to each individual. Make no
exemptions, they’re hard to remember and seldom get documented.
Because code (and design) review is one of my favorite soapbox subjects, I have encountered
people with various amounts of exposure to the review process and to the benefits reaped from
it. But regardless of their exposure, one question that invariably gets asked is: “How much time
does it take?”
That's not a bad question, it's just not the right question. But let me answer it anyway.
For a first review on 2,000 lines of code, with say three reviewers, each reviewer may spend an
hour, maybe two depending on the topic and the quality of the accompanying design
documentation. The first review meeting might last 30 minutes, maybe longer, most likely
shorter as your team gets practiced and develops a team programming style. Successive
reviews might take 10 minutes on 2,500 lines of code for each reviewer and perhaps 5 minutes
in a meeting. Again, with practice, a team gets really quick at this task. And the task gets quicker still, because programmers invariably produce cleaner initial code as a result of the review process, introducing fewer defects in the first place.
Note how small these numbers are compared to the amount of time it takes to reproduce a defect, the time it takes to fix it, and the time it takes to verify that it has been successfully removed without introducing yet more defects. I mean, how much time do you spend looking for just one bug?
It's worth pointing out that finding 30 potential defects in the initial review of 500 lines of code wouldn't surprise me at all in a newly instituted review process. Wrong types, out-of-range numbers, off-by-one errors, divide-by-zero cases, missing "else" clauses, double-equals "==" typos, mismatched actual and formal parameters, unreleased memory blocks, and array and pointer arithmetic errors are examples we're all familiar with. But until you have someone else look at "your" code, you'll never know just how many defects you make. It is a universal eye-opener!
And then there are the higher-level defects a reviewer must be concerned with. Catch just one of these in the initial review, and the time spent on all subsequent reviews of this software unit is paid for many times over -- not to mention all the debugging effort that is avoided.
No one ever asks the better question when the code review soapbox comes out. "How much time does it take?" Who cares? "How much time does it save?" That's what matters!
Well, how many bugs did you make last week? A better question is how many bugs did you
hunt down last week? How long did it take you to fix them? How long did it take for you to re-
test the software? How long did it take for an artist to convince you there was a problem in the
first place?
If sitting in a 15-minute review would have prevented all of last week's bugs, would you rather have had the meeting? How much time can you save your team by simple, straightforward inspection? Again, it's just eye-opening. The time spent doing code review is far less than the effort it takes to hunt down and repair bugs -- even for dull bank statement software.
Ah, still, how much time does it save? Well, you never have to throw a piece of software away
because a programmer quit. Is that enough time saved?
How re-usable is software that has been reviewed by several people? Far more re-usable than software written in a closet by an AI programmer who didn't understand "real-time constraints". How re-usable is the programming team? That's really a better gauge of how re-usable a piece of software is -- is the programmer who wrote the software still in the house? Five hands should go up! Each one an owner.
One last set of good questions posed to me at this particular Sputnik gathering included: In
using code reviews and style standards and idioms, how does a new team member get
assimilated into the team? Do they have to assimilate, like become part of the Borg? Do new
team members reject the process or find it creatively restrictive?
In a lot of cases, I found that new team members welcomed an environment of constant peer review; they found it refreshing and invigorating. The involvement and caring that a review process
produces is really tremendous. But the team does have to train and educate any new team
member. Assign them a mentor.
Do new team members have to become part of the collective? Absolutely. A team cannot have
any outcasts or the process gets skewed. It must be applied evenly. And yes, some people do
not assimilate. Let them go.
Is the review process restrictive on the creative aspects of software development? No. The
point of the review process is to instill quality and reduce defects. Each new member will bring
new idioms and new skills into the team. The team and Software Manager must suck that out of
them and will, themselves, have to work to stay open to new ideas, fresh approaches, and
better ways. Review the review process constantly, the introduction of a new team member is
the ideal opportunity.
The creative aspects of software are not deterred but are actually enhanced by the sharing of
one’s labor with an interested audience. It’s also better to be creative in one’s solution to a
problem and avoid being creative in the implementation. Keep the implementation mundane,
simple, easy to understand and easy to maintain. If keeping the code simple attenuates your
creativity, then maybe software ain’t your gig to begin with.
Just as a note: our group at this one particular job was so successful that programmers and managers were breaking down the doors trying to get in. We were setting the standard, and our code and design review processes were to thank.
I think this is an old Japanese saying, but if it's not, I'm sorry. Regardless, this simple adage really sums up the attitude and intent of code review. For every mistake you make in your software, you will be thanking your fellow programmers a hundred times for finding it and for teaching you how not to make the same mistake in the future. I mean, I definitely prefer my mates finding my mistakes to my customers finding them.
So how can code reviews be fun? Well, after a time, any programming group gets to know each
other and everyone’s little foibles. Now imagine that you get to talk about these little foibles in
a meeting every week or two. It doesn’t take long to see patterns. Some programmers make
the same mistake time and time again, no matter how hard they try not to. So it’s hard not to
start making fun of that.
I have seen groups get so close, so friendly, and so respectful of each other that on the day of a code review, if a reviewer noted a particular defect that a programmer constantly makes, they'd drop a cookie on the programmer's desk early in the morning. It was always painful to be greeted by a box of cookies first thing in the morning, but it always made you laugh. The implementor provides the snacks!
The point is, the code review process is extremely helpful. It can be tedious at times, yet it is so important for repeatable success, and the benefits are many. Keep the process light, keep the meetings short and to the point. Avoid turning the process into drudgery. We found the review meetings themselves to be a friendly and competitive atmosphere where folks could finally, after years of writing software, show off their artistry to the people it matters to most -- and to those who really get the craft. It is an awesome forum.
Do good work.
References:
DeMarco, Tom, and Timothy Lister. Peopleware: Productive Projects and Teams, 2nd ed. Dorset House Publishing, 1999.
Humphrey, Watts S. Introduction to the Personal Software Process. Addison-Wesley, 1997.
Hetzel, Bill. The Complete Guide to Software Testing, 2nd ed. John Wiley & Sons, Inc., 1988.
Jacobson, Ivar, et al. Object-Oriented Software Engineering: A Use Case Driven Approach. Addison-Wesley, 1992.
John Stenersen has been into 3D graphics since before most of the industry was even born (he's not
that old, he just started young). Besides working on Air Force flight simulators, landing systems,
shareware, and 3D games, he's been into white-water rafting, flying ultra-lights, and Scandinavian
dancing. He is currently doing freelance developing through his company DarkViking Software and
can be reached at mailto:[email protected]. For more, check out his website at
http://pws.prserv.net/darkviking.
Terrain rendering is a perennial hot issue in the world of game programming. Right now we're
at a particularly interesting point in the development of terrain rendering technology, because
polygon budgets have risen to the point where, in conjunction with real-time LOD meshing
algorithms taken from published academic papers, state-of-the-art game engines are able to
draw quite a bit of reasonably detailed terrain. However, the techniques which are currently in
common use must compromise either on terrain size or on close-up detail.
As part of the R&D for Soul Ride, the game I'm currently working on
(http://www.soulride.com/ ), I experimented with the published algorithms, and eventually
came up with an extension that eliminates the tradeoff between terrain size and close-up
detail. This article presents my algorithm, along with its similarities to and differences from the above-mentioned algorithms.
I'll start by reviewing the problem of terrain rendering, and describe the problem solved by [1],
[2], and [3] (see references at the end of this article). Then I'll explain the additional problem
solved by my algorithm. I'll present a detailed description of the algorithm, and discuss some
of the problems with it and some of the untapped potential. And last but not least, I'll provide
the source code to a demo that implements my algorithm, which you can use to help
understand it, evaluate its effectiveness, and incorporate directly into your own projects if you
want.
This article is not a general tutorial or review of terrain rendering. I'm going to assume some
familiarity on your part with the problem. If things aren't making much sense, you may want
to consult the excellent references listed at the end of the article.
The Problems
What do we want from a terrain renderer? We want a single continuous mesh from the
foreground all the way to the horizon, with no cracks or T-junctions. We want to view a large
area over a large range of detail levels: we want to see everything from the bumps in front of our feet to the mountains in the background. For the sake of discussion, let's say that we want feature size to
range from 1m up to 100000m; five orders of magnitude.
How can we do it? The brute-force approach won't work on ordinary computers circa Y2K. If
we make a 100000m x 100000m grid of 16-bit height values, and just draw them in a mesh
(Figure 1), we'll end up with two big problems. First, the triangle problem: we'll be sending up to
20 billion triangles/frame to our rendering API. Second, the memory problem: our heightfield
will consume 20 GB of data. It will be many years before hardware advances to the point
where we can just use brute-force and get good results.
Fig 1. Brute force approach to a heightfield mesh.
There are several previously-published methods which successfully tackle the triangle problem.
The most widely used ones employ a clever family of recursive meshing algorithms [1], [2],
[3]. Using one of these algorithms, we can effectively tame our mesh, and render a seamless
terrain with a few thousand triangles, with the vertices intelligently selected on the fly from the
10 billion in the dataset.
However, we still have a memory problem, since the heightfield dataset consumes 20 GB (plus
some overhead to support the meshing algorithm).
One obvious solution is to compromise on detail by making the heightfield dimensions smaller.
1k x 1k is a good practical size for a heightfield with today's machines. A recently released
game called TreadMarks uses a 1k x 1k dataset to excellent effect [4]. Unfortunately, 1k x 1k
is still a far cry from 100k x 100k. We end up having to limit either the size of the terrain and
the view distance, or the amount of foreground detail.
The solution which I cover in this article is to use an adaptive quadtree, instead of a regular
grid, to represent the terrain height information. Using this quadtree, we can encode height
data at different resolutions in different regions in the terrain. For example, in a driving game,
you would want lots of fine detail on and around the roads, ideally showing every bump, but
you wouldn't need that much detail for the surrounding wilderness that you can't drive to; you
only need enough detail for the general shape of hills and valleys.
The quadtree can also be used for another attack on the memory problem: procedural detail.
The idea is to pre-define the shape of the terrain at a coarse level, and have the computer
automatically generate fine detail on the fly for the area immediately around the viewer.
Because of the quadtree's adaptive nature, this detail can be discarded when the viewer moves,
freeing up memory for creating procedural detail in a different region.
Separately, the use of quadtrees for adaptive representation of 2D functions, and the use of
quadtrees for recursive meshing [1], [3] are both well-known. However, [1] and [3] both use
regular grids for their underlying heightfield representation. Extending their meshing approach
to work with a true adaptive quadtree presents numerous complications, and requires some
tricky programming. Hence this article and the accompanying demo code.
Meshing
My meshing algorithm is based on [1], which has also influenced [2] and [3]. There are a few
key modifications, but much of the basic approach is the same, and I borrow a lot of the [1]
terminology.
There are two parts to meshing. I call the first part Update() and the second part Render(),
after [1]. During Update(), we'll decide which vertices to include in the output mesh. Then,
during Render() we'll generate a triangle mesh that includes those vertices. I'll start by
explaining Update() and Render() for an extremely simple heightfield: a 3x3 grid (Figure 2). To
Update() it, we'll look at each of the optional vertices and decide whether to include them in the
mesh. Following the terminology of [1], we'll say that if and only if a vertex is "enabled", then
we'll use it in the mesh.
Take as given that the center and corner vertices are enabled. So the task is to calculate the
enabled state for each of the four edge vertices, according to some LOD calculation which takes
the viewpoint and the vertex attributes into account.
Once we know which vertices are enabled, we can Render() the mesh. It's easy; we just make
a triangle fan with the center vertex as the hub, and include each enabled vertex in clockwise
order around the outside. See Figure 3 for examples.
Figure 3. Examples of LOD meshes on the 3x3 heightfield.
Disabled vertices in black.
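As a sketch of how Render() might emit that fan (the vertex numbering is a hypothetical convention for illustration, not the demo's actual code):

    #include <vector>

    // Hypothetical numbering for the 3x3 square: index 0 is the center;
    // indices 1..8 walk the outer ring clockwise, with odd indices the
    // optional edge verts and even indices the always-enabled corners.
    std::vector<int> BuildFanIndices(const bool edgeEnabled[4])
    {
        std::vector<int> fan;
        fan.push_back(0);                         // hub: the center vertex
        for (int i = 1; i <= 8; ++i) {
            if ((i & 1) && !edgeEnabled[i / 2])   // skip disabled edge verts
                continue;
            fan.push_back(i);
        }
        fan.push_back(fan[1]);                    // wrap around to close the fan
        return fan;                               // draw as one triangle fan
    }

With all four edge verts disabled this produces the minimal four-triangle fan; with all four enabled it produces eight triangles, matching the cases in Figure 3.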
To Update() and Render() an adaptive quadtree heightfield, we extend the above process by
starting with that same 3x3 square and recursively subdividing it. By subdividing, we can
introduce new vertices, and treat them like we treated the vertices of the original square. In
order to prevent cracks, however, we'll have to observe some rules.
First, we can subdivide any combination of the four quadrants. When we subdivide a quadrant,
we'll treat the quadrant as a sub-square, and enable its center vertex. For mesh consistency,
we will also have to enable the edge vertices of the parent square which are corners of the
quadrant (Figure 4). We'll define enabling a square to imply the enabling of its center vertex as
well as those corners.
After we're done with the Update(), we can Render() the quadtree. Rendering is actually pretty
simple; the complicated consistency stuff was taken care of in Update(). The basic strategy is
to recursively Render() any enabled sub-squares, and then render any parts of the square
which weren't covered by enabled sub-squares. Figure 6 shows an example mesh.
In the above description, I glossed over the part about deciding whether a vertex should be
enabled. There are a few different ways to do this. All of them take into account what I'll call
the "vertex interpolation error", or vertex error for short. What this is, is the difference in
height between the correct location of a vertex, and the height of the edge in the triangle which
approximates the vertex when the vertex is disabled (Figure 7). Vertices which have a large
error should be enabled in preference to vertices which have a small error. The other key
variable that goes into the vertex enable test is the distance of the vertex from the viewpoint.
Intuitively, given two vertices with the same error, we should enable the closer one before we
enable the more distant one.
There are other factors that can be included as well. [1] for instance takes into account the
direction from the viewpoint to the vertex. The justification is based on the idea of screen-
space geometric error; intuitively the vertex errors are less visible when the view direction is
more vertical. [1] goes through the math in detail.
However, I don't think screen-space geometric error is a particularly good metric, for two
reasons. One, it ignores texture perspective and depth buffering errors -- even if a vertex does
not move in screen space because the motion is directly towards or away from the viewpoint,
the vertex's view-space z value does affect perspective-correction as well as depth-buffering.
Two, the viewpoint-straight-down case is both an easy case for terrain LOD algorithms, and not
a typical case.
In my opinion, there's no point in optimizing for an atypical easy case in an interactive system.
The performance of the more typical and difficult case, when the view axis is more horizontal
and much more terrain is visible, will determine the minimum system frame-rate and hence the
effectiveness of the algorithm.
Instead of screen-space geometric error, I advocate doing a similar test which results in 3D
view-space error proportional to view distance. It's really very similar to the screen-space-error
test, but without the issues I mention above. It involves only three quantities: an
approximation of the viewpoint-vertex distance called the L1-norm, the vertex error, and a
detail threshold constant. Here it is:

    L1 = max(|vertx - viewx|, |verty - viewy|, |vertz - viewz|)
    enabled = error * Threshold > L1
You probably recognize the L1-norm, even if you didn't know it had a fancy name. In practice,
using the L1-norm instead of the true viewpoint distance will result in slightly more subdivision
along the diagonals of the horizontal terrain plane. I've never been able to detect this effect by
eye, so I don't worry much about it. [4] and others use view-space-z rather than the L1-norm,
which is theoretically even more appropriate than true viewpoint distance. Nevertheless, the L1-
norm works like a champ for me, and [3] uses it too.
You can treat the Threshold quantity as an adjust-for-best-results slider, but it does have an
intuitive geometric interpretation. Roughly, it means: for a given view distance z, the worst
vertex error I'll tolerate is z / Threshold. You could do some view-angle computations and
relate Threshold to maximum pixel error, but I've personally never gone past the adjust-for-
best-results stage.
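In code, the test might look like this minimal C++ sketch (the names are hypothetical; this is the idea, not the demo's exact source):

    #include <cmath>

    static float DetailThreshold = 100.0f;   // the adjust-for-best-results slider

    // 'error' is the vertex's precomputed interpolation error in world
    // units. The distance approximation is the max of the per-axis
    // distances -- the "L1-norm" of the text.
    bool VertexTest(float x, float y, float z, float error, const float viewer[3])
    {
        float dx = std::fabs(x - viewer[0]);
        float dy = std::fabs(y - viewer[1]);
        float dz = std::fabs(z - viewer[2]);
        float d = dx;
        if (dy > d) d = dy;
        if (dz > d) d = dz;
        // Enable when error > d / Threshold, rearranged to avoid the divide.
        return error * DetailThreshold > d;
    }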
So that covers the vertex enabled test. But if you were paying attention earlier, you may also
have noticed that I glossed over another point, perhaps more important: during Update(), how
do we know whether to subdivide a quadrant or not? The answer is to do what I call a "box
test". The box test asks the question: given an axis-aligned 3D box enclosing a portion of
terrain (i.e. a quadtree square), and the maximum vertex error contained within that box, and
no other information about what's inside the box, is it possible that the vertex enable test would
return true? If so, then we should subdivide the box. If not, then there's no reason to
subdivide.
The beauty of it is, by doing the box test, we can potentially trim out thousands of vertices from
consideration in one fell swoop. It makes Update() completely scalable: its cost is not related
to the size of the full dataset, only to the size of the actual data that's included in the current
LOD mesh. And as a side benefit, the precomputed vertical box extent can be used during
Render() for frustum culling.
The box test is conservative, in that a square's max-error could belong to a vertex on the opposite side of the box from the viewpoint; the vertex test itself would fail for that actual vertex even though the box test succeeds. But once we subdivide, we'll go ahead and do four more-accurate box tests on the sub-squares, and the penalty for conservatism is fairly small: a few extra vertex and box tests, and a couple of extra vertices in the mesh.
Fortunately, given the above simple vertex test, a suitable box test is easy to formulate:

    bc[x,y,z] == coordinates of the box center
    ex[x,y,z] == extent of box from the center (i.e. 1/2 the box dimensions)
    L1 = max(|bcx - viewx| - exx, |bcy - viewy| - exy, |bcz - viewz| - exz)
    subdivide = MaxError * Threshold > L1

Here MaxError is the largest vertex error within the box, and L1 is the closest possible distance from the viewpoint to any vertex in the box.
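In code, continuing the sketch from the vertex test (same hypothetical names and DetailThreshold):

    // bc[] is the box center, ex[] the half-extent on each axis, and
    // maxError the largest vertex error inside the box. d ends up as the
    // smallest possible per-axis distance from the viewpoint to the box.
    bool BoxTest(const float bc[3], const float ex[3], float maxError,
                 const float viewer[3])
    {
        float d = 0.0f;   // starting at 0 clamps axes where we're inside the slab
        for (int axis = 0; axis < 3; ++axis) {
            float dist = std::fabs(bc[axis] - viewer[axis]) - ex[axis];
            if (dist > d) d = dist;
        }
        // If the vertex test could pass for anything in the box, subdivide.
        return maxError * DetailThreshold > d;
    }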
Details
That covers the essentials of the algorithm. What's left is a mass of details, some of them
crucial. First of all, where is the height data actually stored? In all of the previously-published
algorithms, there is a regular grid of height values (and other bookkeeping data), on top of
which the mesh is implicitly [1] & [3] or explicitly [2] defined. The key innovation of my
algorithm is that the data is actually stored in an adaptive quadtree. This results in two major
benefits. First, storage can be allocated adaptively according to the actual dataset or the needs
of the application; e.g. less storage can be used in smoother areas or areas where the
viewpoint is not expected to travel. Second, the tree can grow or shrink dynamically according
to where the viewpoint is; procedural detail can be added to the region near the viewpoint on-
the-fly, and deleted when the viewpoint moves on.
In order to store heightfield information in a quadtree, each quadtree square must contain
height values for at least its center vertex and two of its edge vertices. All of the other vertex
heights are contained in other nearby nodes in the tree. The heights of the corner vertices, for
instance, come from the parent quadtree square. The remaining edge vertex heights are stored
in neighboring squares. In my current implementation, I actually store the center height and all
four edge heights in the quadtree square structure. This simplifies things because all the
necessary data to process a square is readily available within the square or as function
parameters. The upshot is that the height of each edge vertex is actually stored twice in the
quadtree.
Also, in my current implementation, the same quadtree used for heightfield storage is also used for meshing. It should be possible to use two separate trees, one for heightfield storage and one for meshing. The potential benefits of such an approach are discussed later.
A lot of the tricky implementation details center around the shared edge vertices between two
adjacent squares. For instance, which square is responsible for doing the vertex-enabled test
on a given edge vertex? My answer is to arbitrarily say that a square only tests its east and
south edge vertices. A square relies on its neighbors to the north and to the west to test the
corresponding edge vertices.
Another interesting question is, do we need to clear all enabled flags in the tree at the
beginning of Update(), or can we proceed directly from the state left over from the previous
frame? My answer is, work from the previous state (like [2], but unlike [1] and [4]). Which
leads to more details: we've already covered the conditions that allow us to enable a vertex or
a square, but how do we know when we can disable a vertex or a square? Remember from the
original Update() explanation, the enabling of a vertex can cause dependent vertices to also be
enabled, rippling changes through the tree. We can't just disable a vertex in the middle of one of these dependency chains while other enabled vertices still depend on it. Otherwise we'd either get cracks in the mesh, or important enabled vertices would not get rendered.
If you take a look at Figure 8, you'll notice that any given edge vertex has four adjacent sub-
squares that use the vertex as a corner. If any vertex in any of those sub-squares is enabled,
then the edge vertex must be enabled. Because the square itself will be enabled whenever a
vertex within it is enabled, one approach would be to just check all the adjacent sub-squares of
an edge vertex before disabling it. However, in my implementation, that would be costly, since
finding those adjacent sub-squares involves traversing around the tree. Instead, I maintain a
reference count for each edge vertex. The reference count records the number of adjacent sub-
squares, from 0 to 4, which are enabled. That means that every time a square is enabled or
disabled, the reference counts of its two adjacent edge vertices must be updated. Fortunately,
the value is always in the range [0,4], so we can easily squeeze a reference count into three
bits.
Figure 8. Each edge vertex has four adjacent sub-squares which use it as a corner. If any of those squares are enabled, then the edge vertex must be enabled. For example, the black vertex must be enabled if any of the four gray squares are enabled.
Thus the disable test for an edge vertex becomes straightforward: if the vertex is currently
enabled, and the associated reference count is zero, and the vertex test with the current
viewpoint returns false, then disable the edge vertex. Otherwise leave it alone. The conditions for disabling a square are similar: if the square is currently enabled, and it's not
the root of the tree, and none of its edge vertices are enabled, and none of its sub-squares are
enabled, and the square fails the box test for the current viewpoint, then disable it.
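Put together, the two disable tests look something like this sketch, over a hypothetical node structure that tracks only the square's own east and south verts (the demo's real bookkeeping is more involved):

    // Hypothetical node structure for illustrating the disable tests.
    struct QuadNode {
        bool squareEnabled;
        bool edgeVertEnabled[2];     // east, south: this square's responsibility
        unsigned char refCount[2];   // enabled adjacent sub-squares, 0..4
        bool anyChildEnabled;
        bool isRoot;
    };

    // vertTestNow[v]: VertexTest() result for edge vertex v this frame.
    // boxTestNow:     BoxTest() result for this square this frame.
    void UpdateDisables(QuadNode& n, const bool vertTestNow[2], bool boxTestNow)
    {
        // An edge vertex goes away only when nothing references it and
        // the vertex test no longer wants it.
        for (int v = 0; v < 2; ++v) {
            if (n.edgeVertEnabled[v] && n.refCount[v] == 0 && !vertTestNow[v])
                n.edgeVertEnabled[v] = false;
        }
        // The square goes away only when fully unreferenced and the box
        // test fails.
        if (n.squareEnabled && !n.isRoot
            && !n.edgeVertEnabled[0] && !n.edgeVertEnabled[1]
            && !n.anyChildEnabled && !boxTestNow)
            n.squareEnabled = false;
    }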
Memory
A very important issue with this (or any) LOD method is memory consumption. In a fully
populated quadtree, a single quadtree square is equivalent to about three vertices of an
ordinary heightfield, so it is imperative to keep the square data-structure as compact as
possible. Fortunately, the needs of the Update() and Render() algorithms do not require each
square to contain all the information about 9 vertices. Instead, this is the laundry list of
required data:
• 5 vertex heights (center, and edge verts east, north, west, south)
• 6 error values (edge verts east and south, and the 4 child squares)
• 2 sub-square-enabled reference counts (for east and south verts)
• 8 1-bit enabled flags (for each edge vertex and each child square)
• 4 child-square pointers
• 2 height values for min/max vertical extent
• 1 1-bit 'static' flag, to mark nodes that can't be deleted
Depending on the needs of the application, the height values can usually be squeezed
comfortably into 8 or 16 bits. The error values can use the same precision, or you can also do
some non-linear mapping voodoo to squeeze them into smaller data sizes. The reference
counts can fit into one byte along with the static flag. The enabled flags fit in one byte. The
size of the child-square pointers depends on the maximum number of nodes you anticipate. I
typically see node counts in the hundreds of thousands, so I would say 20 bits each as a
minimum. The min/max vertical values can be squeezed in various ways if desired, but 8 bits
each seems like a reasonable minimum. All told, this amounts to at least 191 bits (24 bytes)
per square assuming 8-bit height values. 16-bit height values bring the total to at least 29
bytes. A 32-byte sizeof(square) seems like a good target for a thrifty implementation. 36
bytes is what I currently live with in Soul Ride, because I haven't gotten around to trying to bit-
pack the child pointers. Another byte-saving trick I use in Soul Ride is to use a fixed-pool
allocator replacement for quadquare::new() and delete(). You can eliminate whatever
overhead the C++ library imposes (at least 4 bytes I would expect) in favor of a single
allocated bit per square.
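For concreteness, here is one way the fields could be packed in C. This is an illustrative sketch, not the Soul Ride layout; it assumes 8-bit heights and simplifies the 20-bit child pointers into 16-bit indices into a pre-allocated node pool:

/* Illustrative bit-packed square; not the Soul Ride layout. */
struct quadsquare_packed {
    unsigned char  Vertex[5];       /* center + E/N/W/S edge vertex heights */
    unsigned char  Error[6];        /* E/S edge verts + the 4 child squares */
    unsigned char  EnabledFlags;    /* 8 x 1-bit: 4 edge verts, 4 children  */
    unsigned char  CountAndStatic;  /* two 3-bit ref counts + 'static' bit  */
    unsigned short Child[4];        /* indices into a fixed node pool       */
    unsigned char  MinY, MaxY;      /* vertical extent for the box test     */
};  /* 23 bytes of payload, typically padded to 24 by the compiler */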
There are various compression schemes and tricks that could be used to squeeze the data even
smaller, at the expense of complexity and performance degradation. In any case, 36 bytes per
3 vertices is not entirely unrespectable. That's 12 bytes/vertex. [1] reports implementations
as small as 6 bytes per vertex. [2] only requires storage of vertex heights and "wedgie
thicknesses", so the base data could be quite tiny by comparison. [4], using a modified [2],
reports the storage of wedgie thicknesses at a fraction of the resolution of the height mesh,
giving further savings.
However, such comparisons are put in a different light when you consider that the quadtree
data structure is completely adaptive: in very smooth areas or areas where the viewer won't
ever go near, you need only store sparse data. At the same time, in areas of high importance
to the game, you can include very detailed features; for example the roadway in a driving game
can have shapely speed bumps and potholes.
Geomorphs
[2] and [3] go into some detail on "vertex morphing", or "geomorphs". Basically, geomorphing
is a technique whereby when vertices are enabled, they smoothly animate from their
interpolated position to their correct position. It looks great and eliminates unsightly popping;
see McNally's TreadMarks for a nice example.
Unfortunately, doing geomorphs requires storing yet another height value for the vertices that
must morph, which would present a real data-size problem for the adaptive quadtree algorithm
as I've implemented it. It could result in adding several bytes per square to the storage
requirements, which should not be done lightly. [3] incurs the same per-vertex storage penalty,
but [2] avoids it because it only has to store the extra height values for vertices that are
actually in the current mesh, not for every vertex in the dataset.
I have three suggestions for how to address the geomorph issue. The first alternative is to
spend the extra memory. The second alternative is to optimize the implementation, so that
really small error tolerances would be practical and geomorphs unnecessary. Moore's Law may
take care of this fairly soon without any additional software work. The third alternative is to
split the quadtree into two trees, a "storage tree" and a "mesh tree". The storage tree would
hold all the heightfield information and precomputed errors, but none of the transitory
rendering data like enabled flags, reference counts, geomorph heights, etc. The mesh tree
would hold all that stuff, along with links into the storage tree to facilitate expanding the mesh
tree and accessing the height data. The mesh tree could be relatively laissez-faire about
memory consumption, because its size would only be proportional to the amount of currently-
rendered detail. The storage tree, on the other hand, because it would be static, could trim
some fat by eliminating most of the child links.
The storage-tree/mesh-tree split could also, in addition to reducing total storage, increase data
locality and improve the algorithm's cache usage.
Working Code
The Soul Ride engine is closed source for the foreseeable future, but I did re-implement the
essentials of this algorithm as a companion demo for this article. The demo source is freely
available for you to examine, experiment with, modify, and incorporate into your own
commercial or non-commercial projects. I only ask that if you do incorporate the demo source
into a project, please acknowledge me in the credits!
I didn't sweat the data-packing issue in the demo code. That would be a good area to
experiment with. Also, I didn't implement frustum culling of squares, but all the necessary data
is readily available.
The data included with the demo comes from USGS 7.5-minute DEMs of the Grand Canyon
(USGS). At Slingshot we have a proprietary tool that crunches the USGS data and stitches
neighboring DEMs together; I collected 36 datasets and resampled them at a lower resolution
to make the heightfield. I made the texture in a few minutes in Photoshop, by loading an 8-bits
per sample version of the heightfield as raw data, running the Emboss filter on it to create
shading, and adding some noise and tinting. The texture is just one big 1024x1024 image,
stretched over the entire terrain.
The data-loading code should be fairly self-explanatory, so if you have some of your own data
you want to try, it should be easy to get it in there.
The program uses OpenGL and GLUT for 3D, window setup, and input. I developed it under
Win98 using a TNT2 card, but I tried to avoid Windows-isms so it should be easy to port to
other systems that support GLUT.
In addition to the tighter data packing I mentioned, there are a few other things in the Soul
Ride engine which aren't in the article demo. The big one is a unique-full-surface texturing
system, the details of which are beyond the scope of this article. But I will mention that good
multi-resolution texturing, especially containing lighting, is extremely beneficial for exploiting
the unique features of the quadtree terrain algorithm.
One thing I haven't yet experimented with, but looking at the demo code would be fairly easy
to hack in, is on-demand procedural detail. In my view, on-demand procedural detail looms
large in the future of computer graphics. There just doesn't seem to be any other good way to
store and model virtual worlds to the detail and extent where they really have the visual
richness of the real world. Fortunately, the problem is completely tractable, if complicated. I
think this quadtree algorithm, because of its scalability, can be helpful to other programmers
working on on-demand procedural detail.
Yet another cool extension would be demand-paging of tree subsections. It actually doesn't
seem too difficult; basically you'd flag certain quadsquares at any desired spot in the hierarchy
as being "special"; they'd contain links to a whole giant sub-tree stored on disk, with the max-
error for the sub-tree precomputed and stored in the regular tree. Whenever Update() would
try to enable a "special" square, it would actually go off and load the sub-tree and link it in
before continuing. Getting it all to stream in the background without hitching would be a
little interesting, but I think doable. It would result in basically an infinite paging framework.
On-demand procedural detail could exploit the same basic idea; instead of chugging the disk
drive to get pre-made data, you'd run a terrain-generation algorithm to make the data on the
fly.
Another suggestion for further work would be to identify and eliminate performance
bottlenecks. I suspect that there's some headroom in the code for making better use of the
graphics API.
Acknowledgements
In addition to the authors of the papers (listed below under References) which this work is
based on, I would also like to send shout-outs to Jonathan Blow, Seumas McNally and Ben
Discoe for their various thought-provoking emails and comments, and also to the participants in
the [email protected] mailing list, where I've learned a lot of extremely interesting
stuff from other programmers about different approaches and the ins-and-outs of terrain
rendering.
References:
[1] Peter Lindstrom, David Koller, William Ribarsky, Larry F. Hodges, Nick Faust and Gregory A.
Turner. "Real-Time, Continuous Level of Detail Rendering of Height Fields". In SIGGRAPH 96
Conference Proceedings, pp. 109-118, Aug 1996.
[2] Mark Duchaineau, Murray Wolinski, David E. Sigeti, Mark C. Miller, Charles Aldrich and Mark
B. Mineev-Weinstein. "ROAMing Terrain: Real-time, Optimally Adapting Meshes." Proceedings
of the Conference on Visualization '97, pp. 81-88, Oct 1997.
[3] Stefan Röttger, Wolfgang Heidrich, Philipp Slusallek, and Hans-Peter Seidel. "Real-Time
Generation of Continuous Levels of Detail for Height Fields." Technical Report 13/1997,
Universität Erlangen-Nürnberg.
[5] Ben Discoe, http://www.vterrain.org/ . This web site is an excellent survey of algorithms,
implementations, tools and techniques related to terrain rendering.
2D Surface Deformation
By Max I. Fomitchev
Gamasutra
February 16, 2000
URL: http://www.gamasutra.com/features/20000216/deformation.htm
The rapid growth of CPU power opens possibilities for the development of 3D games that feature
a realistic environment. The increase in performance of graphics accelerators frees additional CPU
cycles that can be used for real-time physical modeling of the world.
Game realism can be improved by simulating deformable objects or deformable surfaces, such
as water, cloth (flags, curtains), and vegetation.
The user receives a much richer sense of reality when the geometry of an object is modified
than when a new or animated texture is merely mapped onto it, because correctly deformed
objects look right from any angle and under any lighting conditions. Deformed objects can also
block or reveal other objects behind them.
The implementation of deformable surfaces discussed in this paper is intended for real-time 3D
games that simulate a realistic environment. The algorithm is optimized for AMD's 3DNow!
technology. Different implementations and their performance are discussed.
Consider a grid that represents a simple 2D surface, as shown in Figure 1. Each vertex on the
surface is connected with six neighbors to the north, east, southeast, south, west and northwest.
This interconnection defines the local topology. The neighboring vertices interact with each other
by means of elastic forces (i.e. the interconnections between vertices, depicted as solid lines on
the figures, act as coil springs). In the initial (relaxed) state the vertices are evenly spaced, and
the vectors SRelaxE, SRelaxSE and SRelaxS specify the distances between neighbors. In the
relaxed state the elastic forces between vertices are equal to zero.
Figure 1. 2D deformable surface at time t0 = Δt, t1 = 2Δt and t2 = 3Δt.
When an external force (Fext) is applied to a vertex on the surface at time t0 = Δt, the vertex
starts moving and displaces to a new location at time t1 = 2Δt. This displacement produces
elastic forces between local topological neighbors. The resulting elastic forces counter the
displaced vertex's motion and try to return the vertex to its original location. However,
according to Newton's third law the same forces, but with opposite direction, act upon the
neighboring vertices. So at time t2 = 3Δt the neighbors get displaced, and an entire cluster of
vertices comes into motion. With time more and more vertices get involved in the motion,
representing wave propagation across the surface.
The surface model is characterized by the following parameters:
- local topology;
- global topology (i.e. plane, cylinder, sphere, etc.);
- vertex mass m;
- relaxation distances (SRelaxE, SRelaxSE, SRelaxS);
- elasticity model (linear, exponential, etc.);
- elasticity constant E (elasticity tensor for anisotropic surfaces), E >= 0;
- damping constant d (damping tensor D for anisotropic surfaces), 0 <= d <= 1.
Greater elasticity corresponds to faster deformation and a stiffer body, while lower elasticity
corresponds to softer bodies. Smaller damping produces faster relaxation, or faster
solidification for ductile surfaces.
This paper deals with plane isotropic surfaces composed of identical particles of mass m,
with six-neighbor local topology and a linear elasticity model.
The surface deformation is calculated for each vertex in the following way. First the vertex
displacement relative to the east, south and southeast neighbors is calculated, and the
corresponding elastic forces are accumulated:

ΔSE = SE - S - SRelaxE
ΔSS = SS - S - SRelaxS
ΔSSE = SSE - S - SRelaxSE

FE = Elasticity · ΔSE
FS = Elasticity · ΔSS
FSE = Elasticity · ΔSSE

FTotal = FE + FS + FSE
According to Newton's third law the vertex under consideration contributes to the total force
acting on its neighbors:

FTotalE = FTotalE - FE
FTotalS = FTotalS - FS
FTotalSE = FTotalSE - FSE

Notice that the north, west and northwest elastic forces are not calculated directly. The total
force for the vertex, FTotal, is updated automatically when the north, west and northwest
vertices are processed (Newton's third law again).
Finally, when all vertices on the surface have been processed (i.e. all internal forces
evaluated), new coordinates are calculated for each vertex:

a = (FTotal + Fext)/m
ΔV = a · Δt
V = V + ΔV
ΔS = V · Δt
S = S + ΔS

Lastly, the vertex velocity is adjusted to account for damping:

V = V · d
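In plain C, the whole per-vertex update reads roughly as follows (a sketch; Vector3 and the parameter names are mine, not the article's listing):

/* Per-vertex update sketch following the equations above. */
typedef struct { float x, y, z; } Vector3;

void UpdateVertex(Vector3 *S, Vector3 *V, const Vector3 *FTotal,
                  const Vector3 *Fext, float m, float dt, float d)
{
    /* a = (FTotal + Fext) / m */
    float ax = (FTotal->x + Fext->x) / m;
    float ay = (FTotal->y + Fext->y) / m;
    float az = (FTotal->z + Fext->z) / m;

    /* V = V + a * dt */
    V->x += ax * dt;  V->y += ay * dt;  V->z += az * dt;

    /* S = S + V * dt */
    S->x += V->x * dt;  S->y += V->y * dt;  S->z += V->z * dt;

    /* V = V * d (damping) */
    V->x *= d;  V->y *= d;  V->z *= d;
}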
Effectors
Normally a surface does not deform by itself; it deforms under the influence of an external
object, an effector. Any rigid object that collides with a deformable surface is an effector.
Wind can be modeled as a stream of infinitely small rigid particles with Brownian velocities.
Spatial (3D) effectors can be modeled as a set of infinitely small particles corresponding to the
object's vertices.
A particle effector is shown in Figure 3. It is characterized by an influence radius REffector and
a strength. Surface vertices located farther than REffector from the center of the effector remain
unaffected. The effector influence results in external forces acting on in-range vertices.
Typically the effector-induced force is inversely proportional to the distance between the vertex
and the center of the effector:

ΔS = SEffector - S
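The falloff formula itself is not reproduced in this copy; the sketch below assumes a Strength/distance law directed away from the effector center, consistent with the inverse proportionality just described (all names are mine, and Vector3 is reused from the earlier sketch):

#include <math.h>   /* sqrtf */

/* Apply a particle effector to all in-range vertices. The
   Strength/distance falloff here is an assumed example law. */
void ApplyEffector(Vector3 *Fext, const Vector3 *S, int numVertex,
                   Vector3 center, float rEffector, float strength)
{
    int k;
    for (k = 0; k < numVertex; k++) {
        float dx = S[k].x - center.x;
        float dy = S[k].y - center.y;
        float dz = S[k].z - center.z;
        float dist = sqrtf(dx * dx + dy * dy + dz * dz);
        if (dist > 0.0f && dist < rEffector) {
            float f = strength / dist;      /* inverse-distance falloff */
            Fext[k].x += f * dx / dist;     /* push along the line from */
            Fext[k].y += f * dy / dist;     /* the effector center out  */
            Fext[k].z += f * dz / dist;     /* to the vertex            */
        }
    }
}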
When simulating a more or less complete physical system, one can also account for surface
resistance by updating the total force acting on the effector with the reaction forces from the
displaced vertices (Newton's third law).
Based on the described 2D surface model, different common surfaces can be represented by
varying the surface parameters.
Setting the damping constant to 1 will produce infinite oscillations generated by a single
effector. The deformation pattern will never repeat. This can be a good basis for modeling
endless water waves for seascapes.
Typically a deformable surface has some fixed vertices, i.e. vertices that cannot be displaced
from their original location. Such fixed vertices correspond to surface edges or to attachment
edges or points. A seashore and curtains are examples of fixed edges and fixed attachment
points, respectively.
Trees and vegetation can be modeled as a hierarchy of surfaces with cylinder/cone global
topology and stiff-rubber properties (elasticity > 1, damping < 1), where individual cylinders
correspond to branches, stems or trunks. Root and attachment edges must be marked as fixed
points. Particle effectors that simulate wind can be applied to such 'rubber' trees to produce a
realistic experience of grass fields or woods.
Algorithm implementation
The surface deformation algorithm is inherently suitable for SIMD processing and can be
implemented in several ways. But before discussing the implementation, consider the
numerical optimizations, or "cheating opportunities", that remove redundant division and
multiplication operations: constant factors such as the vertex mass and the time step can be
folded into precomputed values (see, for example, the Massdt factor in the listing below).
C-code
When targeting Direct3D, the easiest way is to implement surface deformation using the
D3DVECTOR structure and the AoS (array-of-structures) paradigm. All vectors (forces,
displacements, velocities, etc.) can be defined as arrays of D3DVECTOR. However, with SIMD
optimization and porting to 3DNow! in mind, it is better to choose an alternative path, the
SoA (structure-of-arrays) paradigm.
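The difference between the two data layouts, sketched in C (NUM_VERTEX and the structure names are placeholders; Vector3 stands in for D3DVECTOR):

enum { NUM_VERTEX = 10000 };                /* placeholder vertex count */

/* AoS: one vector per vertex; x, y and z are interleaved in memory. */
Vector3 S_aos[NUM_VERTEX];                  /* positions, AoS layout */

/* SoA: one contiguous array per component. A 64-bit 3DNow! register then
   holds the same component of two adjacent vertices, which is exactly how
   the assembler listing below (SX, TotalForceX, VX, ...) walks the data. */
struct SurfaceSoA {
    float SX[NUM_VERTEX], SY[NUM_VERTEX], SZ[NUM_VERTEX];  /* positions S  */
    float VX[NUM_VERTEX], VY[NUM_VERTEX], VZ[NUM_VERTEX];  /* velocities V */
    float TotalForceX[NUM_VERTEX], TotalForceY[NUM_VERTEX],
          TotalForceZ[NUM_VERTEX];                          /* FTotal      */
};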
3DNow!-code
The hand-optimized assembler code that uses the 3DNow! instruction set (see the amd3dx.h
header file in the AMD SDK) can be produced fairly easily from the SoA C implementation.
Vertices are processed in pairs, and the code for the x, y and z components is the same except
for the displacement of the memory operands.
The Athlon CPU is less sensitive to instruction scheduling, mostly due to its large (72 macro-ops)
reorder buffer. However, it is still a good idea to schedule instructions to avoid possible pipeline
stalls due to the latencies of both 3DNow! pipelines. In the code fragment below, 3DNow!
instructions are grouped in packs of three. Each instruction processes the same vector
component (x, y or z) for all three topological neighbors: east, south and southeast. This
grouping hides most of the 3DNow! pipeline latency, since the result of the first instruction is
not referenced until 3 cycles later.
The code suffers a little from misaligned data access, which comes from the necessity of
processing the east and southeast vertices. However, the penalty for a misaligned access is
only one cycle.
The code makes heavy use of load-execute operations to minimize register pressure and
promote code density. Also, the branch for fixed vertices was eliminated and replaced by the
MMX pand instruction, which resets the total force to zero for fixed vertices.
For storing normal data, both halves of an MMX register must be written to memory, but to
different locations. The pswapd instruction, an Athlon extension to the 3DNow! instruction set,
was used to swap the upper and lower 32-bit halves of an MMX register. pswapd offers better
performance than the punpckhdq MMX instruction since it does not use the MMX shifter unit.
The 3DNow! assembler code takes about 1.5M cycles to process 10,000 vertices on a 600MHz
Athlon. The 3DNow! implementation in Listing 2 is ~40% faster than the x87 code.
// Initial setup
__asm {
femms
mov ebx,this // ebx -> this
movd mm7,[ebx]this.Elasticity
punpckldq mm7,mm7 // mm7 = Elasticity
mov eax,[ebx]this.NumVertex
shl eax,2 // eax = NumVertex*sizeof(float)
mov ecx,[ebx]this.NumCols
shl ecx,2 // ecx = NumCols*sizeof(float)
mov edx,eax
shl edx,1 // edx = NumVertex*sizeof(float)*2
lea edi,[ebx]this.SX // [edi] -> SX/SY/SZ
lea esi,[ebx]this.TotalForceX // [esi] -> TotalForceX/TotalForceY/TotalForceZ
mov i,eax
sub i,ecx
M:
mov j,ecx
sub j,8
M1:
prefetchm(edi,64)
prefetchm(esi,64)
// [edi] -> S
// [esi] -> TotalForce
// mm7 = Elasticity
// mm0 = dx/dy/dz
// Process all vertices X-coordinate
movq mm3,[edi] // mm3 = SX[k]
movq mm0,[edi + 4] // SX[k + 1]
movq mm1,[edi + ecx] // SX[k + NumCols]
movq mm2,[edi + ecx+4] // SX[k + NumCols + 1]
pfsub (mm0,mm3) // dx = SX[k + 1] - SX[k]
pfsub (mm1,mm3) // dx = SX[k + NumCols] - SX[k]
pfsub (mm2,mm3) // dx = SX[k + NumCols + 1] - SX[k]
movq mm4,SREX
movq mm5,SRSX
movq mm6,SRSEX
pfsub (mm0,mm4) // dxE -= SRelaxE.x
pfsub (mm1,mm5) // dxS -= SRelaxS.x
pfsub (mm2,mm6) // dxSE -= SRelaxSE.x
movq mm3,[esi + 4] // TotalForceX[k + 1]
pfmul (mm0,mm7) // dxE *= Elasticity
pfmul (mm1,mm7) // dxS *= Elasticity
pfmul (mm2,mm7) // dxSE *= Elasticity
pfsub (mm3,mm0) // TotalForceX[k + 1] - dxE
movq mm4,[esi + ecx] // TotalForceX[k + NumCols]
movq [esi + 4],mm3 // TotalForceX[k + 1] -= dxE
pfsub (mm4,mm1) // TotalForceX[k + NumCols] - dxS
movq [esi + ecx],mm4 // TotalForceX[k + NumCols] -= dxS
movq mm5,[esi + ecx+4] // TotalForceX[k + NumCols + 1]
pfsub (mm5,mm2) // TotalForceX[k + NumCols + 1] - dxSE
movq [esi + ecx+4],mm5 // TotalForceX[k + NumCols + 1] -= dxSE
pfadd (mm0,[esi]) // TotalForceX[k] += dxE
pfadd (mm0,mm1) // TotalForceX[k] += dxS
pfadd (mm0,mm2) // TotalForceX[k] += dxSE
movq [esi],mm0 // store the updated TotalForceX[k]
// Y-coordinate
add edi,eax
add esi,eax
movq mm3,[edi] // mm3 = SY[k]
movq mm0,[edi + 4] // SY[k + 1]
movq mm1,[edi + ecx] // SY[k + NumCols]
movq mm2,[edi + ecx+4] // SY[k + NumCols + 1]
pfsub (mm0,mm3) // dy = SY[k + 1] - SY[k]
pfsub (mm1,mm3) // dy = SY[k + NumCols] - SY[k]
pfsub (mm2,mm3) // dy = SY[k + NumCols + 1] - SY[k]
movq mm4,SREY
movq mm5,SRSY
movq mm6,SRSEY
pfsub (mm0,mm4) // dyE -= SRelaxE.y
pfsub (mm1,mm5) // dyS -= SRelaxS.y
pfsub (mm2,mm6) // dySE -= SRelaxSE.y
movq mm3,[esi + 4] // TotalForceY[k + 1]
pfmul (mm0,mm7) // dyE *= Elasticity
pfmul (mm1,mm7) // dyS *= Elasticity
pfmul (mm2,mm7) // dySE *= Elasticity
pfsub (mm3,mm0) // TotalForceY[k + 1] - dyE
movq mm4,[esi + ecx] // TotalForceY[k + NumCols]
movq [esi + 4],mm3 // TotalForceY[k + 1] -= dyE
pfsub (mm4,mm1) // TotalForceY[k + NumCols] - dyS
movq [esi + ecx],mm4 // TotalForceY[k + NumCols] -= dyS
movq mm5,[esi + ecx+4] // TotalForceY[k + NumCols + 1]
pfsub (mm5,mm2) // TotalForceY[k + NumCols + 1] - dySE
movq [esi + ecx+4],mm5 // TotalForceY[k + NumCols + 1] -= dySE
pfadd (mm0,[esi]) // TotalForceY[k] += dyE
pfadd (mm0,mm1) // TotalForceY[k] += dyS
pfadd (mm0,mm2) // TotalForceY[k] += dySE
movq [esi],mm0 // TotalForceY[k] += dy
// Z-coordinate
add edi,eax
add esi,eax
movq mm3,[edi] // mm3 = SZ[k]
movq mm0,[edi + 4] // SZ[k + 1]
movq mm1,[edi + ecx] // SZ[k + NumCols]
movq mm2,[edi + ecx+4] // SZ[k + NumCols + 1]
pfsub (mm0,mm3) // dz = SZ[k + 1] - SZ[k]
pfsub (mm1,mm3) // dz = SZ[k + NumCols] - SZ[k]
pfsub (mm2,mm3) // dz = SZ[k + NumCols + 1] - SZ[k]
movq mm4,SREZ
movq mm5,SRSZ
movq mm6,SRSEZ
pfsub (mm0,mm4) // dzE -= SRelaxE.z
pfsub (mm1,mm5) // dzS -= SRelaxS.z
pfsub (mm2,mm6) // dzSE -= SRelaxSE.z
movq mm3,[esi + 4] // TotalForceZ[k + 1]
pfmul (mm0,mm7) // dzE *= Elasticity
pfmul (mm1,mm7) // dzS *= Elasticity
pfmul (mm2,mm7) // dzSE *= Elasticity
pfsub (mm3,mm0) // TotalForceZ[k + 1] - dzE
movq mm4,[esi + ecx] // TotalForceZ[k + NumCols]
movq [esi + 4],mm3 // TotalForceZ[k + 1] -= dzE
pfsub (mm4,mm1) // TotalForceZ[k + NumCols] - dzS
movq [esi + ecx],mm4 // TotalForceZ[k + NumCols] -= dzS
movq mm5,[esi + ecx+4] // TotalForceZ[k + NumCols + 1]
pfsub (mm5,mm2) // TotalForceZ[k + NumCols + 1] - dzSE
movq [esi + ecx+4],mm5 // TotalForceZ[k + NumCols + 1] -= dzSE
pfadd (mm0,[esi]) // TotalForceZ[k] += dzE
pfadd (mm0,mm1) // TotalForceZ[k] += dzS
pfadd (mm0,mm2) // TotalForceZ[k] += dzSE
movq [esi],mm0 // TotalForceZ[k] += dz
// Update indices
sub edi,edx
sub esi,edx
add edi,8
add esi,8
sub j,8
jnz M1
add edi,8
add esi,8
sub i,ecx
jnz M
lea ecx,[ebx]this.SX
mov ptrS,ecx // ptrS = SX
movd mm5,[ebx]this.Mass
punpckldq mm5,mm5
movd mm6,dt // mm6 = dt
punpckldq mm6,mm6
pfmul (mm5,mm6) // mm5 = Massdt
movq Massdt,mm5
movd mm7,[ebx]this.Damping
punpckldq mm7,mm7 // mm7 = Damping
lea ecx,[ebx]this.Normal // [ecx] -> Normal
lea edx,[ebx]this.Fixed // [edx] -> Fixed
lea edi,[ebx]this.TotalForceX // [edi] -> TotalForce
lea esi,[ebx]this.ExternalForceX// [esi] -> ExternalForce
lea ebx,[ebx]this.VX // [ebx] -> V
mov k,eax
// For each vertex add external force
M2:
prefetchm(edi,64)
prefetchm(esi,64)
prefetchm(ebx,64)
prefetchm(ecx,64)
movq mm6,[edx] // Fixed[k]
movq mm0,[edi] // TotalForceX[k]
movq mm1,[edi + eax] // TotalForceY[k]
movq mm2,[edi + eax*2] // TotalForceZ[k]
movq mm3,[esi] // ExternalForceX[k]
movq mm4,[esi + eax] // ExternalForceY[k]
movq mm5,[esi + eax*2] // ExternalForceZ[k]
pfadd (mm0,mm3) // mm0 = TotalForceX[k] + ExternalForceX[k]
pfadd (mm1,mm4) // mm1 = TotalForceY[k] + ExternalForceY[k]
pfadd (mm2,mm5) // mm2 = TotalForceZ[k] + ExternalForceZ[k]
pxor mm3,mm3
pand mm0,mm6 // TotalForceX[k] & Fixed[k]
pand mm1,mm6 // TotalForceY[k] & Fixed[k]
pand mm2,mm6 // TotalForceZ[k] & Fixed[k]
movq mm6,Massdt
movq [esi],mm3 // ExternalForceX[k] = 0
movq [esi + eax],mm3 // ExternalForceY[k] = 0
movq [esi + eax*2],mm3 // ExternalForceZ[k] = 0
//maskmovq mm0,mm4
movq [edi],mm0 // TotalForceX[k] += ExternalForceX[k] & Fixed[k]
movq [edi + eax],mm1 // TotalForceY[k] += ExternalForceY[k] & Fixed[k]
movq [edi + eax*2],mm2 // TotalForceZ[k] += ExternalForceZ[k] & Fixed[k]
pfmul (mm0,mm6) // mm0 = TotalForceX[k]*Massdt
pfmul (mm1,mm6) // mm1 = TotalForceY[k]*Massdt
pfmul (mm2,mm6) // mm2 = TotalForceZ[k]*Massdt
movq mm6,dt
punpckldq mm6,mm6
// Update pointers
add ptrS,8
add edi,8
add esi,8
add ebx,8
add ecx,(TYPE D3DVECTOR)*2
add edx,8
sub k,8
jnz M2
femms
}
}
As a real-time 3D graphics developer, I need to wage many battles. I fight with artists over
polygon counts, with graphics card manufacturers over incomplete or incorrect drivers, and with
some producers’ tendencies to continuously expand feature lists. However, some of the greatest
battles I have fought have been with myself. I fight to bring back the knowledge I have long
since forgotten. I fight my desire to play the latest action game when more pressing needs are
at hand (deadlines, the semblance of a social life).
In this column, I document one of the less glamorous battles: the battle of the physics simulator. It’s
not going to be fun. It’s going to be a bit bloody. However, if I ever hope to achieve a realistic
and interesting physics simulation, it’s a battle that must be fought. So, my brave warriors, join
me. Sharpen your pencils, stock your first-aid kit with plenty of aspirin, drag out the calculus
book, and fire up the coffeepot. Let’s get started.
I hope you all had a chance to play around with the soft body dynamics simulator from my
article titled Collision Response: Bouncy, Trouncy Fun. The demo highlighted an interesting
problem — the need for stability. While creating my dynamics simulation, I waged a constant
battle for stability. However, in order to wage the war effectively, I need to understand the
roots of the instability in the system. Last month, I implied that the problem resulted from my
use of a simple Euler integrator. But I didn’t really explain why that caused the problem. Let me
fix that right now.
Many game programmers never realize that when they create the physics model for their game,
they are using differential equations. One of my first programs on the Apple II was a spaceship
flying around the screen. My "physics" loop looked like this:
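From the description that follows, the loop was evidently of this flavor (a reconstruction in C, not the original listing):

/* Reconstructed flavor of the loop, not the original listing. */
float X, Y, VX, VY, AX, AY;   /* ship position, velocity, acceleration */

void PhysicsUpdate(void)      /* called once per frame */
{
    X  += VX;                 /* position += velocity (DeltaTime = 1) */
    Y  += VY;
    VX += AX;                 /* velocity += acceleration (Mass = 1)  */
    VY += AY;
}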
Look familiar to anyone? It’s a pretty simple physics model, but it turns out that even here I
was integrating. If you look at the Euler integrator from last month, I had

V_new = V_old + DeltaTime * (F / m)
Now for my simple physics model, DeltaTime = 1 and Mass = 1. Guess what? I was integrating
with Euler’s method and didn’t even know it. If I had made this Apple II physics model any
more complex, this integrator could have blown up on me. These sorts of problems can be
difficult to track down, so it’s important to understand the causes.
The reason that the Euler integrator can blow up is that it’s an approximation. I’m trying to
solve a differential equation by using an iterative numerical method. The approximation can
differ from the true value and cause error. When this error gets too large, the simulation can
fail. A concrete example may help to explain. Last month, I added a viscous drag force to the
simulation to add stability. The formula for this force was
F_drag = -kd * V    (Eq. 1)
In this formula, kd represents the coefficient of drag that is multiplied by the velocity of the
particle. This coefficient determines how fast the velocity of the object is dragged down to zero.
This is a very simple differential equation. In fact, it's simple enough that it can be solved for v
directly, and I can use this exact solution to check the accuracy of my numerical integrator:

v(t) = v0 * e^(-kd * t)    (Eq. 2)
Euler's method is used to approximate the integral curve of Equation 2 with a series of line
segments along this path. Each step along this path is taken at a fixed time interval, h, via the
formula

v(t + h) = v(t) + h * (-kd * v(t)) = v(t) * (1 - h * kd)    (Eq. 3)
In all cases, the viscous drag force should approach zero. However, the size of the step h and
coefficient of drag kd determine how well the approximation performs. Take a look at Figure 1.
With the given step size and drag coefficient, Euler’s method may not be a great approximation,
but it gives the desired result. The velocity converges on zero. But take a look at the
relationship between the step size and drag coefficient in Equation 3.
If h * kd > 1, then the approximation step will overshoot zero, as you can see in Figure 2.
Figure 2. This looks a lot worse.
By increasing the step size, I was trying to get a system that converged to zero more quickly —
but I got something entirely different. Things really start to get bad when the drag coefficient
increases more, as in Figure 3. As each step is taken, not only does the approximation oscillate
across zero, but it also actually diverges from zero, and eventually explodes the system. This is
exactly what was happening in the spring demonstration from last month, when the box blew
up.
If you find a situation where your simulator blows up, there’s an easy way to see if this kind of
numerical instability is the cause. Reduce the step size. If you reduce the size of the step and
the simulation works, then this numerical instability is the problem.
The easy solution is always to take small steps. However, realize that each step requires quite a
few calculations. The simulation will run faster if it can take fairly large step sizes.
Unfortunately, when you get lots of objects interacting, these instability problems appear even
more. So, just when things start to get interesting, you need to reduce the step size and slow
things down.
I’d rather create an integrator that would allow me to take large step sizes without sacrificing
stability. To do this, I need to look at the origins of Euler’s method.
Taylor’s Theorem
You may remember Taylor’s Theorem from calculus. It’s named after mathematician Brook
Taylor’s work in the eighteenth century. This theorem describes a method for converging on the
solution to a differential equation.
f(x) = Pn(x) + Rn(x), where
Pn(x) = f(x0) + f'(x0)(x - x0) + f''(x0)(x - x0)^2/2! + ... + f^(n)(x0)(x - x0)^n/n!    (Eq. 4)

In Equation 4, Pn(x) represents the nth Taylor polynomial. If you take the limit of Pn(x) as n
approaches infinity, you get the Taylor series for the function. If, however, the infinite series is
not calculated and the series is actually truncated, Rn(x) represents the error in the system.
This error is called the truncation error of approximation.
How does this apply to the problem with which we are working? If I only look at the first Taylor
polynomial and do some substitution, I get Equation 5.

f(x + h) = f(x) + h * f'(x) + (h^2/2) * f''(ξ)    (Eq. 5)

Notice how similar this equation is to Equation 3. In fact, Euler's method is based on this
equation. The only difference is that the last error term is dropped in Equation 5. By stopping
the series at the second term, I get a truncation error term in h^2. This gives Euler's method an
error of order O(h^2).
If I added another term of the Taylor series to the equation, I could reduce the error to O(h^3).
However, to compute this exactly, I would need to evaluate the next derivative of f(x). To avoid
this calculation, I can do another Taylor expansion and approximate this derivative as well.
While this approximation increases the error slightly, it preserves the error bounds of the Taylor
method. This method of expansion and substitution is the basis of the Runge-Kutta techniques
for solving differential equations. The first expansion beyond Euler's method is known as the
Midpoint method or RK2 (Runge-Kutta order 2), and is given in Equation 6. It’s called the
Midpoint method because it uses the Euler approximation to move to the midpoint of the step,
and evaluates the function at that new point. It then steps back and takes the full time step
with this midpoint approximation.
v(t + h) = v(t) + h * f(t + h/2, v(t) + (h/2) * f(t, v(t)))    (Eq. 6)
In fact, I can continue to add Taylor terms to the equation using the Runge-Kutta technique to
reduce the error further. Each expansion requires more evaluations per step, so there is a point
at which the calculations outweigh the benefit. I don’t have the space to get into it here,
however, I understand that smaller step sizes are preferred over methods above RK4, which has
an error of O(h^5) (Faires & Burden, p. 195). Runge-Kutta order 4 is outlined in Equation 7.
k1 = h * f(t, v)
k2 = h * f(t + h/2, v + k1/2)
k3 = h * f(t + h/2, v + k2/2)
k4 = h * f(t + h, v + k3)
v(t + h) = v(t) + (k1 + 2*k2 + 2*k3 + k4)/6    (Eq. 7)
RK4 gives the simulation a very robust integrator. It should be able to handle most situations
without blowing up. The only issue now is what the step size should be.
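As a concrete sketch, a single RK4 step for a one-dimensional equation dv/dt = f(t, v), such as the drag example where f(t, v) = -kd * v, could be written like this (illustrative, not the sample application's code):

/* One RK4 step for dv/dt = f(t, v); illustrative sketch. */
typedef float (*DerivFn)(float t, float v);

float RK4Step(DerivFn f, float t, float v, float h)
{
    float k1 = h * f(t,            v);
    float k2 = h * f(t + 0.5f * h, v + 0.5f * k1);
    float k3 = h * f(t + 0.5f * h, v + 0.5f * k2);
    float k4 = h * f(t + h,        v + k3);
    return v + (k1 + 2.0f * k2 + 2.0f * k3 + k4) / 6.0f;   /* Eq. 7 */
}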
Even with a robust integrator such as RK4, there will be times when the simulation will be in
danger of blowing up. To keep this from happening, you may have to reduce the step size at
times. At other times, however, a large step size works fine. If my simulator only has a single
fixed step size, I cannot take advantage of these facts. If I vary the size of the steps according
to need, I could use large steps when possible without sacrificing stability.
This is how it works. I take a full step using my current integrator, then take two steps of half
the current step size, and compare the results. If the error between the two results is greater than
a threshold, then the step size should be reduced. Conversely, if the error is less than the
threshold, the step size could actually be increased. This form of control is known as an
adaptive step size method. Adaptive methods are a major area of research in numerical
analysis, and can definitely improve simulation performance. I chose not to implement adaptive
step size controls in my simulation. However, this is an area where you could improve the
simulation.
Other Techniques
Differential equations are not easy to learn and understand. However, the programmer who
pursues this knowledge has many weapons in his arsenal. As witnessed by the birthdates of
Euler and Taylor, this research has been going on for centuries. If you ignore this work and
strike out on your own, you’re doing yourself a great disservice. Knowledge is available to the
developer as never before. While working on these algorithms, I was able to cross-check
formulas and techniques in many different sources.
In fact, I’ve barely scratched the surface of the field. The integrators I’ve described (all explicit
one-step methods) represent only a subset of the methods available to the programmer.
Implicit integrators will also work. For example, an implicit Runge-Kutta integrator trades
greater computations per step for greater stability in particularly difficult differential equations.
Also, the one-step nature of these integrators reflects the fact that the method does not
consider any trends in the past when calculating a new value.
In addition to these one-step methods, there are also multistep methods, extrapolation
algorithms, predictor-corrector methods, and certainly many others. Clearly, there is plenty of
ground for the adventurous programmer to explore. The book I used, Numerical Algorithms
with C, does a good job of comparing different methods during a variety of test conditions.
For this month’s sample application (available from Game Developer’s web site), I have
implemented both the midpoint method and Runge-Kutta order 4 in the dynamic simulation
from last month. You can switch between integrators and adjust the step size and simulation
variables to get a feel for how each performs.
In addition to the references cited last month, a couple of other sources proved very valuable
during this article.
• Faires, J. Douglas and Richard Burden. Numerical Methods. Second edition. Pacific Grove,
California: Brooks/Cole, 1998. This book provided a great discussion of measuring error in
numerical solutions. It also contains a great deal of source code for all the algorithms.
• Engeln-Müllges, Gisela and Frank Uhlig. Numerical Algorithms with C. New York, New York:
Springer-Verlag, 1996. In addition to the fine sections on the methods discussed in this column,
this book describes and compares a great number of other numerical methods. Additionally, the
book has a great number of references to articles on the topic.
• Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical
Recipes in C: The Art of Scientific Computing. Cambridge: Cambridge University Press. While not
as strong a reference on these topics, this book may be interesting to many, as it is available in
electronic form. See http://www.nr.com but also check out a critical discussion of it at
http://math.jpl.nasa.gov/nr/nr.html
Jeff Lander is the technical director of Darwin 3D where he spends time calculating
his rate of procrastination with respect to his articles. E-mail optimization
suggestions to [email protected].
I was all set to start talking about how to handle collision response. I thought I could just have
these objects that you could move around, make collide, and then watch their responses. Yeah,
collision response, that will be great! Then I thought, "How am I going to get these objects
flying around in the first place?" Well, I could give each object an initial velocity and they would
collide. But, I would need world boundaries for those objects to bounce off of so they would
stay in play. To direct the objects, I need to be able to apply force. Suddenly, instead of a nice
collision demo, I had designed Asteroids. All I wanted was a little demonstration of a fairly
simple concept and instead I ended up applying forces and acceleration to particles. I had
stumbled on the big "D" word: Dynamics.
That's all right. I will not be afraid. I always say, "Turn a problem into an opportunity."
Fortunately, physics is a huge field full of fertile topics that can be distilled into nice
column-sized pieces. So once more, good friends, into the breach.
When I was writing about inverse kinematics back in September, I was only really interested in
kinematics: that is, the study of motion without regard to the forces that cause it. Dynamics, I
said, concerns how forces are used to create motion, and I didn’t want to open up that can of
worms. Well, the can is now open and the worms are climbing all over.
I’m going to have to recap a bit, but I suggest you go back and reread Chris Hecker’s column
from the October/November 1996 Game Developer, "Physics, the Next Frontier." If you don’t
have the magazine handy, that article as well as other physics related articles written by Chris
Hecker are available on the Definition Six web site at http://www.d6.com/users/checker.
This month, I’m going to focus on particle dynamics. What is particularly important about
particle dynamics is the relationship between force, f, the mass of a particle, m, and the
acceleration of that particle, a. This can be stated in the familiar Newtonian notation as f = ma.
You may recall that the acceleration of a particle is the derivative with respect to time of the
velocity of that particle, v. Likewise, the velocity of the particle is the derivative with respect to
time of the position of the particle, x. You can see how this relationship works in the following
equation:

f = m * a = m * (dv/dt) = m * (d^2x/dt^2)
So, let me state the problem I’m trying to solve. Given a set of forces acting on a particle at
time t, where will that particle be after a small amount of time has passed? It’s clear that with
the value of the force and the mass of the particle, I can obtain the acceleration of the particle.
If I integrate that acceleration with respect to t, I’ll end up with the new velocity of the particle.
If I integrate again, I get the new position. Easy, right?
The structure for a particle is in Listing 1. It’s easier to store 1/m for the particle because this is
what I need in the equations. The forces that act on the particle accumulate in the f term. With
this information, I can integrate the dynamic system forward in time to establish a new position
for the particle. This process involves solving ordinary differential equations. Fortunately,
Chris’s column described a numerical method of solving these problems. Listing 2 contains code
that uses the simplest numerical integrator, known as Euler’s method, to compute the new
state of the system. The great thing about this integrator is that it’s simple to implement and
understand. However, because it’s a simple approximation, it’s subject to numerical instability,
as we will see later.
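Listing 1 is not reproduced in this copy; judging from the Integrate() listing later in the article, the particle structure is along these lines (pos is inferred, and tVector is a stand-in name):

/* Sketch of the particle structure implied by the Integrate() listing. */
typedef struct { float x, y, z; } tVector;

typedef struct {
    tVector pos;        /* position x                                 */
    tVector v;          /* velocity                                   */
    tVector f;          /* force accumulator, cleared each frame      */
    float   oneOverM;   /* 1/m, stored so the integrator can multiply */
} tParticle;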
I now have a method for dynamically moving particles around in a realistic fashion. However, to
get anything interesting to happen, I need to get things moving. This requires the application of
some brute force, or several forces. But what kinds of forces do I want to apply to my little
particles?
Well, the obvious force that has been applied to objects in games since the beginning of
computer simulations is gravity. Gravity is a constant force that is being applied to all particles.
In order to realistically simulate gravity, force must be added into the particle’s force
accumulator every system update. In general, this force is a vector pointing down along the y
axis. However, there’s nothing to stop a simulator from having a gravity vector that points in a
different direction. In fact, one of the very cool things about having a good physical simulation
is that gravity can change and things will still "look" correct. This realistic look may not occur if
you are trying to hand animate an object.
Now, gravity was a pretty obvious force to apply to particles. But what else can I do? A loose
connection of points isn’t really all that interesting to watch even if it is simulated with accurate
physics. It would be much more entertaining if I could connect those particles to form
structures.
What about stretching a spring between two particles? This procedure is actually easy to
implement. Hooke's spring law is a pretty good way of representing the forces that a spring
exerts on two points:

f_a = -[ ks * (|L| - r) + kd * (Δv · L)/|L| ] * L/|L|
f_b = -f_a
This formula represents the force applied to particles a and b; the distance between these
particles, L; the rest length of the spring, r; the spring constant or "stiffness", ks; the damping
constant, kd; and the velocity of the particles, v. The damping term in the equation is needed in
order to simulate the natural damping that would occur due to the forces of friction. This force,
called viscous damping, is the friction force exerted on a system that is directly proportional
and opposite to the velocity of the moving mass. In practice, the damping term lends stability
to the action of the spring. The code applying the spring force on two particles is in Listing 3.
Other Forces
Viscous drag should be applied to the entire system. A drag is a great way of making the
particles look as though they are floating around in oil. It also adds numerical stability to the
system, meaning that the particles won’t bounce around too much. A viscous drag force is
applied by multiplying a damping constant, Kd, with the velocity of the particle and subtracting
that force from the accumulator.
Momentary forces are also very useful for interacting with the simulation. I’ve used a spring
tied to a particle and attached the mouse to drag the object around. A force applied to a
particle can be used to create a motor or other source of motion.
You can also make some interesting effects by locking a particle. That is, by turning off the
simulation for a particular particle, it becomes fixed and can act as an anchor point. (You can
achieve the same effect by causing the particle to have an infinite mass. In the simulator,
simply set the particle’s mass to zero.) Immobilizing one particle like this creates many
possibilities for creating complex simulations.
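In terms of the particle structure sketched earlier, locking a particle is a one-liner:

/* Zeroing the stored 1/m gives the particle effectively infinite
   mass, so accumulated forces never move it. */
void PinParticle(tParticle *p)
{
    p->oneOverM = 0.0f;
}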
Whew, now that I have a nice dynamic particle simulator, I can start talking about collision
detection and response again. The simplest form of collision detection that I can add to this
simulation is point-to-plane collision. With particles, it will be easy. Last month, I discussed the
use of the dot product to determine whether a point has collided with a plane. Take a look at
Figure 1.
Figure 1. A particle colliding with a plane.

A particle in contact with a boundary may not be colliding with that boundary if the particle is
moving away from the boundary. The relative velocity of the two bodies is checked by
calculating N · V. If that value is less than zero, the two bodies are in colliding contact and I
need to resolve the collision.
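In code the test is a single dot product (a sketch; the variable names are assumptions, with tVector from the earlier sketch):

/* Colliding-contact test sketch: n is the plane normal at the
   contact point, v the particle velocity. */
int IsCollidingContact(const tVector *n, const tVector *v)
{
    /* N . V < 0 means the particle is moving into the plane */
    return (n->x * v->x + n->y * v->y + n->z * v->z) < 0.0f;
}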
Now that I have this nifty particle simulator where I can attach
particles with springs and apply forces to them, it’s time to build
something. Let me start with a simple block such as the one in
Figure 3. Each of the edges of the object is a spring connecting
the vertices. Unfortunately, if I run this object through the
simulator, I end up with a big heaping mess. The mess occurs
because the springs connecting the vertices aren’t enough to
provide stability for the cube. In order to create a cube that
won’t collapse, it’s necessary to put crossbeam supports on each
face of the cube as in figure 4.
Figure 3. A simple dynamic cube.

Creating objects this way feels more like constructing a bridge than 3D modeling. You find
yourself adding struts and crossbeams all over the place. Leave a face open and it behaves
correctly: the face without the crossbeam supports is more likely to collapse.
I really find it fun to play with this simulator. It's very satisfying to bring in shapes and play
with making them stable and tweaking the spring and gravity settings. You then can fling the
objects all around and bounce them off the walls. There are many more variables that can be
added to the simulator. Other forces such as contact friction can be added. Some interactive
features such as pinning vertices would make it more fun. But I think we’re on our way to a
really fantastic Jello-land simulator. Check out the source code and demo application on the
Game Developer web site. It will allow you to load in your own shapes, connect them with
springs, and play around with the simulator.
• Hecker, Chris. "Behind the Screen." Game Developer, October 1996 – June 1997. Credit for
the ideas and some of the methods of simulation go to Chris Hecker. I have tried to base my
code on many of his ideas so it will be familiar to readers. His excellent series of articles on rigid
body physics got me and many others excited about real-time physics. Hopefully, I can
continue to build on this tradition. Also available on Chris’s web site at
http://www.d6.com/users/checker.
You will need several good math and physics books if you really want to get into this topic. Here
are a few that I used in this article.
• Beer and Johnston. Vector Mechanics for Engineers: Dynamics, Sixth Edition, WCB/McGraw-
Hill, New York, 1997.
• Mullges and Uhlig. Numerical Algorithms with C, Springer-Verlag, New York, 1996
• Acton, Forman S. Numerical Methods that Work, Harper and Row, New York, 1970. This last
book was a useful little book my father had from his days of working on guidance systems. Now
I am using it to make virtua-jello. Go figure.
• Doug DeCarlo at the University of Pennsylvania wrote an application for X-Windows called
XSpringies that allows you to simulate 2D particle-spring interactions. You can check this out
from his website at http://www.cis.upenn.edu/~dmd/doug.html or get the program at
ftp://cis.upenn.edu/pub/dmd/xspringies/xspringies-1.12.tar.Z.
Many have told Jeff that his top is made of rubber and his bottom of springs.
Bounce him and Darwin 3D a note at [email protected].
///////////////////////////////////////////////////////////////////////////////
// Function: Integrate
// Purpose: Calculate new Positions and Velocities given a deltatime
// Arguments: DeltaTime that has passed since last iteration
// Notes: This integrator uses Euler's method
///////////////////////////////////////////////////////////////////////////////
void CPhysEnv::Integrate( float DeltaTime)
{
/// Local Variables //////////////////////////////////////////////////////////
int loop;
tParticle *source,*target;
///////////////////////////////////////////////////////////////////////////////
source = m_CurrentSys; // CURRENT STATE OF PARTICLE
target = m_TargetSys; // WHERE I AM GOING TO STORE THE NEW STATE
for (loop = 0; loop < m_ParticleCnt; loop++)
{
// DETERMINE THE NEW VELOCITY FOR THE PARTICLE
target->v.x = source->v.x + (DeltaTime * source->f.x * source->oneOverM);
target->v.y = source->v.y + (DeltaTime * source->f.y * source->oneOverM);
target->v.z = source->v.z + (DeltaTime * source->f.z * source->oneOverM);
// DETERMINE THE NEW POSITION FOR THE PARTICLE
target->pos.x = source->pos.x + (DeltaTime * source->v.x);
target->pos.y = source->pos.y + (DeltaTime * source->v.y);
target->pos.z = source->pos.z + (DeltaTime * source->v.z);
source++;
target++;
}
}
// From Listing 3: applying the spring force to two particles
p1 = &system[spring->p1];
p2 = &system[spring->p2];
VectorDifference(&p1->pos,&p2->pos,&deltaP); // Vector distance
dist = VectorLength(&deltaP); // Magnitude of deltaP
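The listing breaks off at this point in this copy; the damped spring computation would continue roughly as follows (a sketch; restLen, Ks, Kd and the vector helpers are assumed names consistent with the fragment above):

// Continuation sketch; field and helper names are assumptions.
Hterm = (dist - spring->restLen) * spring->Ks;   // Spring (Hooke) term
VectorDifference(&p1->v, &p2->v, &deltaV);       // Relative velocity
Dterm = (DotProduct(&deltaV, &deltaP) * spring->Kd) / dist; // Damping term
VectorScale(&deltaP, 1.0f / dist, &springForce); // Unit direction vector
VectorScale(&springForce, -(Hterm + Dterm), &springForce);
VectorSum(&p1->f, &springForce, &p1->f);         // Accumulate on p1
VectorScale(&springForce, -1.0f, &springForce);  // Newton's third law
VectorSum(&p2->f, &springForce, &p2->f);         // Equal and opposite on p2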
Publishing game titles for multiple platforms is no new idea, but it is usually the case that the
title is developed for a single operating system and the source code is subsequently ported to
additional platforms. Ports can be costly, time-consuming, and downright ugly, however, if the
original code is heavily dependent on interfaces or data types which are not present on the
platform targeted by the port. The only alternative is to develop for multiple platforms
simultaneously from the very beginning of a project. Simultaneous development requires that a
system architect have some expertise in each of the platforms for which a title is targeted for
release. It is the focus of this article to demonstrate how major subsystems of a game engine
can be written to function on both Windows and Macintosh operating systems.
When programming for multiple platforms, it is usually desirable to hide code that is dependent
on a particular operating system by using a layered design. Platform-specific code should be
safely tucked away at the lowest level possible, and code inhabiting the higher levels should
require no knowledge of how the lower levels are implemented. This black box approach should
be used to encapsulate major subsystems of a game engine as well as any miscellaneous
functions and data types which may depend on the underlying operating system. Once these
lowest layers have been written, an applications programmer should be able to write code that
interacts only with the interface that the game engine itself exposes to higher layers. The game
engine is then responsible for communicating the right information to and from the operating
system.
There appear to be two popular ways of organizing platform-dependent code. The first is to
have separate Windows and Macintosh versions of a file that implements a particular
subsystem. Ports invariably lead to this arrangement. This approach, however, usually leads to
duplication of a lot of code which is actually common to both platforms. It also carries the
drawback of having to touch multiple files to make a small change which requires that each
implementation be altered. The examples in this article shall use a second approach: conditional
compilation. We can define two flags, named WINDOWS and MACOS, which when used with #if …
#elif … #endif blocks tell the preprocessor to either keep a block of code, if the corresponding
flag is defined to be 1, or remove a block of code, if the flag is defined to be 0.
Consider the following simple data type abstraction example which uses the above scheme.
#if WINDOWS
typedef HWND WindowReference;
#elif MACOS
typedef WindowPtr WindowReference;
#endif
This creates a small layer between the operating system and higher levels of the game engine
which hides the platform-dependence of the WindowReference type. Any functions or derivative
data types that need to refer to a window should use the WindowReference name instead of the
name defined by the operating system’s API. This way, anything that refers to a window does
not require separate definitions for each platform since it does not directly depend on a
platform-specific data type.
This practice of isolating platform dependencies at the lowest possible level will be apparent
throughout the code accompanying this article. The remainder of this article will be spent
building abstraction layers for 3D graphics, sound, and networking subsystems upon which
higher levels can rest without knowledge of what operating system they are running on. The
differences between the two platforms will be pointed out, and methods for avoiding common
pitfalls will be presented.
3D Graphics
When developing a 3D engine for both Windows and Macintosh, one has little choice but to use
OpenGL, primarily because it is the only 3D API supported on the Macintosh platform. OpenGL
carries the tremendous advantage that it is a cross-platform standard. Almost all of a game’s
3D code can be written once and be guaranteed to work on both platforms. The only piece that
is dependent on the operating system is the initial context setup. Both the Windows and
Macintosh operating systems supply a small API to link OpenGL to their internal windowing
systems. On Windows, it is called WGL, and on the Macintosh, it is known as AGL. These
libraries enable an application to query the pixel formats that the 3D hardware supports and to
bind an OpenGL context to a window.
Listing 1 shows the Windows and Macintosh versions of a class named GraphicsContext which
encapsulates an OpenGL context. The constructor of this class creates an OpenGL context and
associates it with a window. The first step towards accomplishing this is to tell the operating
system what pixel format and context attributes are desired. Both WGL and AGL supply a
function which takes this information and returns the most appropriate pixel format identifier.
With this identifier in hand, an OpenGL context can be created and bound to a window. It is
important to note that on both platforms, each thread of execution may have only one active
OpenGL context at any time, as specified with the wglMakeCurrent and aglSetCurrentContext
functions.
The SetPixelFormat function on the Windows platform carries with it the limitation that it may
be called only once for any given device context. This becomes problematic if you would like to
change resolutions or pixel depths and thus need to create a new OpenGL context for a window
which previously had one bound to it. Destroying and recreating the main window can just
make the problem worse since other game systems which are also associated with that window
would have to be restarted as well. A nice workaround is to create a full-size child window of
the main window and use it as the window to which the OpenGL context is attached. This way,
only the child window needs to be destroyed and recreated allowing OpenGL context
replacement without disturbing other systems.
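A sketch of the child-window workaround (the class name and styles here are illustrative assumptions; the class must be registered beforehand):

#include <windows.h>

/* Create a full-size child of the main window to host the GL context,
   so only the child needs destroying when the pixel format changes. */
HWND CreateGLChild(HWND mainWindow, HINSTANCE inst, int w, int h)
{
    return CreateWindowEx(0, "GLChildClass", NULL,
                          WS_CHILD | WS_VISIBLE,
                          0, 0, w, h,
                          mainWindow, NULL, inst, NULL);
}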
Once a context has been created and rendering has occurred, the image is displayed by calling
a single WGL or AGL function which swaps the OpenGL drawing buffers. This is demonstrated in
the UpdateContext function shown in Listing 1.
Accessing OpenGL extensions is the only remaining area in which Windows and Macintosh
methods differ. Once you have determined that an extension is available (by calling
glGetString with the GL_EXTENSIONS selector), Windows implementations must call
wglGetProcAddress to retrieve a pointer to any function defined by the extension. Extensions
currently supported on the Macintosh already have entry points for their functions defined in
the OpenGL library, so there is no need to locate the function and call it through a pointer. As
an example, suppose that the GL_ARB_multitexture extension is available and you want to call
the function glActiveTextureARB. On the Macintosh, this function is already defined and may
be called directly. On Windows, a pointer to this function can be retrieved by using the following
code.
typedef void (APIENTRY *ATProc)(GLenum target);

ATProc activeTextureProc =
    (ATProc) wglGetProcAddress("glActiveTextureARB");

Once this function pointer has been acquired, a small inline function can be defined to emulate
a direct function call as follows.

inline void glActiveTextureARB(GLenum target)
{
    (*activeTextureProc)(target);
}
Sound
Sound programming differs significantly between the Windows and Macintosh operating
systems. On Windows, DirectSound provides the functionality for playing basic sounds and for
implementing more advanced effects such as 3D positional audio. On the Macintosh, one uses
the Sound Manager to handle audio needs. The Macintosh Sound Manager by itself does
not provide any 3D support, however. Instead, 3D sound is provided by the supplemental use
of the SoundSprocket library. Listing 2 shows how to play a buffer of 16-bit, 22.050 kHz, stereo
sound on both Windows and Macintosh. Adding 3D effects to this code is beyond the scope of
this article, but is not difficult once the level of functionality presented here has been
implemented.
The SoundMgr class shown in Listing 2 demonstrates the initialization of DirectSound. The
Macintosh Sound Manager requires no initialization, and thus there is no Macintosh counterpart
to this class. DirectSound requires that you create a primary play buffer and specify what sound
format the hardware should be prepared to play. Sounds are actually played by allocating
secondary play buffers which are then mixed into the primary buffer by DirectSound. This is
demonstrated in the Sound class shown in Listing 2. The constructor for the Sound class creates
a secondary sound buffer and copies the sound information into this buffer (which may actually
be on the sound hardware). The sound is then played by calling the Play function. Sounds may
be looped by specifying the DSBPLAY_LOOPING flag as the last parameter to the
IDirectSoundBuffer::Play function.
Sound is played on the Macintosh by allocating sound channels and sending commands to
them. The creation of a sound channel is demonstrated in the Sound constructor shown in
Listing 2. Once a sound channel exists, a sound buffer can be played through it by issuing a
bufferCmd command. This command carries with it a pointer to an extended sound header,
which was filled out by the Sound constructor. This header contains all of the format information
necessary for the Sound Manager to correctly play the sound.
Playing looping sounds on the Macintosh is not as simple as on Windows. If you want a sound to
play a finite number of times, you can simply issue several bufferCmd’s to the sound channel.
(Sound commands are queued and executed only after any previous command has completed.)
If you want the sound to loop indefinitely, however, you will have to issue a callBackCmd
command which will notify you when a sound has finished playing. When a sound channel is
created, an optional callback function may be specified, and this function is invoked whenever
the sound channel encounters a callBackCmd. A callback function on the Macintosh has to be
specified as a universal procedure pointer, as done in the Sound constructor. (Universal
procedure pointers are function pointer abstraction mechanisms left over from the 680x0 to
PowerPC transition.) The callback function itself simply issues another bufferCmd and
callBackCmd which continue the looping process.
An important issue to keep in mind when playing 16-bit sound is byte order. If you are playing
sound from a *.WAV file for instance, you will have to reverse the two bytes in each audio
sample before playing them on the Macintosh since these samples are stored in little endian
byte order. A single audio sample can have its byte order reversed with the following code.
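A minimal sketch (assuming the CodeWarrior form of the intrinsic, which takes a base pointer and a byte offset):
// Reverse the two bytes of one 16-bit audio sample.
inline short ReverseSample(const short *sample)
{
    return ((short) __lhbrx((void *) sample, 0));
}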
This code uses the handy PowerPC instruction lhbrx (Load Halfword Byte Reversed Indexed),
which is accessible from C through the __lhbrx intrinsic function. This function loads a 16-bit
value and swaps the low and high order bytes.
Network Communications
Cross-platform network communication is a must for any multiplayer game. Both the Windows
and Macintosh operating systems provide libraries for multiplayer gaming (DirectPlay on
Windows and NetSprocket on Macintosh), but both are platform-specific and do not allow for
cross-platform networking. These libraries are thus useless to the engineer wishing to
implement cross-platform multiplayer games. The only alternative is to code directly to the low-
level networking libraries. This section describes how to implement UDP (User Datagram
Protocol) communications, a part of the standard TCP/IP stack, using the WinSock library on
Windows and the OpenTransport library on the Macintosh.
The NetworkMgr class shown in Listing 3 demonstrates the initialization of WinSock and
OpenTransport. After starting up the networking library, the NetworkMgr constructor
determines the machine’s local IP address. The WinSock library does not provide a direct way of
retrieving this address, so we are forced to use the workaround of retrieving the local machine’s
name and then resolving that name’s IP address.
Actual communications occur through the NetworkEndpoint class shown in Listing 3. The code
given here implements fully asynchronous operation, meaning that data can be sent and
received at any time without dependency on a repetitively called servicing function. This is
accomplished through the use of a dedicated thread on Windows and an endpoint notification
function on the Macintosh.
The Windows version of the NetworkEndpoint constructor creates a UDP socket and binds it to
the machine’s local address. A new thread is also created which listens on two event objects.
The first event object is configured by the WSAEventSelect function to be set whenever new
data arrives (the FD_READ event) and whenever it becomes possible to send data (the FD_WRITE
event). The second event is just used to signal that the thread should exit because the
NetworkEndpoint class is being shut down. The dedicated thread sleeps until an event occurs,
so it has minimal impact on overall game performance.
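In outline, the thread's logic reduces to a few calls (a sketch; names are illustrative and error handling is omitted, and Listing 3 has the real code).
#include <winsock2.h>

struct EndpointState
{
    SOCKET   sock;
    WSAEVENT netEvent;    // signaled on FD_READ / FD_WRITE
    WSAEVENT quitEvent;   // signaled at shutdown
};

DWORD WINAPI EndpointThread(LPVOID param)
{
    EndpointState *ep = (EndpointState *) param;

    // Ask WinSock to signal netEvent on socket activity.
    WSAEventSelect(ep->sock, ep->netEvent, FD_READ | FD_WRITE);

    HANDLE events[2] = { ep->netEvent, ep->quitEvent };
    for (;;)
    {
        // The thread sleeps here until something happens.
        DWORD which = WaitForMultipleObjects(2, events, FALSE, INFINITE);
        if (which != WAIT_OBJECT_0) break;          // quitEvent fired

        WSANETWORKEVENTS ne;
        WSAEnumNetworkEvents(ep->sock, ep->netEvent, &ne);
        if (ne.lNetworkEvents & FD_READ)  { /* receive pending data */ }
        if (ne.lNetworkEvents & FD_WRITE) { /* send queued packets  */ }
    }
    return 0;
}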
The Send and Receive functions of the NetworkEndpoint class demonstrate how data packets
may be sent and received by a WinSock socket or an OpenTransport endpoint. These functions
each send or receive a single data packet. It is important to realize that any data that you expect
to send across the wire on a heterogeneous network needs to use a consistent byte order. The
official byte order of the internet is big endian, and in fact even WinSock requires that you
specify IP addresses in big endian byte order. WinSock provides functions such as htonl (Host
TO Network Long) which convert between the host byte order and the network byte order (big
endian).
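For example, filling out a WinSock address structure with an arbitrary port looks like this:
#include <winsock2.h>

sockaddr_in addr = {0};
addr.sin_family      = AF_INET;
addr.sin_port        = htons(2000);             // 16-bit host -> network
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 32-bit host -> network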
Both WinSock and OpenTransport allow you to receive data packets that are larger than your receive buffer by breaking the packet into multiple pieces. If you know the maximum size your packets can be and set your receive buffer to this size, then any larger packet may be interpreted as invalid data, perhaps an attempt by a hacker to flood the connection. The Receive function protects against this by implementing a simple mechanism that ignores every piece of any packet larger than the receive buffer.
It is likely that you will want to preallocate space for a large number of data packets which can
later be used to send and receive data. Since these packets are going to be used
asynchronously by more than one thread of execution, they need to be protected by some kind
of synchronization mechanism such as a mutex. In the Windows environment, where every thread can be preempted by any other thread, it is safe to wait indefinitely to acquire a mutex inside the dedicated send/receive thread. This is not the case on the Macintosh, however, since the notification function is actually part of an interrupt service routine and will never be preempted by the main thread; blocking there would deadlock. On the Macintosh, you can therefore only attempt to acquire a mutex from within the notification function. If that attempt fails, you must set a flag indicating that a send or receive operation needs to take place at a later time.
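The shape of that pattern is roughly the following (every name here is hypothetical; the actual code appears in Listing 3).
// Hypothetical non-blocking mutex primitives.
bool TryLockMutex(long *mutex);
void UnlockMutex(long *mutex);
void ProcessPendingPackets(void);

static long          gPacketMutex;
static volatile bool gWorkDeferred = false;

// Called from the OpenTransport notifier, which must never block.
static void NotifierService(void)
{
    if (TryLockMutex(&gPacketMutex))
    {
        ProcessPendingPackets();        // safe: we own the mutex
        UnlockMutex(&gPacketMutex);
    }
    else
    {
        gWorkDeferred = true;           // main thread retries later
    }
}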
Remember that UDP is an unreliable protocol. UDP datagrams are not guaranteed to reach their destination, and when they do get through, they are not guaranteed to arrive in the same order in which they were sent. It is up to you to sort out the details of acknowledging the receipt of packets, resending lost packets, and processing packets in the correct order.
Documentation for the WGL library, DirectX, and WinSock can all be found inside the Microsoft
Platform SDK, which can be downloaded at
http://msdn.microsoft.com/downloads/sdks/platform/platform.asp
Documentation for the AGL library can be found inside the Apple OpenGL SDK, which is
available at
http://developer.apple.com/opengl/
Documentation for the Macintosh Sound Manager is available online at
http://developer.apple.com/techpubs/mac/Sound/Sound-2.html
Documentation for OpenTransport networking is available at
http://developer.apple.com/techpubs/mac/NetworkingOT/NetworkingWOT-2.html
The Apple Game Sprockets libraries (including SoundSprocket) can be downloaded from
http://developer.apple.com/games/sprockets/download.html
Eric Lengyel is currently a software engineer on the OpenGL team at Apple Computer.
Before joining Apple in 1999, he spent a few years in the game development industry
and continues to sacrifice sleep in order to code games in his spare time. Eric can be
reached at [email protected].
Modeling for computer games addresses the challenge of automating a variety of difficult
development tasks. An early milestone was the combination of geometric models and inverse
kinematics to simplify keyframing. Physical models for animating particles, rigid bodies,
deformable solids, fluids, and gases have offered the means to generate copious quantities of
realistic motion through dynamic simulation. Biomechanical modeling employs simulated
physics to automate the lifelike animation of animals with internal muscle actuators. Research
in behavioral modeling is making progress towards self-animating characters that react
appropriately to perceived environmental stimuli. It has remained difficult, however, to instruct
these autonomous characters so that they satisfy the programmer's goals. Hitherto absent in
this context has been a substantive apex to the computer graphics modeling pyramid (Figure
1), which we identify as cognitive modeling.
Cognitive models go beyond behavioral models, in that they govern what a character knows,
how that knowledge is acquired, and how it can be used to plan actions. Cognitive models are
applicable to instructing the new breed of highly autonomous, quasi-intelligent characters that
are beginning to find use in interactive computer games. Moreover, cognitive models can play
subsidiary roles in controlling cinematography and lighting. See the color plates at the end of
this article for some screenshots from two cognitive modeling applications.
We decompose cognitive modeling into two related sub-tasks: domain knowledge specification
and character instruction. This is reminiscent of the classic dictum from the field of artificial
intelligence (AI) that tries to promote modularity of design by separating out knowledge from
control.
Domain (knowledge) specification involves administering knowledge to the character about its
world and how that world can change. Character instruction involves telling the character to try
to behave in a certain way within its world in order to achieve specific goals. Like other
advanced modeling tasks, both of these steps can be fraught with difficulty unless developers
are given the right tools for the job.
Background
The situation calculus is the mathematical logic notation we will be using. It has many advantages in terms of clarity and implementation independence, but it is somewhat of a departure from the repertoire of mathematical tools commonly used in computer graphics. We shall therefore overview in this section the salient points of the situation calculus, whose details are well documented in the book [Funge99] and elsewhere [LRLLS97,LLR99]. It is also worth mentioning that, from a user's point of view, the underlying theory can be hidden. In particular, a user is not required to type in axioms written in first-order mathematical logic. To this end, we have developed an intuitive high-level interaction language, CML (Cognitive Modeling Language), whose syntax employs descriptive keywords but which has a clear and precise mapping to the underlying formalism (see the book [Funge99], or the website www.cs.toronto.edu/~funge, for more details).
The situation calculus is an AI formalism for describing changing worlds using sorted first-order
logic. A situation is a "snapshot" of the state of the world. A domain-independent constant s0
denotes the initial situation. Any property of the world that can change over time is known as a
fluent. A fluent is a function, or relation, with a situation term (by convention) as its last
argument. For example, Broken(x, s) is a fluent that keeps track of whether an object x is
broken in a situation s.
Primitive actions are the fundamental instrument of change in our ontology. The sometimes counter-intuitive term "primitive" serves only to distinguish certain atomic actions from the "complex", compound actions that we will define later. The situation s' resulting from doing
action a in situation s is given by the distinguished function do, so that s' = do(a,s). The
possibility of performing action a in situation s is denoted by a distinguished predicate Poss
(a,s). Sentences that specify what the state of the world must be before performing some action are known as precondition axioms. For example, it is possible to drop an object x in a situation s if and only if the character is holding it:

Poss(drop(x), s) ⇔ Holding(x, s).

The effects of an action are given by effect axioms. They give necessary conditions for a fluent to take on a given value after performing an action. For example, the effect of dropping a fragile object x is that the object ends up being broken:

Fragile(x, s) → Broken(x, do(drop(x), s)).
Surprisingly, a naive translation of effect axioms into the situation calculus does not give the
expected results. In particular, stating what does not change when an action is performed is
problematic. This is called the "frame problem" in AI. That is, a character must consider
whether dropping a cup, for instance, results in, say, a vase turning into a bird and flying about
the room. For mindless animated characters, this can all be taken care of implicitly by the
programmer's common sense. We need to give our thinking characters this same common
sense. They need to be told that they should assume things stay the same unless they know
otherwise. Once characters in virtual worlds start thinking for themselves, they too will have to
tackle the frame problem. The frame problem has been a major reason why approaches like
ours have not previously been used in computer animation or until recently in robotics.
Fortunately, the frame problem can be solved provided characters represent their knowledge
with the assumption that effect axioms enumerate all the possible ways that the world can
change. This so-called closed world assumption provides the justification for replacing the effect axioms with successor state axioms. For example, the following successor state axiom says that, provided the action is possible, a character is holding an object if and only if it just picked up the object, or it was holding the object before and it did not just drop it:

Poss(a, s) → [Holding(x, do(a, s)) ⇔ a = pickup(x) ∨ (Holding(x, s) ∧ a ≠ drop(x))].
Character Instruction
We distinguish two broad possibilities for instructing a character on how to behave: predefined
behavior and goal-directed behavior. Of course, in some sense, all of a character's behavior is
defined in advance by the animator/programmer. Therefore, to be more precise, the distinction
between predefined and goal-directed behavior is based on whether the character can
nondeterministically select actions or not.
A character that can nondeterministically select actions is usually a lot easier to instruct, but
has a slower response time. In particular, we can tell a cognitive character what constitutes a
"desirable outcome" by giving it goals, and it can then use its background domain knowledge to
figure out whether it believes a given action sequence will achieve those goals or not. Although
we are using the word "nondeterministic" in a precise technical sense, the trade-off between
execution speed and programming effort should already be a familiar and intuitive concept for
many readers.
A third possibility we will consider is something of a compromise between the two extremes of
predefined and goal-directed behavior. In particular, we introduce the notion of complex actions
and explain how they can be used to provide goals, and a "sketch plan" for how to achieve
those goals.
Before we continue, it is worth pointing out that sometimes people identify a particular class of
programming languages with a particular kind of behavior. For example, logic programming
languages are often associated with nondeterministic goal-directed behavior, and regular
imperative languages with deterministic predefined behavior. While it is true that logic
programming languages have built-in support for nondeterministic programming, there is
nothing to stop us implementing either kind of behavior in any programming language we
choose (assuming it is Turing complete). To avoid unnecessary confusion, we shall not tie the
following discussion to any particular programming languages.
Predefined Behavior
There are many convenient techniques we can use to predefine a character's behavior. In this
article, however, we are more interested in techniques for which the character's behavior is not
completely determined in advance. Therefore, we shall not attempt a comprehensive survey of
techniques for predefining behavior. Instead, we shall take a brief look at two particularly
popular approaches: reactive behavior rules, and hierarchical finite-state machines (HFSM).
Reactive Behavior Rules
We will use the term reactive behavior when a character's behavior is based solely on its
perception of the current situation. What we mean by this is that the character has no memory
of previous situations it has encountered. In particular, there is no representation of its own
internal state and so it will always react in the same way to the same input stimuli, regardless
of the order in which the inputs are received. A simple way to encode reactive behavior is as a
set of stimulus-response rules. This has a number of important advantages:
- Although the set of rules might be short, and each of the rules very simple, that doesn't necessarily mean the behavior that results from the character following the rules is simple at all. That is, we can often capture extremely sophisticated behavior with some simple rules.
- We can usually evaluate the rules extremely quickly, so there should be no problem obtaining real-time response from our characters.
- There is no need to worry about various knowledge representation issues that arise when characters start thinking for themselves. That is, the characters are not doing any thinking for themselves; we have done it all for them, in advance.
The use of reactive behavior rules was also one of the first approaches proposed for generating
character behaviors, and it is still one of the most popular and commonplace techniques. Great
success has been obtained in developing rule sets for various kinds of behavior, such as flocking
and collision avoidance. As an example of a simple stimulus-response rule that can result in extremely sophisticated behavior, consider the following rule: keep your left hand on the wall; that is, turn left whenever possible, otherwise go straight, and otherwise turn right.
Believe it or not, this simple "left-hand rule" will let a character find its way through a maze. It
is an excellent example of how one simple little rule can be used to generate highly complex
behavior. The character that follows this rule doesn't need to know it is in a maze, or that it is
trying to get out. It blindly follows the rule and the maze-solving ability simply "emerges".
Someone else did all the thinking about the problem in advance and managed to boil the
solution down to one simple instruction that can be executed mindlessly. This example also
shows how difficult thinking up these simple sets of reactive behavior rules can be. In
particular, it is hard to imagine being the one who thought this rule up in the first place, and it
even requires some effort to convince oneself that it works.
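For reference, one step of the rule might look like this in code (a sketch; the cell and maze helpers are hypothetical):
enum Direction { North, East, South, West };

Direction ccw(Direction d) { return Direction((d + 3) % 4); }  // turn left
Direction cw(Direction d)  { return Direction((d + 1) % 4); }  // turn right

struct Cell { int x, y; };

Cell neighbor(Cell c, Direction d)
{
    if (d == North)      c.y += 1;
    else if (d == East)  c.x += 1;
    else if (d == South) c.y -= 1;
    else                 c.x -= 1;
    return c;
}

bool isFree(const Cell& c);   // hypothetical: in range and unoccupied

// Try left, then straight, then right, then back; move into the
// first free cell found. The maze-solving ability simply emerges.
Direction stepLeftHand(Cell& pos, Direction heading)
{
    Direction d = ccw(heading);
    for (int i = 0; i < 4; ++i)
    {
        if (isFree(neighbor(pos, d))) { pos = neighbor(pos, d); return d; }
        d = cw(d);
    }
    return heading;               // completely boxed in: stay put
}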
We can thus see that despite some of the advantages, there are also some serious drawbacks
to using sets of reactive behavior rules:
- The biggest problem is thinking up the correct set of rules that leads to the behavior we want. It can require enormous ingenuity to think of the right set of rules, and this can be followed by hours of tweaking parameters to get things exactly right.
- The difficult and laborious process of generating the rules will often have to be repeated, at least in part, every time we want to effect even a slight change in the resulting behavior.
- Since the behavior rules are deterministic, once an action is chosen, there is no way to reconsider the choice. There are many cases when a cognitive character could use its domain knowledge to quickly anticipate that an action choice is not appropriate. A character following predefined rules has no ability to make such judgments and, no matter how inappropriate its choice turns out to be, must blindly follow the behavior rules that pertain to the current situation.
- When there are many rules, it is quite likely their applicability will overlap, and they could give conflicting suggestions on which action to choose. In such cases some conflict resolution strategy must be employed.
It is often easier to write a controller if we can maintain some simple internal state information for the character. One popular way to do this is with hierarchical finite-state machines (HFSMs), which we discuss in the next section.
Hierarchical Finite-State Machines
Finite-state machines (FSMs) consist of a set of states (including an initial state), a set of
inputs, a set of outputs, and a state transition function. The state transition function takes the
input and the current state and returns a single new state and a set of outputs. Since there is
only one possible new state, FSMs are used to encode deterministic behavior. It is
commonplace, and convenient, to represent FSMs with state transition diagrams. A state
transition diagram uses circles to represent the states and arrows to represent the transitions
between states. Figure 2 depicts an FSM that keeps track of which compass direction a
character is heading each time it turns "left".
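In code, an FSM of this kind is just a transition function over an enumerated state; the following sketch shows what Figure 2 describes.
// State: the current compass heading. The only input is a "left"
// turn; the transition function returns the single new state.
enum Heading { HNorth, HWest, HSouth, HEast };

Heading turnLeft(Heading h)
{
    switch (h)
    {
        case HNorth: return HWest;
        case HWest:  return HSouth;
        case HSouth: return HEast;
        default:     return HNorth;   // HEast wraps back to HNorth
    }
}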
As the name implies, an HFSM is simply a hierarchy of FSMs. That is, each node of an HFSM
may itself be an HFSM. Just like functions and procedures in a regular programming language,
this provides a convenient way to make the design of an FSM more modular. For example, if a
character is at coordinates (x,y), Figure 3 depicts an HFSM that uses the FSM in Figure 2 as a
sub-module to calculate the new cell after turning "left", or moving one cell ahead.
Figure 3. HFSM that uses the WhichDir FSM
HFSMs are powerful tools for developing sophisticated behavior and it is easy to develop
graphical user interfaces to assist in building them. This has made them a popular choice for
animators and game developers alike.
HFSMs maintain much of the simplicity of sets of reactive-behavior rules but, by adding a notion
of internal state, make it easier to develop more sophisticated behaviors. Unfortunately, they
also have some of the same drawbacks. In particular, actions are chosen deterministically and
there is no explicit separation of domain knowledge from control information. This can lead to a
solution which is messy, hard to understand and all but impossible to maintain. Just like
reactive-behavior rules, there can also be a large amount of work involved if we want to obtain
even slightly different behavior from an HFSM.
Goal-directed Behavior
The first step in describing goal-directed behavior is to come up with a way to define a cognitive
character's goals. The situation calculus provides a simple and intuitive theoretical framework to
explain how this can be done. In particular, a character's goals can be expressed in terms of the
desired value of various relevant fluents. A goal can therefore be expressed as a defined fluent,
i.e., a fluent defined in terms of other fluents. For example, suppose we have two characters, call them Dognap and Jack, such that Dognap is armed with a gun and wants to kill Jack. Then the goal can be written as a defined fluent goal(s) that holds in exactly those situations s in which Jack is dead.
Clearly, Dognap will have achieved this goal in any situation s' for which goal(s') is true. Recall that any situation is either the initial situation s0 or of the form

do(an-1, do(an-2, ..., do(a0, s0)...)).

Therefore, if goal(s0) is not true, then Dognap must search for a sequence of n actions a0,...,an-1 such that

goal(do(an-1, do(an-2, ..., do(a0, s0)...)))

is true.
Situation Tree
To explain how characters can automatically search for sequences of actions that meet their
goals, we will introduce the idea of a situation tree. In particular, we can think of the actions
and effects as describing a tree of possible future situations. The root of the tree is the initial
situation s0, each branch of the tree is an action, and each node is a situation. Figure 4 shows
an example of a tree with n actions, a0, a1, ..., an-1.
The value of the fluents at each node (situation) is determined by the effect axioms. Figure 5
shows a simple concrete example using the Dognap and Jack example, and the corresponding
effect axioms, that we described earlier.
Figure 5. A concrete example of a situation tree.
A goal situation is a situation in which the goal fluent is true. For example, in Figure 5 we can see that if the goal is still to kill Jack, then there is more than one goal situation in the tree. In general, however, there is no guarantee that a goal situation exists at all. If a goal situation does exist, then any action sequence that leads to one of the goal situations is called a plan.
Figure 6 shows a simple abstract situation tree with just three actions, and three goal
situations. We will use this figure to illustrate how a character can search the tree to
automatically find a plan (a path) that leads from the initial situation (the root) to a goal
situation. Depending on how we choose to search the tree we will find different plans (paths).
In particular, we can see some common search strategies being applied. We can see that a
bounded depth-first search strategy finds the plan [a0,a2,a0], whereas a breadth-first search
finds [a1,a2].
A breadth-first search tries exhaustively searching each layer of the tree before proceeding to
the next layer. That is, it considers all plans of length 0, then all plans of length 1, etc. Thus, a
breadth-first search is guaranteed to find a plan if there is one. Moreover it will find the shortest
such plan. Unfortunately, a breadth-first search requires an exponential amount of memory as
the character has to remember all the previous searches.
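To make the trade-offs concrete, here is a sketch of a bounded depth-first search over the situation tree (the domain types and functions are hypothetical stand-ins for whatever represents situations, actions, preconditions, and effects):
#include <vector>

struct Action    { int id; };          // hypothetical
struct Situation { /* world state */ };

bool isGoal(const Situation& s);
std::vector<Action> possibleActions(const Situation& s);  // preconditions
Situation doAction(const Action& a, const Situation& s);  // effects

// Fills 'plan' and returns true if a goal lies within 'bound' actions.
// Unlike breadth-first search, only the current path is in memory.
bool findPlan(const Situation& s, int bound, std::vector<Action>& plan)
{
    if (isGoal(s)) return true;
    if (bound == 0) return false;
    std::vector<Action> acts = possibleActions(s);
    for (size_t i = 0; i < acts.size(); ++i)
    {
        plan.push_back(acts[i]);
        if (findPlan(doAction(acts[i], s), bound - 1, plan)) return true;
        plan.pop_back();                // dead end: backtrack
    }
    return false;
}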
In the worst case, the situation tree does not contain any goal situations. If this is the case,
then any exhaustive search algorithm will take an exponential amount of time to respond that
there is no plan available to achieve the goal. This is one of the major limitations of planning
and is something we will look at in more detail in the next section. In the meantime, we
mention that looking for different search algorithms is an important topic in AI research and the
interested reader should consult the further reading section. One of the most interesting new
developments is the use of stochastic search algorithms.
It should also now be apparent how choosing actions nondeterministically entails searching for
appropriate action sequences in a search space that potentially grows exponentially. This
corresponds to the usual computer science notion of computational complexity. Another
interesting point to note is that CPU processing power is also growing exponentially. Therefore,
according to Moore's law, our computer characters can be expected to be able to search one
layer deeper in the situation tree every eighteen months or so.
As we explained, for predefined behaviors the character doesn't have to do any searching for
actions that achieve its goals. It simply follows the instructions it was given and ends up at a
goal situation. In effect, for a given set of inputs, the path through the tree of possible
situations has been determined in advance. If the predefined behaviors were defined properly,
then the path that they specify through the tree will lead to a goal situation.
In this section, the question we want to ask is whether there is some middle ground between asking the character to do all the work at run-time and asking the programmer to do all the work at compile time. In particular, consider that on the one hand we have predefined behavior
which corresponds to a single path through the situation tree, and on the other hand we have
goal-directed behavior which corresponds to searching the whole tree. Clearly, the middle
ground has to be searching some subset of the tree.
Note that this "middle ground" is still technically goal-directed behavior, but we now have
control over how much nondeterminism is allowed in the behavior specification. Only in the
limiting case, when we have removed all the nondeterminism, does the behavior reduce to
deterministic predefined behavior.
Precondition Axioms
Although we might not have realized it, we have already seen one way to exclude parts of the
situation tree from the search space. In particular, precondition axioms prune off whole chunks
of the tree by stating that not all actions are possible in all situations. Figure 7 shows an
example of an abstract tree in which it is not possible to do an action a2 because an action a1
changed something which made it impossible.
Figure 7. Preconditions preclude portions of the tree
While preconditions are important for cordoning off parts of the situation tree, they are a
clumsy way to try and coerce a character to search a particular portion of the tree. In
particular, we need a way to give a character general-purpose heuristics to help it find a goal faster. For example, we might want to give the character a heuristic that will cause it to look at certain groups of actions first, but we do not want to absolutely exclude the other actions.
The hard part of exploiting the middle ground between predefined and goal-directed behavior is
to think up a useful way to specify subsets of the tree. In the next section, we will introduce a
convenient way to specify arbitrary subsets of the situation tree to search.
Complex Actions
We would like to provide a character with a "sketch plan" and make it responsible for filling in the remaining details. In this way, we salvage some of the convenience of the planning
approach while regaining control over the complexity of the planning tasks we assign the
character. We will show how we can use the idea of complex actions to write sketch plans.
The actions we discussed previously, defined by precondition and effect axioms, are referred to
as primitive actions. (The term "primitive action" is only meant to indicate an action is an
atomic unit, and not a compound action. Unfortunately, the term can be misleading when the
action actually refers to some sophisticated behavior, but we will stick with the term as it is
widely used in the available literature). Complex actions are abbreviations for terms in the
situation calculus; they are built up from a set of recursively defined operators. Any primitive
action is also a complex action. Other complex actions are composed using various operators
and control structures, some of which are deliberately chosen to resemble a regular
programming language. When we give a character a complex action a, there is a special macro
Do that expands a out into terms in the situation calculus. Since complex actions expand out
into regular situation calculus expressions, they inherit the solution to the frame problem for
primitive actions.
Complex actions are defined by the macro Do(a,s,s'), such that s' is a situation that results from doing the complex action a in situation s. The complete list of operators for the (recursive) definition of Do is given below. Together, the operators define an instruction language we can use to issue directions to characters. The mathematical definitions can be difficult to follow, and
the reader is encouraged to consult the book [Funge99], in which we explain the basic ideas
more clearly using numerous examples of complex actions (note there are two freely available
implementations of complex actions that can be studied for a more practical insight into how
the macro expansion works--see www.cs.toronto.edu/~funge/book).
Figure 8. Effect of the complex action on a situation tree
The macro expansion Do(a,s,s') specifies a relation between two situations s and s', such that s' is a situation that results from doing the complex action a in situation s. In general, there is not a
unique s', so if we have some initial situation s0, a complex action "program", and a bunch of
precondition and effect axioms, then Do(program, s0, s') specifies a subset of the situation
tree. Figure 8 shows a quick example of how a complex action can be used to limit the search
space to some arbitrary subset of the situation tree. The other thing we can see from the figure
is that the mathematical syntax can be rather cryptic. Therefore, in the appendix, we introduce
some alternative syntax for defining complex actions that is more intuitive and easy to read.
On its own, just specifying subsets of the situation tree is not particularly useful. Therefore, we
would normally explicitly mention the goal within the complex action. We shall see many
examples of this in what follows. For now, suppose the complex action "program" is such a
complex action. If we can find any
such that Do(program, s0, s'), then the plan of length n, represented by the actions a0,...,an-1.
, is the behavior that the character believes will result in it obtaining its goals. Finding such an
s' is just a matter of searching the (pruned) situation tree for a suitable goal situation. Since we
still end up searching, research in planning algorithms is just as relevant to this section as to
the straight goal-directed specification section.
Implementation
Note that we defined the notion of a situation tree to help us visualize some important ideas.
We do not mean to suggest that in any corresponding implementation there need be
(although, of course, there may be) any data structure that explicitly represents this tree. In
particular, if we explicitly represent the tree, then we need a potentially exponential amount of
memory. Therefore, it makes more sense to simply build portions of the tree on demand, and
delete them when they are no longer required. In theorem provers and logic programming
languages (e.g., Prolog), this is exactly what happens continually behind the scenes.
We already looked at some predefined behavior for solving a maze. Let's take a look at a goal-
directed approach to the problem. Of course, since there are well-known predefined behaviors
for maze solving, we would not suggest using a goal-directed approach in a real application.
Therefore, this section is simply meant as a tutorial example to show how some of the different
pieces fit together.
Domain Knowledge
Let us suppose we have a maze defined by a predicate Free(c) that holds when, and only when, the grid cell c is "free"; that is, when it is within range and is not occupied by an obstacle. Free is defined in terms of Occupied(c), sizex, and sizey, each of which depends upon the maze in question. In addition, there are two maze-dependent constants, start and exit, that specify the entry and exit points of a maze. Figure 9 shows a simple maze and the corresponding definition.
We also need to define some functions that describe a path within the maze. We say that the
adjacent cell "North" of a given cell is the one directly above it, similarly for "South", "East",
and "West".
There are two fluents: position denotes which cell contains the character in the current
situation, and visited denotes the cells the character has previously visited.
The single action in this example is a move action that takes one of four compass directions as
a parameter. It is possible to move in some direction d, provided the cell to which we are
moving is free and has not been visited before.
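In conventional code, this precondition amounts to a one-line test (a sketch; wasVisited, isFree, and the cell helpers are hypothetical):
// Poss(move(d), s): the target cell is free and not yet visited.
bool canMove(const Cell& pos, Direction d)
{
    Cell target = neighbor(pos, d);
    return isFree(target) && !wasVisited(target);
}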
Figure 10 shows the possible directions a character can move when in two different situations.
A fluent is completely specified by its initial value and its successor-state axiom. For example,
the initial position is given as the start point of the maze and the effect of moving to a new cell
is to update the position accordingly.
So, for example, in Figure 9, if the character has previously been to the locations marked with the filled dots, and in situation s the character moves north to the unfilled dot, then we have that position(s) = (2,0) and that position(do(move(north), s)) = (2,1).
The list of cells visited so far is given by the defined fluent visited. It is defined recursively on the situation to be the list of all the positions in previous situations (we use standard Prolog list notation).
Character Instruction
We have now completed telling the character everything it needs to know about the concept of
a maze. Now we need to move on and use complex actions to tell it about its goal and any
heuristics that might help it achieve those goals. As a first pass, let's not give it any heuristics,
but simply provide a goal-directed specification of maze-solving behavior. Using complex actions we can express this behavior elegantly as follows:

while position ≠ exit do (π d) move(d)

Just like a regular "while" loop, the above program expands out into a sequence of actions. Unlike a regular "while" loop, it expands out, not into one particular sequence of actions, but into all possible sequences of actions. The precondition axioms that we previously stated, and the exit condition of the loop, define which sequences of actions are possible. Therefore, any free path through the maze that does not backtrack and ends at the exit position meets the behavior specification.
Note that the use of regular programming constructs may initially cause confusion to the reader of the above code. Most of the work is being done by the nondeterministic choice-of-argument operator "π". The example makes it clear that by "nondeterministic" we do not mean that anything random is happening; we simply mean that we can specify a large number of possibilities all at once. In particular, the (π d) construct should be read as "pick the correct direction d". For the mathematically inclined, perusing the definitions may serve to alleviate any sense of bewilderment. To make things even clearer we shall, however, consider the expansion of the complex actions in terms of their definitions. The expansion is based on the simple maze described previously in Figure 9.
In the initial situation we have position(s0) = start ≠ exit. Thus the guard of the "while" loop holds and we can try to expand (π d) move(d). However, from the action preconditions for move and the definition of the maze, we can see that only north and east are possible first moves. This leaves us with s = do(move(north), s0) ∨ s = do(move(east), s0). That is, there are two possible resulting situations. That is why we refer to this style of program as nondeterministic.
Although we disallow backtracking in the final path through the maze, the character may use
backtracking when it is reasoning about valid paths. In most of the mazes we tried, the
character can reason using a depth-first search to find a path through a given maze quickly. For
example, Figure 12 shows a path through a reasonably complicated maze that was found in a
few seconds.
To speed things up, we can start to reduce some of the nondeterminism by giving the character
some heuristic knowledge. For example, we can use complex actions to specify a "best-first"
search strategy. In this approach, we will not leave it up to the character to decide how to
search the possible paths, but constrain it to first investigate paths that head toward the exit.
This requires extra lines of code, but could result in faster execution.
For example, suppose we add an action goodMove(d), such that it is possible to move in a
direction d if it is possible to "move" to the cell in that direction and the cell is closer to the goal
than we are now.
Now we can rewrite our high-level controller as one that prefers to move toward the exit position whenever possible.
At the extreme, there is nothing to prevent us from coding in a simple deterministic strategy
such as the "left-hand" rule. For example, if we introduce a defined fluent dir that keeps track
of the direction the character is traveling, and a function ccw that returns the compass direction
counterclockwise to its argument, then the following complex action implements the left-hand
rule.
The important point is that using complex actions does not rule out any of the algorithms one
might consider when writing the same program in a regular programming language. Rather, it
opens up new possibilities for high-level specifications of behavior at a cognitive level of
abstraction.
Discussion
Complex actions provide a convenient tool for giving a character "advice" in the form of
heuristic rules that will help it solve problems faster. In general, the search space will still be
exponential, but reducing the search space can make the difference between a character that
can plan 5 steps ahead, say, and one that can plan 15 steps ahead. That is, we can get
characters that appear a lot more intelligent.
The possibility also exists for incremental refinement of a specification, perhaps, from a high-
level specification to the point where it more closely resembles a controller written using a
conventional imperative programming language. That is, we can quickly create a working
prototype by relying heavily on goal-directed specification. If this prototype is too slow, we can
use complex actions to remove more and more of the nondeterminism. If required, we can even
do this to the point where the behavior is completely predefined.
To sum up, if we can think of, or look up, a simple predefined way to produce the behavior we
are interested in, then it makes a lot of sense to use it. This is especially so if we don't think the
behavior will need to be modified very often, or at least if the anticipated modifications are
minor ones. It is not surprising, therefore, that a lot of simple reactive behavior is implemented
using simple reactive behavior rules. For simple reactive behavior, like collision avoidance, it is
not hard to think of a small set of reactive behavior rules that will do the job. Moreover, once
we have this set of rules working, it is unlikely that we will need to modify it.
We have tried to make it clear that one type of behavior can be implemented using a variety of
techniques. We have, therefore, chosen not to classify behavior according to what the character
is trying to achieve, but rather on the basis of the technique used to implement it. The reader
should note however that some others do try to insist that behavior in the real world is of a
certain type, and its virtual world counterpart must therefore be implemented in a particular
way. Unfortunately, this leads to lots of confusion and disagreement among different research
camps. In particular, there are those who advocate using predefined behavior rules for
implementing every kind of behavior, no matter how complex. In the sense that, given enough
time and energy it can be done, they are correct. However, they are somewhat like the
traditional animator who scoffs at the use of physical simulators to generate realistic-looking
motion. That is, to the traditional animator a physical simulator is an anathema. She has an
implicit physical model in her head and can use this to make realistic motion that looks just as
good (if not better), and may only require the computer to do some simple "inbetweening".
Compared to the motion that needs a physical simulator to execute, the key-framed approach is
lightning fast. If we could all have the skill of a professional animator there would not be so
much call for physical simulators. Unfortunately, most of us do not have the skill to draw
physically-correct looking motion and are happy to receive all the help we can get from the
latest technology. Even artists who can create the motion themselves might prefer to expend
their energies elsewhere in the creative process.
In the same vein, many of us don't have any idea of how to come up with a simple set of
stimulus-response rules that implement some complex behavior. Perhaps, we could eventually
come up with something, but if we have something else we'd rather do with our time it makes
sense to get the characters themselves to do some of the work for us. If we can tell them what
we want them to achieve, and how their world changes, then perhaps they can figure it out for
themselves.
We should also point out that there are those who advocate a cognitive modeling approach for
every kind of behavior, even simple reactive ones. This view also seems too extreme as, to coin
a phrase, there is no point "using a sledgehammer to crack a nut". If we have a simple reactive
behavior to implement, then it makes sense to look for a simple set of predefined rules. Also, if
lightning-fast performance is an absolute must, then we might be forced to use a predefined
approach, no matter how tough it is to find the right set of rules.
Of course, there is a big gray area in which there is no clear answer as to whether we should
just stick with predefined behavior rules or not. In such cases, the choice of how to proceed can
depend on personal preference and the available tools and expertise. Obviously, this article is primarily aimed at those who decide to go the cognitive modeling route.
Notes
For some basic information on FSMs see [HU79]. For more in-depth information on predefined
behavior techniques, consult [Maes90,BBZ91,Tu99]. There are even some commercial character
development packages that use HFSMs to define character behavior. See [Nayfeh93] for a
fascinating discussion on maze-solving techniques. Many of the classic papers on planning can
be found in [AHT90]. See [SK96] for some work on the use of stochastic techniques for
planning. Prolog is the best known nondeterministic programming language and there are
numerous references, for example see [Bratko90].
The complex action macro expansion is closely related to work done in proving properties of
computer programs [GM96]. Our definitions are taken from those given in [LRLLS97]. A more
up-to-date version, which includes support for concurrency, appears in [LLR99]. See [Stoy77] for the Scott-Strachey least fixed-point definition of (recursive) procedure execution.
References
[AHT90] J. Allen, J. Hendler, and A. Tate, editors. Readings in Planning. Morgan Kaufmann, San
Mateo, CA, 1990.
[BBZ91] N.I. Badler, B.A. Barsky, and D.Zeltzer, editors. Making them move: mechanics,
control, and animation of articulated figures. Morgan Kaufmann, San Mateo, 1991.
[Bratko90] I. Bratko. PROLOG Programming for Artificial Intelligence. Addison Wesley, Reading,
MA, 1990.
[Funge99] J. Funge. AI for Games and Animation: A Cognitive Modeling Approach. A. K. Peters.
Natick, MA, 1999.
[GM96] J. A. Goguen and G. Malcolm. Algebraic Semantics of Imperative Programs. MIT Press, Cambridge, MA, 1996.
[Maes90] P. Maes (editor). Designing Autonomous Agents: Theory and Practice from Biology to
Engineering and Back. MIT Press, Boston, 1990.
[Nayfeh93] B. A. Nayfeh. Using a cellular automata to solve mazes. Dr. Dobb's Journal,
February 1993.
[SK96] B. Selman and H. Kautz. Knowledge compilation and theory approximation. Journal of
the ACM, 43(2):193-224, 1996.
[Tu99] X. Tu. Artificial Animals for Computer Animation: Biomechanics, Locomotion, Perception,
and Behavior. ACM Distinguished Ph.D Dissertation Series, Springer-Verlag, 1999.
There is absolutely no doubt that 3D games are the coolest thing on the PC today. Anyone who
has played computer games recently, or even watched someone else play them, has surely
marveled at the realistic, beautifully fluid images and smooth animations produced by high-
performance PCs running current-generation game titles.
A great deal of technical innovation is spurred by the desire to advance interactive 3D gaming,
both in the hardware and software R&D labs. Many PC hardware innovations, from CPUs to
graphics chips to input peripherals, are a direct result of the needs of the game development
community and their customers. Each new generation of hardware gives developers an excuse
to push the platform further, driving it towards the breaking point.
While game developers forge ahead into new arenas, creating new and imaginative experiences
for increasingly demanding audiences, they face multiple challenges from every direction. Of
course, getting all this innovative 3D graphics stuff to work in real time at all is hard. But if that
isn't enough of a problem, developers face another difficulty: dealing with the issue of
scalability.
Gamers, and in turn, the game media and publishers, have come to expect that The Next Great
Game will contain The Next Great Thing, when it comes to graphics and simulation of the 3D
environment. This would be fine except that the gamers and publishers also want to see The
Next Great Thing running on a two-year-old PC, so that all the gamers of the world can play it.
Developers have to make their games run on lower-performance systems to reach the widest
possible audience, while still delivering a package that will keep the "lunatic fringe" gamers
entertained and the press properly awestricken.
As long as the PC has existed, developers have written games designed to run on systems of
widely varying performance. Scalability problems, however, have increased in recent years. For
one thing, the range of performance from low-end to high-end machines is widening. This is not only due to increases in CPU clock speeds, but also to factors like the introduction of expanded CPU instruction sets aimed at enhancing 3D performance. Another factor is the monumental
leap in graphics performance that has been achieved by the graphics hardware companies.
All this leaves game developers with the challenge of coming up with new ways to scale their
games across that wide range of performance. We have no doubt that the developers are up to
the challenge, particularly when the PC industry is doing its share to help them out. Out of this
collaborative mix of minds, new techniques and algorithms to scale content have been born.
Some of these new approaches are discussed in this article.
Scaling the content itself is not so much defined by any one particular algorithm as by the concept of adding or modifying characters, objects, and scenery dynamically. Done properly, this can add to the believability of the environment without affecting gameplay or performance. The following examples illustrate this technique.
Imagine a medieval adventure game in which the player is walking a forest path on a sunny
afternoon, enjoying the sights and sounds of the serene forest. Suddenly, a troll jumps out from
behind a tree and attacks. In the interest of delivering consistent gameplay, the troll should attack with the same speed and degree of ferocity on both low- and high-end systems. Accordingly, the content must be designed to run on a low-end system. Using a scalable
approach to design, however, the developer could include richer features designed to run on
higher performance systems, such as birds flying through the trees, and squirrels running about
the forest floor. These elements make the forest environment more believable, and create a
more immersive experience, on systems that can handle the extra work, but these extra
elements don't change the fact that a troll is trying to lop the player's head off.
This concept of 'ambient characters' has been seen in some titles already, like the 'space
rodents' or flying creatures in Epic's Unreal, or the small maintenance robots in LucasArts' Jedi Knight. While these examples included the ambient characters on all systems, as the
performance of high-end systems increases, developers have the opportunity to selectively
incorporate elements like these on high-end systems to make their worlds more believable.
Flight simulators offer another example of how content can be scaled to the performance of a
system. Imagine flying low over Broadway in Manhattan (hey, that's what flight sims are all about, right?). On a high-end system, for example, the pilot would be able to see traffic lights at intersections. The developer, working from map data for the city, could have an artist model a single traffic light, and then algorithmically place the lights at every intersection. This approach would be possible only if the necessary system bandwidth is available. The same algorithms could selectively add mailboxes, parking meters, and perhaps even shrieking
pedestrians. This sort of 'procedural content placement' straddles the line between scaling the
game content and scaling the geometry used to represent the game characters and objects.
Scaling Geometry
The scaling of geometry is something that developers have been doing as long as there have
been 3D games on the PC. A typical approach involves storing several versions of every object
model, each at a different level of detail (LOD). The application swaps down to lower level of
detail models as necessary, based on user selections or other criteria, such as the number of
objects moving about in the scene or changes in the frame rate.
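A typical selection step might look like the following sketch, where the detail scale is a tuning knob the game adjusts from frame-rate feedback (all names here are illustrative):
// Pick one of 'count' precomputed models for an object. Farther
// objects, or slower machines, get coarser models.
int selectLOD(float distance, float detailScale, int count)
{
    int level = (int) (distance / detailScale);   // coarser with distance
    if (level < 0)      level = 0;
    if (level >= count) level = count - 1;
    return level;       // index into the per-object model array
}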
While "discrete LODs" are commonly used, this approach has some drawbacks. One problem is
that an artist has to model every one of the objects at multiple levels of detail, which is a long
and tedious process. Each of these objects must then be stored on the disc, consuming space.
Finally, switching from one level of detail to another can result in a visual glitch, referred to as
'popping', appearing onscreen. With these problems in mind, developers are exploring a
number of other approaches to scaling geometry content.
Scaling the geometry content involves adding or removing polygons to the scene to maintain a
sufficient frame rate or level of visual quality. Systems with higher performance processors
and/or 3D accelerators will be able to pump more polygons to the display resulting in higher
quality images. Several techniques currently exist for creating scalable geometry and more are
likely to appear.
A computer graphics technique commonly used in higher end applications, but that can be used
for creating scalable geometry in today's consumer applications, is to use parametric curves
and surfaces. A parametric curve can best be described with an example. Assume that you are
given four toothpicks and told to make a circle. First you'll probably put the toothpicks in a
diamond shape in front of you, which is a rough approximation of a circle, but very rough. If
you are given five, six, or fifty toothpicks, you can make shapes that more closely approximate
a circle, but none would be exact (assuming that the size of the circle doesn't matter). If the
size must stay the same, assume the toothpicks have the magical ability to stretch or shrink to
arbitrary lengths. There would still be regions of flatness on the shape that are not typical of a
"real" circle. Now, suppose you are given a piece of string and told to make a circle. Depending
on your patience, you could create a circle of arbitrary precision.
This is similar to the way parametric surfaces relate to polygonal models. Suppose we're trying
to model a wheel. With a polygonal model, you work with a bunch of triangles that are used to
approximate the smooth, curved portions of the wheel. The approximation can be made more
exact by using more, smaller triangles, but, as in the toothpick example, there will still be areas
of flatness that don't exist on a "real" wheel. Using parametric surfaces, we work from a
mathematical description of the wheel (for example, a center point and a radius and a width).
Using the mathematical description, you can compute the precise points required to create the
wheel.
During rendering of the image, the mathematical description of the object is used to generate
an arbitrary number of triangles (because 3D accelerators are designed to handle triangles).
Depending on the factors we've discussed (such as CPU, 3D accelerator, frame rate, and so on),
you can selectively generate more or fewer triangles on the fly. Figure 1 shows an example of a
parametric curve used to generate different numbers of triangles according to performance.
Let's go back to the toothpick example used earlier to illustrate the technique of progressive
meshes. Suppose you're initially given a thousand toothpicks. You can use all one thousand
toothpicks to make the circle and get pretty nice results. However, suppose you're told that you
only have ten seconds to complete the circle. Within this limited time, you can only use some of
the toothpicks, but you can still make a decent approximation of the circle. If given twenty
seconds to make the circle, you could theoretically use twice as many toothpicks as you did for
the ten-second circle. The quality of the circle you can make depends on the time you have to
make it, up to the limit of your original thousand toothpicks.
This illustrates the basic concept of progressive meshes. The idea is that a 3D modeler creates
a highly detailed version of a 3D object. Then, using the progressive mesh algorithm,
information that describes how to remove triangles from the model, one-by-one, is created.
This information can then be used at run-time to create models that include an arbitrary
numbers of polygons (up to the maximum number of polygons originally used to create the
model).
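To make the run-time side concrete, here is a minimal sketch of how a game might consume pre-computed collapse records; the EdgeCollapse structure and its field names are hypothetical, not taken from any particular progressive mesh implementation:

#include <vector>

// Hypothetical record emitted offline by a progressive-mesh builder: each
// collapse merges one vertex into a neighbor and drops the triangles that
// degenerate as a result.
struct EdgeCollapse {
    int removedVertex;            // vertex eliminated by this collapse
    int keptVertex;               // vertex it merges into
    std::vector<int> deadTris;    // triangles that disappear
};

// At run time, pick a coarseness level (0 = full detail, 1 = coarsest) and
// apply that many collapses to the index list before rendering.
void ReduceMesh(std::vector<int>& triVertexIndices,            // 3 entries per triangle
                const std::vector<EdgeCollapse>& collapses,
                float coarseness)                              // 0..1
{
    size_t n = static_cast<size_t>(collapses.size() * coarseness);
    for (size_t c = 0; c < n; ++c) {
        const EdgeCollapse& ec = collapses[c];
        // Re-point every reference to the removed vertex at the kept one;
        // triangles listed in ec.deadTris become degenerate and are skipped
        // at draw time.
        for (size_t i = 0; i < triVertexIndices.size(); ++i)
            if (triVertexIndices[i] == ec.removedVertex)
                triVertexIndices[i] = ec.keptVertex;
    }
}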
Progressive meshes enable game developers to spend less time creating artwork, because the
artists don't have to create multiple levels of detail for each model in the game. Additionally,
dynamic run-time decisions can be made, determining how many triangles to use to create the
model, based on the available CPU, 3D accelerator, and other system characteristics.
A final technique for scaling the geometry of objects in a scene, referred to as subdivision
surfaces, has recently gained popularity in the 3D graphics field. Returning to our toothpick
example, assume that you're given four toothpicks and told to approximate a circle (sound
familiar?). The obvious first attempt is to create the diamond shape that really doesn't look
much like a circle. In a burst of creativity, you break each of the toothpicks in half. With eight,
smaller toothpicks, you can make a better approximation of the circle. If you keep subdividing
(breaking) the toothpicks, until you have sixteen, thirty-two, or more, you can create more
accurate circles each time you repeat the process. Progressing to infinity, you would eventually
end up with a precise circle.
This example illustrates the technique of subdivision surfaces. You start with a polygonal mesh
that represents an object. Then, each triangle contained in the mesh is broken into four pieces
that are repositioned to more closely approximate the desired object. The process can be
repeated to create arbitrarily smooth, curved surfaces. Artists must contend with the challenge
of creating an appropriate initial mesh that will subdivide neatly to produce the correct desired
mesh. Fortunately, this process is well supported by some of the major modeling packages.
Scaling the geometry content in each scene definitely helps scale the application across a wide
range of systems, but other approaches to managing scalability also exist. In a fast-moving
game, developers not only have to put those polygons on screen, but they need to move them
around, too. The animation process is another area in which content can be scaled.
Many 3D games today use pre-stored animations to move the characters and objects in a
scene. An artist positions the model (for example, a human character) in all of the poses
necessary to create an animation sequence. For example, consider a 3D character that needs to
be able to run, jump, and swing a sword. The artist first creates a collection of poses for each of
these animated movements. This is typically done much like the way we've all created simple
2D animations, using a pencil and a pad of paper. On each page of the pad, you create the next
"step" of the animation; when you flip between the pages rapidly, the animation comes alive.
When performing 3D animation, the game code typically cycles between the 3D models that
represent the animation (sometimes moving the object in space at the same time, such as
when a character is running or jumping). The problem is that the artists have to create all of
the in-between steps. The maximum number of available steps limits the maximum frame rate
of the animation. For example, for an animated running sequence, suppose that the artist
creates ten steps that are designed to be played back in a one-second time frame. If the game
is running at ten frames a second, then we get one step of the animation per frame. So far, so
good. However, if the game is running at sixty frames a second, then we only get a new frame
of the animation once every six frames. Not so good. We would be better off generating new
animation steps algorithmically to correspond with our frame rate.
This is the basic idea behind interpolated, key-frame animations (or key-frame morphing).
Animations are stored at some predetermined rate (say ten frames per second). Then, at run-
time, the game determines how long the previous frame took and algorithmically creates a new
frame that is an interpolation between two stored frames. This produces animations that always
change smoothly from frame to frame, regardless of the frame rate at which the game is
running. The net effect creates a satisfying experience for the gamer running the game at
fifteen frames a second on a low-end system, as well as for a gamer running the game at sixty
frames a second on a high-end system. If you consider the previous paper and pad animation
example, this process would be the equivalent of adding pages in between existing ones, and
drawing a new figure between the two neighboring ones to smooth out the animation.
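A minimal sketch of this interpolation, assuming the key frames are stored as arrays of vertex positions sampled at a fixed rate (the names are illustrative, not from any particular engine):

struct Vec3 { float x, y, z; };

static Vec3 Lerp(const Vec3& a, const Vec3& b, float t) {
    Vec3 r = { a.x + (b.x - a.x) * t,
               a.y + (b.y - a.y) * t,
               a.z + (b.z - a.z) * t };
    return r;
}

// Build the pose for the current frame by blending the two stored key
// frames that bracket 'time'. Keys are stored at keysPerSecond (say, ten),
// so the result stays smooth no matter what rate the game runs at.
void BuildInterpolatedPose(const Vec3* const* keyFrames, // [numKeys][numVerts]
                           int numKeys, int numVerts,
                           float keysPerSecond, float time,
                           Vec3* outVerts)
{
    float pos = time * keysPerSecond;            // position in key-frame units
    int   k0  = static_cast<int>(pos) % numKeys;
    int   k1  = (k0 + 1) % numKeys;              // wrap for looping animations
    float t   = pos - static_cast<int>(pos);     // fraction between the two keys
    for (int v = 0; v < numVerts; ++v)
        outVerts[v] = Lerp(keyFrames[k0][v], keyFrames[k1][v], t);
}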
This technique can look pretty good onscreen, but newer games use a better technique that
avoids some of the problems inherent in a simple form of "skinning", in which a separate
polygonal mesh is attached to each bone of a character's skeleton. The biggest problem is
that the meshes attached to neighboring bones often end up intersecting with one another.
For example, the two meshes composing the upper arm/lower arm can overlap or leave a gap
at the juncture. Aside from the nasty looking creases this creates, there is also a discontinuity
created by the textures between overlapping meshes.
To avoid this overlap problem, developers are starting to use a technique called "single skin
meshes". Basically, instead of having one mesh associated with each bone, they have one mesh
for the whole model, but one or more bones influence each vertex of the mesh. At runtime, the
positions of all bones affecting a vertex are used to calculate the final position of the vertex.
The end result is the removal of the intersection and elimination of the texture discontinuity
problems. See figure 2 for an illustration of this technique.
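The per-vertex blend at the heart of single-skin meshes can be sketched as follows; the structure layout and names are hypothetical, but the weighted sum is the technique itself:

#include <vector>

struct Vec3   { float x, y, z; };
struct Matrix { float m[4][4]; };   // row-major bone transform

// Transform a point by a bone matrix (rotation plus translation).
static Vec3 Transform(const Matrix& b, const Vec3& p) {
    Vec3 r;
    r.x = p.x * b.m[0][0] + p.y * b.m[1][0] + p.z * b.m[2][0] + b.m[3][0];
    r.y = p.x * b.m[0][1] + p.y * b.m[1][1] + p.z * b.m[2][1] + b.m[3][1];
    r.z = p.x * b.m[0][2] + p.y * b.m[1][2] + p.z * b.m[2][2] + b.m[3][2];
    return r;
}

// One vertex of a single-skin mesh: a rest position plus the bones that
// influence it. The weights are assumed to sum to 1.
struct SkinnedVertex {
    Vec3               restPos;
    std::vector<int>   bones;    // indices into the bone matrix palette
    std::vector<float> weights;  // one weight per influencing bone
};

// Final position = weighted sum of the vertex transformed by each bone.
Vec3 SkinVertex(const SkinnedVertex& v, const Matrix* bonePalette) {
    Vec3 result = { 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < v.bones.size(); ++i) {
        Vec3 p = Transform(bonePalette[v.bones[i]], v.restPos);
        result.x += p.x * v.weights[i];
        result.y += p.y * v.weights[i];
        result.z += p.z * v.weights[i];
    }
    return result;
}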
Scaling Lighting
Calculating and displaying the lighting in a scene requires determining what light sources are in
a scene, which objects they illuminate and how brightly, and how, in turn, those objects cast
shadows and reflect the light.
It's difficult for developers to scale the lighting in a game, since this one element can be fairly
important to the overall playability. Several possibilities for scaling the lighting do exist,
however, and certain techniques can be used to make a game run across the wide range of
platforms available.
Lighting effects can be scaled by choosing different lighting techniques to correspond with the
relative performance of different systems. So, for example, the high-end systems can use
lighting that is entirely dynamic with dynamically cast shadows and moving light sources (such
as rockets). Low-end systems can have the lighting tuned down so that shadows are either
non-existent or less dynamic (perhaps computed only once every other frame). Moving light
sources might be removed or implemented with light maps and changing texture coordinates.
Another possibility is to use some of the scalable geometry techniques described earlier.
Lighting could be calculated for a lower level-of-detail model and then, using calculated and
stored connectivity information, the displayed vertices that aren't part of the lower LOD model
would have their lighting values interpolated from the nearest vertices in the low LOD model.
This technique can apply well to parametric surfaces where the lighting calculations for
generated surface points can be performed less often than the calculation of the surface points.
Since the connectivity information is implicit in a parametric surface, it's easy to do the
interpolation for the in-between vertices.
Shadows are represented in 3D applications in a number of ways. Most 3D games of the past
year or two have resorted to simple shadow maps that resemble a dark circle or a blur
underneath the object. In cases where the object sits on a flat ground plane, sometimes a
simple geometric form of the object is 'squashed' on the ground to represent a shadow. More
recently, developers are moving towards increasingly complex methods of generating shadows,
such as projected textures, dynamic shadow maps, or using hardware features such as stencil
buffers.
Most of these more advanced techniques of generating shadows involve calculations that chew
up processor cycles, so developers need to adapt the techniques to suit low-end systems.
Simpler mechanisms can be used to scale these shadow techniques across different systems.
For example, if the system is incapable of handling a more complex shadow casting method
(such as using stencil buffers to create shadow volumes), then the application can be designed
to switch to the more basic shadow map approach. Optionally, the application could disable
shadows altogether.
Using 3D hardware accelerators that support stencil buffers, more exact shadow
representations can be created based on the actual shapes of the objects casting the shadows.
This technique involves creating a "shadow volume" for each object. The shadow volume is
itself a 3D object, created by imagining a point at the light source and casting planes that
intersect each silhouette edge of the object. This volume is then rendered through the stencil
buffer to shadow all triangles that intersect the volume. In actual use, this technique produces
impressive results. It can be used to generate self-shadowing of objects. An object, such as a
donut, can be rendered so that it casts a shadow on itself when the light is off to one side. By
using different level of detail models to create the shadow volume, this technique can also be
made scalable, although in some cases artifacts can result as a byproduct of the scaling
process.
You can also create shadows by rendering the object casting the shadow to a texture surface,
and then using that texture as the shadow. This approach can produce more natural, softer
shadow edges, and it can also be scaled in a number of ways. One scalability technique is to
update the shadow's movement at half the frame rate of the rest of the application. This will
minimize computations for use on lower end machines. On the down side, self-shadowing is
much harder to accomplish with this technique than with the stencil buffer technique.
Scaling Special Effects
Adding distinctive special effects is a great way for developers to distance their titles from the
competition. The kind of special effects we're talking about here include particle systems for
generating smoke and fire, texture tricks, fog volumes, lens flares, and similar onscreen
pyrotechnics.
Many kinds of effects can be scaled effectively. The simplest way to handle scaling is to just
switch off the effect when running on lower end machines. Resourceful developers are also
investigating more sophisticated ways of scaling these techniques.
Many multi-texture tricks being used by developers to increase realism in applications can be
reduced or switched off to add scalability. Some of these include gloss mapping (to make
objects look partially shiny), dynamic environment mapping (to reflect moving objects and not
just stationary ones), detail texturing (to add detail), and bump mapping (to add surface relief).
A particle system offers a good example of the many ways in which effects can be scaled.
Basically, a particle system is a collection of objects (particles) that behave according to a set
of rules for the given system. They typically all move differently, but follow some global pattern.
This makes them useful for depicting smoke, steam, blowing leaves, and so on. You can scale
down a particle system by altering the granularity of the effect: reduce the number of particles
but increase the particle size, so that the effect keeps the same density but takes on a grainier
appearance. Another way is to simplify the manner in which the individual particles are drawn.
Is each one a full polygonal model, or just a sprite? Also, you can scale the manner in which
particles interact with their environment. Does the application detect collisions between
particles and their environment, or not? Are collisions between the particles themselves
detected? Simplify the behavior and you can effectively scale down the effect for low
performance systems.
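These knobs can be expressed in code; the following sketch is purely illustrative (the counts and thresholds are arbitrary, not tuned values):

// Hypothetical knobs for scaling a particle effect. Low end: fewer, larger
// particles drawn as sprites with no collision detection. High end: many
// small particles that interact with the world and with each other.
struct ParticleSystemSettings {
    int   particleCount;      // granularity of the effect
    float particleSize;       // grow the particles as the count drops
    bool  drawAsSprite;       // sprite vs. full polygonal model
    bool  collideWithWorld;   // particle/environment collisions
    bool  collideWithEach;    // particle/particle collisions
};

ParticleSystemSettings ScaleEffect(float quality)  // 0 = low end, 1 = high end
{
    ParticleSystemSettings s;
    s.particleCount    = 50 + static_cast<int>(450 * quality);
    s.particleSize     = 2.0f - quality;   // keep the density roughly constant
    s.drawAsSprite     = quality < 0.75f;
    s.collideWithWorld = quality > 0.25f;
    s.collideWithEach  = quality > 0.9f;
    return s;
}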
Every application has its own unique differences and the best techniques for supporting
scalability will vary from application to application. Regardless of the method the developer
chooses to build scalability into a game, another question remains: how to set the appropriate
level of scaling.
The different approaches for deciding what level of content to use fit into four areas:
- User set
- Set according to hard limits
- Pre-runtime profiling
- Runtime profiling
Letting the user set the level is probably the most common means of addressing this issue.
Typically, a control panel lets the end user enable or disable features, or select the appropriate
level of content. This approach is certainly the easiest to implement, and it also gives the end
user some control over their experience. On the negative side, it requires that the end user
understand some fairly complex performance tradeoffs and system configuration concerns.
Novice users can easily become frustrated.
Some applications determine default settings according to designated hard limits. An application
surveys the hardware configuration during installation (including CPU, graphics card, memory,
and so on), and selects a predetermined level of content that corresponds with that level of
system performance. This approach somewhat effectively matches content to system
performance, but it has some drawbacks, too. One drawback is that the application may run
into a configuration that the developer never anticipated. Also, factors other than those
collected while determining the system configuration could adversely affect the performance.
For applications that set the level of detail at install time, a hardware configuration change
(such as the user upgrading their CPU or graphics card) would invalidate the settings.
Some applications employ pre-runtime profiling. Either during installation or while the
application is loading, the performance of the system is measured through a series of tests.
Then information returned from the testing is used to set the appropriate level of content. This
approach attempts to tune the level of content to the system's performance, but it has a
possible drawback. The 'test' results are assumed to be representative of the performance
during game play; in reality, the load balance could change significantly.
Runtime profiling is considered the Holy Grail of scalability, but it is also the approach most
fraught with problems. The underlying idea is that while the application is running, the current
performance is measured and used in a feedback loop to tune the content level up or down.
This tunes the content to the game play dynamically, but implementing this kind of system can
be very difficult.
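A bare-bones sketch of such a feedback loop appears below; the step size and thresholds are illustrative only, and a shippable version would need damping and hysteresis to keep the detail level from oscillating visibly:

// Nudge a global detail level up or down so the measured frame time
// converges on a target. The returned value (0..1) feeds into LOD and
// effect selection elsewhere in the engine.
class DetailController {
public:
    DetailController() : detail_(0.5f) {}

    float Update(float lastFrameSeconds, float targetSeconds = 1.0f / 30.0f) {
        const float step = 0.02f;            // small steps reduce visible popping
        if (lastFrameSeconds > targetSeconds * 1.1f)
            detail_ -= step;                 // running slow: shed content
        else if (lastFrameSeconds < targetSeconds * 0.9f)
            detail_ += step;                 // headroom: add content back
        if (detail_ < 0.0f) detail_ = 0.0f;
        if (detail_ > 1.0f) detail_ = 1.0f;
        return detail_;
    }
private:
    float detail_;
};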
We hope this article has pointed out some of the challenges that game developers face when
developing titles to run on the broadest range of PCs. As the gap between the high-end and
low-end machines increases in coming years, developers will have to become even more
resourceful to ensure that games take full advantage of the cutting edge equipment while still
performing well on earlier machines. Now that you know a bit more about what is going on
behind the scenes, you'll begin to notice how some of the effects and features described in this
article influence the game performance of your own computer.
Despite the widespread use of polygonal models for representing 3D geometry, the quest goes
on to find suitable alternatives, particularly since the limitations of polygonal data have become
glaringly obvious to current-generation developers. Because PC developers need to create
content that scales across many levels of processor performance (including both host
processors and 3D graphics accelerators), they're forced to either create multiple models or to
use mesh reduction algorithms for dynamically producing the lower detail models. Creating
multiple models clearly taxes the efforts of 3D artists, who must spend even more time
modeling, manipulating, and animating models composed of large numbers of polygons. As
games become more content intensive (not just in terms of the levels of detail, but more actual
game content), the time required to produce the content grows considerably. Alternatives to
polygonal models offer artists an acceptable means to streamline the creation process and save
time along the way.
This article deals with one of the more promising alternatives to polygonal modeling: NURBS
(Non-Uniform Rational B-Spline) surfaces. First, I'll introduce you to the concepts and
terminology associated with parametric curves and surfaces. Next, I'll describe in detail how to
render NURBS surfaces and discuss some of the difficulties encountered when using NURBS
surfaces in place of polygonal models. Finally, if I've done my job well, this article will whet
your appetite for the exciting types of 3D content that can be created using parametric surfaces
and inspire you to investigate developing this type of content.
Let's start with the basics. Normal "functions," as presented in algebra or calculus (or whatever
mathematics course we've taken recently or not so recently) are defined as the dependent
variable (often y) given as a function of the independent variable (usually x) so that we have an
equation such as: y = 2x^2 - 2x + 1. By plugging in various values for x we can calculate
corresponding values for y. We can create a graph of the function by plotting the corresponding
x and y values on a two-dimensional grid, as shown in Figure 1.
Figure 1. An ordinary function.
Parametric functions also match values of x with values of y, but the difference is that both x
and y are given as functions of a third variable (often represented by u) called the parameter.
So we could have a set of equations expressed as follows:
y = 2u^2 - 2u + 1
x = u
These equations produce the same curve that the ordinary function given above produces. An
additional restriction often added to parametric functions is that the functions are only defined
for a given set of values of the parameter. In our simple example, u could be any real number
but for many sets of equations, the equations will only be considered valid on a range such as 0
<= u <= 1.
Once you understand the nature of a parametric function, we can examine how this pertains to
parametric curves and surfaces. In simplest terms, a parametric curve is the plot of a set of
parametric functions over the valid parameter range. Our previous example has two functions
(one for x and one for y) that when plotted for 0 <= u <= 1 create the graph in Figure 1. We
can easily add a third function for z (such as: z = 2u) and then we have a set of parametric
functions that create a curve in 3-space.
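Evaluating such a curve in code is straightforward. This small program steps u through [0,1] and prints points on the example curve above (x = u, y = 2u^2 - 2u + 1, z = 2u):

#include <cstdio>

int main() {
    const int steps = 10;
    for (int i = 0; i <= steps; ++i) {
        float u = static_cast<float>(i) / steps;
        float x = u;                                // x = u
        float y = 2.0f * u * u - 2.0f * u + 1.0f;   // y = 2u^2 - 2u + 1
        float z = 2.0f * u;                         // z = 2u
        std::printf("u=%.2f -> (%.3f, %.3f, %.3f)\n", u, x, y, z);
    }
    return 0;
}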
That's all well and good, you might be thinking, but how do these parametric functions get
chosen in a way that is useful to software developers? Essentially, typical "parametric" curves
and surfaces are more than just a set (or sets) of parametric functions. Let's take the earlier
description one step further so that we can see how parametric curves and surfaces originate.
Consider the set of 2-dimensional points in Figure 2 as well as the linear connection of these
points shown in Figure 3. We can think of these points as representing a linear approximation of
a curve that starts at the first point (0,1) and ends at the last point (3,1). Another way of
looking at this: as x goes from 0 to 1, y goes from 1 to 2. As x goes from 1 to 2, y stays at 2,
and as x goes from 2 to 3, y goes from 2 down to 1. In mathematical terms, the "curve" in
Figure 3 can be defined as a "blending" of four points: P0 = (0,1), P1 = (1,2), P2 = (2,2), and
P3 = (3,1). The points are blended by a set of functions defined as follows:

F0(u) = 1 - u   for 0 <= u < 1, and 0 elsewhere
F1(u) = u       for 0 <= u < 1, 2 - u for 1 <= u < 2, and 0 elsewhere
F2(u) = u - 1   for 1 <= u < 2, 3 - u for 2 <= u <= 3, and 0 elsewhere
F3(u) = u - 2   for 2 <= u <= 3, and 0 elsewhere

C(u) = F0(u)*P0 + F1(u)*P1 + F2(u)*P2 + F3(u)*P3
This gives us a two-dimensional curve (C) defined as a linear combination of four two-
dimensional points (P0,P1,P2,P3) and four scalar parametric blending functions (F0,F1,F2,F3)
valid in the closed interval [0,3].
Are you excited, yet? Probably not, but I am because here's the kicker: by creatively choosing
these blending functions, we can change the look and smoothness of the curve both in terms of
visual appearance and mathematical continuity.
You may still be wondering how we'll come up with these blending functions more easily. Well,
part of the secret lies in the concept of a knot vector. A knot vector is simply a vector (which is
a list of one or more real numbers) that describes the "knots" of the curve. Think of a knot as a
point where the blending functions change. In the previous example, the blending functions
change at 0 (where they start), 1, and 2. This is apparent from the conditions on the functions
(such as 1 <= u < 2). An example of a knot vector would be: {0, 1, 2, 3}. We'll call this knot
vector U and denote each of the terms in it as u0,u1,u2,u3 so that u0=0, u1=1, u2=2, and
u3=3.
Knot vectors must obey a couple of rules:
1. The values must be non-decreasing. This means that ui+1 >= ui for all i. This also means
that values can be repeated, so that {0,1,1,2,3} is a valid knot vector.
2. The spacing of the "knots" (that is, the difference between successive knot values ui and
ui+1) can either be "uniform" (the same for all ui and ui+1 pairs) or "non-uniform". We'll
talk about this later in the article.
Now we're going to define a powerful set of parametric functions called the b-spline basis
functions (the b in b-spline stands for "basis" so this term is kind of redundant). These
equations are defined for a given knot vector U = {u0, u1, …, un} as given in Equation 1:

Bi,0(u) = 1 if ui <= u < ui+1, and 0 otherwise
Bi,p(u) = ((u - ui) / (ui+p - ui)) * Bi,p-1(u) + ((ui+p+1 - u) / (ui+p+1 - ui+1)) * Bi+1,p-1(u)   (Equation 1)

Whoa, that's scary! Let's take a close look at it to see what makes it useful. The p subscript in
the second equation is the degree of the function (points are zero'th degree, lines are first
degree, and so on). The first equation expresses that for zero'th degree curves, the function is
either constant zero or constant one depending on the parameter, u, and where it falls in the
knot vector. Looking at this pictorially for the knot vector U = {0,1,2} and B0,0, B1,0, and B2,0
we get the plots shown in Figure 4.
For degrees other than zero, we must recursively calculate the value of the function using a
linear combination of the functions that are one degree less than the degree for which we're
calculating. For first degree functions, we use a linear combination of the zero'th degree
functions. For second degree functions, we use a linear combination of the first degree functions
(which are also defined as a linear combination of the zero'th degree functions), and so on. As
an example, for the knot vector U = {0,1,2, 3} we produce the plots shown in Figure 5 for
B0,1, B1,1, B2,1, and B3,1.
Interestingly enough, with the four control points, P0, P1, P2, and P3 defined in our previous
example, we can now represent the curve, C from Figure 3, as a parametric curve by the
equation:
C(u) = B0,1(u) * P0 + B1,1(u) * P1 + B2,1(u) * P2 + B3,1(u) * P3 with knot vector U =
{0,1,2,3}.
To verify that this approach works, pick a value for u, say 1.5. Looking at the plots in Figure 5
we can see that:
B0,1(1.5) = 0
B1,1(1.5) = 0.5
B2,1(1.5) = 0.5
B3,1(1.5) = 0
CX(1.5) = B0,1(1.5)*P0,X + B1,1(1.5)*P1,X + B2,1(1.5)*P2,X + B3,1(1.5)*P3,X
        = 0*0 + 0.5*1 + 0.5*2 + 0*0
        = 1.5

CY(1.5) = B0,1(1.5)*P0,Y + B1,1(1.5)*P1,Y + B2,1(1.5)*P2,Y + B3,1(1.5)*P3,Y
        = 0*0 + 0.5*2 + 0.5*2 + 0*0
        = 2
We've covered a lot of ground and still haven't even looked at parametric surfaces yet. That's
okay because by now you should have a decent understanding of the nature of parametric
surfaces. We know that a parametric function is a set of equations that produce one or more
values for a given parameter. In our examples, we produced x and y values and could easily
have produced z values to generate points in 2-space or 3-space. I've also shown how several
parametric functions can be used to "blend" points in 2-space (again, blending in 3-space would
be a trivial extension of this process). We also learned what a knot vector is and how knot
vectors can be used together with the b-spline basis functions to create some interesting
"blending" functions.
Parametric Surfaces
Now that we know how to describe parametric curves using a set of control points (which is
what P0, P1, P2, and P3 were in the previous example), we can begin to understand
parametric surfaces. The control points that we're going to use for parametric surfaces will be
3-dimensional points. Let's construct an example using the points shown in Figure 6.
Starting with 16 points labeled P0,0 through P3,3, we want to "blend" these points together to
form a surface. This process is actually quite easy. To generate a surface point that we'll call S,
start with two knot vectors, U and V, to create two sets of b-spline basis functions, Bi,p(u) and
Bj,q(v). Here p and q tell us the degrees of the surface (for example: linear, quadratic, cubic) in
each direction. Now, we can define the function for the surface that corresponds to the function
for a curve, shown in Equation 3:

S(u,v) = sum over i ( sum over j ( Bi,p(u) * Bj,q(v) * Pi,j ) )   (Equation 3)

Figure 6. A parametric surface.
Simple enough? Let's look at it in greater depth just to be sure that the process is clear. To
calculate a surface point, S(u,v), we loop over all the control points (with the two summation
signs in the equation) and scale each control point, Pi,j, by the appropriate blending functions
evaluated at u and v. Keep in mind that for a surface with many control points, some of the
blending functions will be equal to zero over large regions of the surface. In particular, for a
surface of degree n x m, at most (n+1)*(m+1) blending functions will be non-zero at a given
(u,v) parameter value.
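To make the double sum concrete, here is an unoptimized sketch of evaluating a non-rational surface point: Basis() follows the recursion in Equation 1 directly, and SurfacePoint() is the double sum of Equation 3 (the article's sample code instead pre-computes polynomial coefficients for speed):

#include <vector>

struct Point3 { float x, y, z; };

// Cox-de Boor recursion for the b-spline basis functions Bi,p(u), written
// for clarity rather than speed.
float Basis(int i, int p, float u, const std::vector<float>& knots) {
    if (p == 0)
        return (knots[i] <= u && u < knots[i + 1]) ? 1.0f : 0.0f;
    float left = 0.0f, right = 0.0f;
    float d1 = knots[i + p] - knots[i];
    float d2 = knots[i + p + 1] - knots[i + 1];
    if (d1 > 0.0f) left  = (u - knots[i]) / d1 * Basis(i, p - 1, u, knots);
    if (d2 > 0.0f) right = (knots[i + p + 1] - u) / d2 * Basis(i + 1, p - 1, u, knots);
    return left + right;
}

// Equation 3: S(u,v) = sum over i,j of Bi,p(u) * Bj,q(v) * Pi,j
Point3 SurfacePoint(float u, float v, int p, int q,
                    const std::vector<std::vector<Point3> >& ctrl, // [i][j]
                    const std::vector<float>& uKnots,
                    const std::vector<float>& vKnots)
{
    Point3 s = { 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < ctrl.size(); ++i) {
        float bu = Basis(static_cast<int>(i), p, u, uKnots);
        if (bu == 0.0f) continue;     // most basis functions are zero here
        for (size_t j = 0; j < ctrl[i].size(); ++j) {
            float w = bu * Basis(static_cast<int>(j), q, v, vKnots);
            s.x += w * ctrl[i][j].x;
            s.y += w * ctrl[i][j].y;
            s.z += w * ctrl[i][j].z;
        }
    }
    return s;
}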
Before getting to the sample code, let's cover one more thing. The basis functions that we've
described have an interesting property (actually it's by design). If you expand them for a given
degree, n, and a fixed knot vector, you end up with a polynomial equation of the form: A0 +
A1u + A2u^2 + A3u^3 + … + Anu^n where Ai are coefficients that are determined exclusively
by the knot vector and degree. Polynomials are good functions used for approximating (or, in
some cases, representing exactly) other functions. However, there are some three dimensional
surfaces that can't easily be approximated using polynomials as bases; specifically, the conics:
spheres, cylinders, cones, and so on. To more easily and accurately represent these surfaces,
you can use a ratio of polynomials. For two polynomial equations, F and G, a rational
polynomial R would be defined by:

R(u) = F(u) / G(u)
Using the b-spline functions from Equation 1, we can define a "rational" parametric surface by
adding to the control points a fourth component (the first three are x, y, and z) that "weights"
each control point. We'll call the fourth component w. In this manner, the equation for the
surface becomes:

S(u,v) = ( sum over i,j of Bi,p(u) * Bj,q(v) * wi,j * Pi,j ) / ( sum over i,j of Bi,p(u) * Bj,q(v) * wi,j )   (Equation 4)
In case you were wondering, this is the equation for a rational b-spline surface. If the knot
vector used for the basis functions is a non-uniform knot vector, then this is the equation for a
non-uniform rational b-spline surface: a NURBS surface! Equation 4 is the equation for a
generalized parametric surface. Other common parametric surfaces are just subsets of these
surfaces. Specifically, a non-rational, uniform or non-uniform, b-spline surface is one where the
weights, wi,j, are all equal to 1. This causes the division to accomplish nothing (and hence we
don't have to evaluate the denominator at all). Also, you may have heard of a Bézier surface
which is a non-rational b-spline surface with a uniform knot vector that is all zeros followed by
all ones. So, for a 3rd degree Bézier surface, the knot vector would be U = {0,0,0,0,1,1,1,1}.
Rational parametric surfaces offer one more nicety that isn't available for non-rational surfaces.
Any affine transformation (translation, rotation, scale, shear), as well as perspective projection,
can be applied to the control points of a rational parametric surface and then the surface points
generated in the transformed space will be correct. This means that if you have a small number
of control points then you can transform the control points and generate a large number of
surface points without having to transform all the generated surface points. Using non-rational
surfaces, you would at least have to perform the projection transformation of the generated
surface points.
At this point, we could take Equation 4 and write some code to do a straightforward
implementation of it. This would not be too difficult, but there are some optimizations that we
can make first so that our implementation performs better; after all, it's real-time
performance that we want. First, let's discuss "tessellation". Tessellation is the process of taking
the continuous, mathematical equation of a surface and approximating it with polygons (we'll
use triangles). This process can be accomplished in a number of ways with the potential for
vastly different visual results.
For simplicity, we're going to use what's called uniform tessellation. Uniform tessellation means
we step equally between the minimum and maximum values for the parameters over which the
surface is valid. For example, assume that the surface is valid for the ranges u ∈ [0,3] and
v ∈ [2,3]. What we can do is divide these into some number of subdivisions and then just loop over
these values calculating surface points that will be used as vertices of triangles. If we decide to
use 20 subdivisions, we would calculate S(u,v) at u=0, u=0.15, u=0.30, …, u=3 for each v=2,
v=2.05, v=2.10, v=2.15, …, v=3. So, we'd end up generating 441 points (21 times 21 because
we include the end points) that we could then connect into triangles and render using a 3D API,
such as OpenGL* or Direct3D*.
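The tessellation loop itself is simple. In this sketch, EvalSurface() is a stand-in for the real S(u,v) evaluation (any parametric surface will do for illustrating the loop):

#include <vector>

struct Point3 { float x, y, z; };

// Placeholder for the real surface evaluation (a NURBS surface here).
static Point3 EvalSurface(float u, float v) {
    Point3 p = { u, v, u * v };
    return p;
}

// Uniform tessellation: step evenly across the valid parameter ranges and
// emit a grid of (subdivisions + 1) x (subdivisions + 1) vertices. For
// u in [0,3], v in [2,3], and 20 subdivisions, that is 21 x 21 = 441 points,
// ready to be stitched into triangles.
std::vector<Point3> TessellateUniform(float u0, float u1,
                                      float v0, float v1,
                                      int subdivisions)
{
    std::vector<Point3> verts;
    verts.reserve((subdivisions + 1) * (subdivisions + 1));
    for (int i = 0; i <= subdivisions; ++i) {
        float u = u0 + (u1 - u0) * i / subdivisions;
        for (int j = 0; j <= subdivisions; ++j) {
            float v = v0 + (v1 - v0) * j / subdivisions;
            verts.push_back(EvalSurface(u, v));
        }
    }
    return verts;
}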
To speed up the calculation of S(u,v), we can calculate Bi,p(u) and Bj,q(v) at the subdivision
points and store these in an array. This calculation can be performed once, so that it will not
have to be performed in the inner loop of calculating surface points. Instead, a lookup of the
pre-computed values and a multiplication is the only task that would be required. If at some
point we change the number of subdivisions we want, we can just recalculate the stored arrays
of basis functions evaluated at the new subdivisions.
So now that we have a general idea of a way to tessellate a NURBS surface (or any other
parametric surface, for that matter), what else do we need? For one, we need a way to
generate surface normals so that we can let the 3D API (Direct3D* in the sample code) do
lighting calculations for us. How do we generate these? Well, remember those Calculus classes
that we all loved? One of the things we learned is that the derivative of a function is the
instantaneous slope of the line tangent to the function at the point where the derivative and
function are evaluated. By creating two tangent lines (one in the u and one in the v parameter)
we can take a cross product and wind up with a surface normal. Simple enough, you say, but
what's the derivative of the function S(u,v)?
Well, there are two partial derivatives: one with respect to u and one with respect to v, and
they're ugly! Writing S(u,v) from Equation 4 as a quotient N(u,v)/D(u,v), where N is the
weighted sum of control points and D is the sum of the weights, and applying the quotient rule:

dS/du = ( (dN/du)*D - N*(dD/du) ) / D^2   (Equation 5, and likewise for v)
And, not only is that ugly, we don't really know how to take the derivatives of Bi,p(u) and
Bj,q(v). It's possible to take a derivative of Bi,p(u) (and Bj,q(v)) from its definition, but there's an
easier way. It's possible to come up with a set of equations for calculating the coefficients of the
polynomial equation that Bi,p(u) is equivalent to. Then, taking the derivative of Bi,p(u) is as
simple as multiplying powers by coefficients and reducing the powers by one (if you recall
d(Ax^n + Bx^m)/dx = nAx^(n-1) + mBx^(m-1)). You still have to use Equation 5 to compute
the derivatives of S(u,v), but it's really not that bad: you're going to be performing the
computation of some of the terms anyway, and the ones with the derivatives are calculated the
same way as the non-derivative terms. We need to be able to calculate the coefficients of the
b-spline basis functions when they're represented as follows:

Bi,p(u) = Ci,p,0 + Ci,p,1*u + Ci,p,2*u^2 + … + Ci,p,p*u^p
Using a lot of paper and a bit of head scratching, I derived the following formulas to compute
the coefficients, Ci,p,k(u).
This seems complex, but unless the knot vector changes, you don't have to re-compute these
coefficients after the first time. Also note that Ci,p,k depends only on the knot span that u falls
in, not on u itself, so we can just evaluate the Ci,p,k for each knot span and store those
values. Now we can write the derivative of Bi,p(u) as:

d(Bi,p(u))/du = Ci,p,1 + 2*Ci,p,2*u + 3*Ci,p,3*u^2 + … + p*Ci,p,p*u^(p-1)
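In code, differentiating the polynomial form and building the normal from the two tangent vectors might look like this sketch:

#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Cross(const Vec3& a, const Vec3& b) {
    Vec3 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return c;
}

static Vec3 Normalize(const Vec3& v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    Vec3 n = { v.x / len, v.y / len, v.z / len };
    return n;
}

// d/du of (C[0] + C[1]*u + ... + C[p]*u^p) is C[1] + 2*C[2]*u + ...
// 'coeffs' holds the polynomial coefficients for the current knot span.
float BasisDerivative(const float* coeffs, int degree, float u) {
    float result = 0.0f;
    float power  = 1.0f;                 // tracks u^(k-1)
    for (int k = 1; k <= degree; ++k) {
        result += k * coeffs[k] * power;
        power  *= u;
    }
    return result;
}

// The surface normal is the normalized cross product of the two tangent
// vectors dS/du and dS/dv computed via Equation 5.
Vec3 SurfaceNormal(const Vec3& dSdu, const Vec3& dSdv) {
    return Normalize(Cross(dSdu, dSdv));
}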
Sample Code
At this point we know what we need to know to talk about the sample code you can download
and how to implement this fun stuff. First, everything in the sample code is written in C++ and
spread across many files, of which two are specific to this article: DRGNURBSSurface.h
and DRGNURBSSurface.cpp. Actually, you'll also dive into NURBSSample.cpp if you want to
play with the surface control points and knot vectors. DRGNURBSSurface.h contains a class
definition for a class called CDRGNURBSSurface (for the curious, C is for "class", DRG is for
"Developer Relations Group" which is what the group I'm in at Intel used to be called). The
methods of this class of interest to us are Init(), ComputeBasisCoefficients(),
ComputeCoefficient(), SetTessellations(), EvaluateBasisFunctions(), TessellateSurface(), and
TessellateSurfaceSSE().
Going through these in order, Init() is called to initialize a newly created CDRGNURBSSurface
object. The function takes a pointer to a CDRGWrapper class that is part of the framework we
wrote for getting at the Direct3D* API. Init() also takes two surface degrees, u and v, and the
number of control points in the u and v directions. It takes an array of Point4D structures that
contain the weighted control points (x, y, z, and w) stored in u-major order (this means that v
values are consecutive in the array). It takes two float arrays that contain the u knots and the v
knots. Finally, it takes two optional values that specify the number of tessellations in the u and
v directions of the surface. Init() does some calculations to determine how many knots are in
the knot vectors and then allocates memory to store some of the information needed to render
the surface. Finally, Init() makes a local copy of the incoming data (control points and knots)
and then calls ComputeBasisCoefficients().
ComputeBasisCoefficients() calls ComputeBasisCoefficient() which uses the formulas from
Equation 6 to compute the coefficients of the polynomials formed from the knot vectors and the
degrees of the surface. ComputeBasisCoefficient() calls itself recursively due to the definitions
in Equation 6. The coefficients are stored in arrays to be used by EvaluateBasisFunctions().
Because the Ci,p,k(u) are only dependent on the knot span that u belongs in,
ComputeBasisCoefficient() takes as an argument this knot span (referred to as an "interval" in
the code) rather than the actual value of u.
At this point, everything is initialized for tessellating a NURBS surface. Now, at each frame that
the sample application renders, the Render() method of the CDRGNURBSSurface object is called
and in turn calls TessellateSurface() or TessellateSurfaceSSE() depending on whether or not
we've told the object to use the Streaming SIMD Extensions of an Intel Pentium III processor.
Real-Time Optimizations
We already talked about some optimizations that can be done to evaluate NURBS surfaces more
quickly. The first, which is used by the sample code, is to use uniform tessellation and pre-
evaluate the basis functions and their derivatives at the tessellation points. We also mentioned
the possibility of transforming surface control points into projected space and doing our surface
tessellation in that space. While this works, lighting can be difficult (or impossible) if you use
anything other than directional lights because distance is not preserved in perspective projected
space. If you're using light maps in your engine, I would highly recommend transforming control
points and generating vertices in projected space. You can modify TessellateSurface() to do the
divide by homogeneous w and viewport scaling to generate vertices in screen space.
To keep memory requirements minimal, we render the surface by generating two rows of
surface points and then passing a triangle strip to the API (Direct3D* in our case). If a surface
didn't need to be re-tessellated at every frame, then we could generate all the surface points
and store these in an array. Depending on the application, it may still be quicker to tessellate
the surface at every frame rather than having to fetch the generated vertices from memory
(with corresponding cache misses). You'll need to experiment with your particular application to
see what works best.
Aside from the algorithmic optimizations just discussed, we can achieve better performance by
using the new Streaming SIMD Extensions supported by Intel's Pentium III processor. These
extensions allow us to do mathematical operations on four floating point values at one time (for
more information on the Streaming SIMD Extensions of the Intel Pentium III processor, visit
http://developer.intel.com/design/pentiumiii/). Since for NURBS surfaces we're dealing with
four coordinates (x, y, z, and w) we can do the same operations to all four at once.
TessellateSurfaceSSE() uses intrinsic functions provided by the Intel C/C++ Compiler version
4.0 to evaluate all four coordinates of a NURBS surface point at once.
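As an illustration of the idea (not the sample code's actual routine), one inner-loop step might accumulate basis-weighted (x, y, z, w) control points four floats at a time:

#include <xmmintrin.h>  // Streaming SIMD Extensions intrinsics

// Accumulate basis-weighted control points. Each control point is stored
// as (x, y, z, w), so a single SIMD multiply-add handles all four
// components of the surface point at once. Pointers must be 16-byte aligned.
void AccumulateControlPoints(const float* controlPoints, // 4 floats per point
                             const float* basisWeights,  // one weight per point
                             int count,
                             float* outPoint)            // 4 floats
{
    __m128 sum = _mm_setzero_ps();
    for (int i = 0; i < count; ++i) {
        __m128 cp = _mm_load_ps(controlPoints + i * 4);  // x, y, z, w
        __m128 wt = _mm_set1_ps(basisWeights[i]);        // splat the weight
        sum = _mm_add_ps(sum, _mm_mul_ps(cp, wt));
    }
    _mm_store_ps(outPoint, sum);
}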
Other optimizations are possible depending on the quality vs. speed tradeoffs acceptable by a
particular application. For example, one could choose to generate normals only every other
surface point (or less frequently) and then linearly interpolate normals in between.
I should mention a few last things about the sample code contained in the download. The
sample requires the Microsoft DirectX 7 SDK to build or run and was written using C++ and
built using Microsoft Visual C++ 6.0. If you don't have the Intel C/C++ compiler version 4.0
included with version 4 of the Intel VTune product, you'll need to change a line in
DRGNURBSSurface.h. The line reads "#define SUPPORT_PENTIUM_III 1" and should be
changed to "#define SUPPORT_PENTIUM_III 0". You can then rebuild everything using the
Microsoft compiler (or other C++ compiler) and get to see the code working. You won't be able
to enable the tessellation routine that uses the Streaming SIMD Extensions of the Intel Pentium
III processor, though.
While running the application, pressing 'H' will bring up a help screen of available keys. Most
are self-explanatory. One worth mentioning is the 'M' key, which causes the display to switch
between two different "Objects": a single NURBS surface, or the same shape built from nine
separate NURBS surfaces. You'll notice when viewing the nine surfaces that there are hard
creases between the surfaces. This doesn't happen with the single surface. When changing the
tessellation level for the single NURBS surface, there are actually nine times as many points
generated as the number indicates. This is done to keep a somewhat consistent look between
the shapes of the two different "Objects".
I've discussed the math behind parametric surfaces and the basics of rendering them and
hopefully made them seem appealing as an alternative to polygonal models. What I haven't
addressed are some of the problems that are unique to parametric surfaces and some of the
trickier aspects of using parametric surfaces in place of polygonal models.
1. Cracking - When two parametric surfaces meet at an edge (or one parametric surface
meets a polygonal surface), it's possible for a crack to appear between the surfaces if their
degrees of tessellation differ (or if they're just different sizes). This problem can be solved
on a per-application basis by adding connectivity information to the surfaces. It's not
trivial to fix, but it's not impossible.
2. Collision detection - If you're doing collision detection in your application, you have
several choices with parametric surfaces:
   a. Store all the generated triangles and do collision detection on these - while more
      accurate, this is more memory intensive as well as computationally intensive.
   b. Depending on what types of objects may be colliding, you can solve the parametric
      surface equations with equations representing the other objects (even lines are
      difficult, though) and then just plug-and-chug to find collision points.
   c. Use a combination of (a) and (b) by starting with (a) and then refining the surface
      to triangles to determine an exact hit.
3. Clipping - For surfaces that are partially within the viewing frustum, it can be difficult to
clip prior to generating triangles. The problem is that you can't just clip control points
because doing so would make the tessellation of the surface difficult to impossible. The
easiest solution is to just generate triangles and then clip the triangles - the downside to
this is the possibility of generating many more triangles than needed.
4. Back-surface Culling - Aside from clipping, it is also difficult to easily cull back-facing
surfaces or portions of surfaces for similar reasons to the clipping problem. For example,
a sphere can be defined with one surface but only half of the sphere is ever visible at one
time. It would be nice to be able to cull the back-facing portion of the sphere before
tessellation, but this is difficult to do.
5. Tessellation - Although a uniform tessellation algorithm is easy to implement and can run
fast, in some instances other algorithms may provide better performance/quality.
Surfaces that have very curvy areas as well as very flat areas may be better tessellated
with a non-uniform tessellation algorithm.
6. Non-local refinement not supported - When refining a surface (i.e. adding detail), you
must add control points in complete rows and columns so the control mesh remains a
regular grid of points. This causes excessive control points to be added just to add detail
in a small, localized region of a surface. Note that this is not an implementation issue, but
rather an issue with NURBS surfaces (and other parametric surfaces).
7. Degenerate Normals - Because it's possible to have control points that are at the same
location, it's possible for the derivatives of the surface to vanish (i.e. go to zero). This
causes the calculation of surface normals to fail. To solve this, it is necessary to look at
surrounding points and derivatives if one of the tangents gets too close to zero.
Conclusion
We've covered a lot of information in this article. We've been introduced to parametric curves
and surfaces and should have a decent understanding of the concepts behind them. We've
learned what's involved in rendering parametric surfaces and can see how the data
requirements are smaller than those of the polygonal models that can be generated. And we should now
have an idea how to implement some of the creative types of 3D content we talked about in the
introduction.
Given that the field of study of parametric surfaces is enormous, we've only lightly touched the
surface (no pun intended) of what's possible. Experimenting with parametric surfaces is
exciting. I encourage you to check out the sample code and get a feel for how you can
incorporate NURBS surface rendering into your 3D engine today.
Developing for the current generation PC platform rewards software developers working in 3D
with both blessings and curses. On the positive side, PCs are delivering rocket-fueled levels of
performance, allowing developers to accomplish amazing feats on inexpensive consumer
machines. The downside is that developers have to code their applications to scale across
hardware with an increasingly wide range of performance differences and to accommodate
significantly different hardware feature sets.
The Microsoft DirectX7 API offers the opportunity for programmers to tap into some fairly
remarkable graphics capabilities, including the ability to render primitives to texture surfaces.
This article explores some of these possibilities and explains the techniques that can be used to
take full advantage of this capability. To do this successfully in a real application, you need to
exploit the latest advances in graphics chips for effects processing, while gracefully handling
systems that lack these advanced features, by either scaling down the effect, or performing the
effect at a slower speed.
Detecting support for rendering to texture surfaces is necessary if you want to
know when to work around hardware that isn't capable of it. To this end,
this article introduces an approach to detecting support for rendering to texture surfaces and
how to achieve similar functionality on hardware that doesn't directly support it. I will also be
presenting a number of examples of special effects that can be achieved using this capability.
Rendering to texture surfaces with today's 3D hardware provides the opportunity for some
dazzling effects, including reflection/refraction, soft-edged shadows, mip map
generation, motion blur, TV-style transitions, and much more.
There are several ways to render to textures on current generation 3D hardware. Applications
aiming to exploit this capability will need to detect which method can be used on the end user's
system, and provide codepaths to handle the different methods:

- Method A: Rendering directly to the texture surface.
- Method B: Rendering to an intermediate render-target surface, then blitting to the texture surface.
- Method C: Rendering to the back buffer and blitting a subsection to the texture surface.
- Method D: Software rendering to a system memory surface and blitting that to the texture surface.

(Not all paths need be handled. An application could choose to only support methods A and D, as
I have done in my example. The executable and source for this can be found in TESTRTT.ZIP.)
The first method offers the fastest and most efficient approach, but it is not supported by all
hardware. The last of the methods listed will work on all systems, but places the rendering
burden on the CPU, creating processor overhead when we would prefer to have the graphics
hardware do the work. Additionally, Methods B and D require extra video memory to be
allocated.
Under DirectX6, some graphics hardware could render to texture surfaces, but application
developers were unable to take advantage of this feature. Because of the way the surface
creation flags were implemented and interpreted by certain hardware vendors and by some
application developers, a Catch-22 was created. Driver writers implemented the flags
incorrectly, but fixing them would have risked breaking currently functional applications already
out in the market.
The solution was to make a fresh start with DirectX7, and this time, interpret the flags correctly
when used with the new interfaces. Consequently, in order to make use of this capability, and
to detect its presence, applications must be using the DirectX7 interfaces (IDirectDraw7,
IDirect3Ddevice7, etc).
When using the new interfaces, applications can test for the render-to-texture capability in two
steps. The first step is creating a DirectDraw surface that is flagged as being both a render
target surface and a texture surface. The second is attempting to set this surface as the
render target for the Direct3DDevice. If either of these steps fails, the application must fall back
to one of the alternate methods listed above.
At the time of this article's authoring, there were still issues when attempting to render to
texture surfaces with legacy DirectX6 display drivers running on the DX7 runtime DLLs. If an
application wants to render to texture surfaces and will ship while a significant number of DX6
display drivers are in end users' systems, the application will have to resort to cruder methods
of checking for the capability or lack thereof. Examples of such crude methods could include
rendering to a texture and then locking it and comparing pixels to the expected result, or worst
case, checking device names. In my TestRTT sample, I created a black texture, set it as the
render target, cleared it to white, and then rendered some primitives to it. If after doing so it
contains only black pixels, I know the render-to-texture attempt has failed, and resort to an
alternative method. I do this test at start up, and when the rendering device is changed.
The next few sections explain how each of these methods is implemented. Once the methods
have been explained, this article presents a way of detecting which one a given system supports.
Rendering directly to the texture surface.
A number of the popular consumer 3D graphics accelerators released in the past year or two
support rendering directly to texture surfaces. When available, this is generally the fastest
method for implementing this technique. No extra copying is necessary and redundant surfaces
aren't needed.
- Create a DirectDraw surface with these properties: the surface should support rendering
and be usable as a texture. To do this, specify two flags, DDSCAPS_3DDEVICE and
DDSCAPS_TEXTURE, when calling DirectDraw's CreateSurface function. The 3DDEVICE
flag tells DirectDraw the application would like the surface to be one the device can
render to, and the TEXTURE flag tells DirectDraw the application would also like to use it
as a texture.
Note that if you are using Direct3D's texture manager, textures that are going to be
rendered to cannot use the texture manager. They must be created specifying the
memory type (usually localvidmem, as few cards can render to AGP surfaces). It is best
to do this before invoking the texture manager, so that the texture manager will know how
much memory it has left over to work with.
- If the back buffer has a Z buffer attached to it, the surface for rendering to texture must
also have a Z buffer associated with it.
- If the creation of the surface fails, it could be because the hardware does not support
rendering to textures. First, ensure that the failure is not due to any of the usual suspects
(lack of video memory, unsupported size or format, etc). If you confirm that the failure is
due to the hardware not supporting rendering to texture surfaces, the application must
fall back to one of the other mechanisms.
- If the creation of the surface succeeds, the next step is to render to the texture surface.
This is done by calling the Direct3DDevice7->SetRenderTarget() method, to point the
rendering device to the texture surface. If the SetRenderTarget call fails, indicating that
the device doesn't support rendering to texture surfaces, the application will need to fall
back to one of the other methods.
- At this point, rendering triangles is performed as usual, ensuring that there is one
BeginScene/EndScene pair per render target, per frame. (A minimal sketch of these steps
follows this list.)
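Here is that sketch, assuming an existing IDirectDraw7 and IDirect3DDevice7 and collapsing the error handling; the Z buffer attachment and surface format checks described above are omitted for brevity:

#include <windows.h>
#include <ddraw.h>
#include <d3d.h>

// Method A in miniature: create a surface that is both a render target and
// a texture, then point the device at it.
bool CreateAndTargetRenderTexture(LPDIRECTDRAW7 lpDD,
                                  LPDIRECT3DDEVICE7 lpDevice,
                                  DWORD width, DWORD height,
                                  LPDIRECTDRAWSURFACE7* ppTexture)
{
    DDSURFACEDESC2 ddsd;
    ZeroMemory(&ddsd, sizeof(ddsd));
    ddsd.dwSize   = sizeof(ddsd);
    ddsd.dwFlags  = DDSD_CAPS | DDSD_WIDTH | DDSD_HEIGHT;
    ddsd.dwWidth  = width;
    ddsd.dwHeight = height;
    // Both a render target and a texture, in local video memory (managed
    // textures can't be render targets, so the texture manager is bypassed).
    ddsd.ddsCaps.dwCaps = DDSCAPS_3DDEVICE | DDSCAPS_TEXTURE |
                          DDSCAPS_VIDEOMEMORY | DDSCAPS_LOCALVIDMEM;

    if (FAILED(lpDD->CreateSurface(&ddsd, ppTexture, NULL)))
        return false;                     // fall back to another method

    if (FAILED(lpDevice->SetRenderTarget(*ppTexture, 0)))
        return false;                     // fall back to another method

    // ...BeginScene / draw calls / EndScene against the texture go here...
    return true;
}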
Rendering to an intermediate surface and blitting to the texture surface.
This approach creates a DirectDraw surface which is identical in size to the texture surface, but
is created with the DDSCAPS_3DDEVICE flag (and without the DDSCAPS_TEXTURE flag). After
that, the steps are similar, except that the SetRenderTarget() method is used to point the
device to the intermediate surface. Then, a blit must be done, following the EndScene(), to
copy the rendered scene to the texture surface.
This will work on some hardware that won't support the previous method because some
hardware, in order to increase performance, rearranges the data in texture surfaces into a
format that is more friendly to the graphics chip's texel cache. This is often referred to as a
swizzled texture. The rendering device cannot render triangles to this surface type, but it can
handle blitting from one type to another.
Rendering to the back buffer and then blitting a subsection to a texture surface.
This method uses less memory than the last method, but it can require an extra clearing of the
back buffer. All rendering is done to the back buffer, but there are two BeginScene/EndScene
pairs per frame, one for the scene to be rendered to the texture and one for the scene to be
rendered to the back buffer.
Software rendering to a system memory surface and blitting that to the texture
surface.
This approach, while fairly straightforward, should be avoided for obvious reasons (in case it's
not obvious: software rendering is slow, and we'd prefer to use the 3D hardware instead). If you
have to rely on this method, consider ways to scale back the quality of the effect, such as
reducing the texture surface's resolution.
An example of handling some of these fallback methods can be found in the TESTRTT sample
code.
Now that we've seen how to render to texture surfaces and how to use less desirable methods
to gracefully handle systems that cannot, let's examine a few of the effects that we can produce
using this capability.
Mirrors
One of the first uses that springs to mind is mirror reflections, where objects are texture
mapped with a reflection of the scene in front of them. This effect requires rendering the
scene's geometry from the point of view of a reflected camera, using a rectangle around the
mirror (which could be the mirror itself if it is rectangular). The new mirror view frustum is
sheared based on the angle between the mirror's normal vector and the vector from the viewer
to the mirror position (see Figure 2). The shearing lets the reflection point in the right direction,
while letting the mirror plane act as the front clipping plane of the mirror view frustum.
Of course, mirrors can be done by just projecting geometry against a plane. However, if the
mirror is a complex shape, there is a lot more clipping work involved. Also, there are
advantages in the area of scalability we will discuss later in this article.
Figure 1. Mirror done by rendering to a texture surface.
The executable and source code used to generate the above example are provided in the
FLATMIRROR directory of the sample code.
Environment Maps
A logical extension of the previous effect is to render environment maps on the fly. That way, the environment maps are not just a representation of the distant scene; they can also reflect nearby objects. (Typically, environment maps are of sky and distant mountains, so that they can remain relatively constant within the scene.)
Often, environment maps are represented as a 'sphereized' image: one that is distorted to look as though it were captured with an extremely wide-angle lens (see Figure 3). You can approximate this effect by placing the camera at the center of the object intended to receive the environment map, and using a view frustum with an extremely wide field of view. Because a field of view of 180 degrees is impossible, we have to limit the field of view to something less than that (our example uses 120 degrees). Additionally, there is the issue of mapping polar coordinates onto a rectangular texture. For most environment mapping uses, though, the reflection is subtle enough that the effect works quite well.
A hybrid of static and dynamic environment maps may also make sense. For example, the background of the dynamic environment map could be initialized with a static environment map texture, with the reflections of nearby objects then rendered on top of that background.
Figure 2. A sphereized bitmap for use in an environment map.
Once the dynamic environment map has been rendered, texture coords for the object to receive
the environment map are calculated as with other hemispherical environment map cases. For
every vertex in the object, the vertex normal is transformed into camera space, and the X and
Y components of the reflected camera vector then become the texture coords. An example of
this can be seen in Figure 4, and the source code and executable can be found in the SPHEREMAP directory of the sample code.
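As a sketch, the per-vertex calculation described above might look like this, assuming the normal and the view vector have already been transformed into camera space; the helper name and the exact remapping into [0, 1] texture space are illustrative, not taken from the SPHEREMAP sample:

    // N: unit vertex normal in camera space.
    // V: unit vector from the camera to the vertex, in camera space.
    void SphereMapUV(const D3DVECTOR& N, const D3DVECTOR& V, float& u, float& v)
    {
        // Reflect the view vector about the normal: R = V - 2*(N.V)*N
        float ndotv = N.x*V.x + N.y*V.y + N.z*V.z;
        float rx = V.x - 2.0f * ndotv * N.x;
        float ry = V.y - 2.0f * ndotv * N.y;

        // Remap the X and Y components from [-1, 1] into [0, 1] texture space.
        u = 0.5f + 0.5f * rx;
        v = 0.5f - 0.5f * ry;   // flip Y, since texture V runs top-down
    }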
The reason that polar-to-rectangular mapping is a problem is that while we are adequately (if not completely correctly) calculating the UV coordinates for each vertex, the UV coordinates for intermediate pixels are incorrect. That is to say, as we move across the surface of the sphere, the reflected view vectors generate UV coordinates that fall away nonlinearly, but the graphics hardware does only a linear interpolation of the UV coordinates between vertices. The extent to which this problem shows up depends on how highly tessellated the model is. A model with a vertex per pixel will appear perfect, but the texture will begin to 'jiggle' slightly as the triangles get larger. One way around this may be to do another render-to-texture step that approximates the 'sphereize' filter found in many photo-editing packages, using a highly tessellated mesh.
Shadows
In his March 1999 Game Developer Magazine article entitled "Real-time Shadow Casting," Hubert Nguyen presents an approach of rendering shadows into the frame buffer, and then copying them to a texture surface. While this technique is a fitting example of rendering to texture, it uses one of the fallback methods mentioned earlier in this article (Nguyen implemented his method on a 3Dfx Voodoo card, which can't render to texture surfaces). The technique works as follows:
- A 'shadow texture' is created as one that can be used to render to, and it is cleared to white.
- From the light's point of view, the objects that are to cast shadows are rendered. The rendering is done using flat shading and the scene's ambient light color.
- The objects that are to be shadowed are transformed to the point of view of the light. Their 'screen coordinates' in this space are now used as their texture coordinates for the next step.
- The scene is rendered from the point of view of the regular camera, using two passes (or two texture stages on multitexture hardware). The first pass (or stage) uses the object's regular material and color. The second uses the newly created shadow texture, with the calculated texture coordinates. The second pass (or stage) is only done on those objects that are receivers of the shadow.
- The blend mode used for the latter pass is modulate. This leaves unshadowed areas alone, and shadowed areas modulated with the scene's ambient light color. Also, the latter texture stage must be set to clamp, to ensure that objects far outside the range of the shadow do not get shadowed. (A sketch of these stage settings follows the list.)
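Here is a sketch of the stage settings for the single-pass (multitexture) case, using standard DirectX7 texture stage states; pShadowTex and the stage assignments are illustrative, not taken from the SHADOWTEX sample. The D3DTSS_TEXCOORDINDEX setting assumes the projected coordinates were stored as a second set of UVs in the vertex format:

    pDevice->SetTexture(1, pShadowTex);
    pDevice->SetTextureStageState(1, D3DTSS_COLOROP,   D3DTOP_MODULATE);  // darken shadowed texels
    pDevice->SetTextureStageState(1, D3DTSS_COLORARG1, D3DTA_TEXTURE);
    pDevice->SetTextureStageState(1, D3DTSS_COLORARG2, D3DTA_CURRENT);
    pDevice->SetTextureStageState(1, D3DTSS_ADDRESS,   D3DTADDRESS_CLAMP); // no shadow outside the map
    pDevice->SetTextureStageState(1, D3DTSS_TEXCOORDINDEX, 1);             // the projected coordinates

On multi-pass hardware, the equivalent states would be set on stage 0 for the second pass, with an alpha-blend mode that modulates against the frame buffer.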
Figure 5 is a screenshot of this technique in action. The image in the upper left corner is the
shadow texture (i.e. the view of the object casting the shadow, from the point of view of the
light). The source code and executable are available in the SHADOWTEX directory of the sample
code.
One use for rendering to textures is to create mip-map chains. To accomplish this, set up a
chain of surfaces, copy the source texture to the first, and then loop through to the smallest of
the chain. At each iteration of the loop, the render target is the next smallest in the chain. A
rectangle is rendered over it using the previous one in the chain as the texture, and bilinear
filtering helps create the mip map. While this approach doesn't offer any great advantage over
storing them on the hard drive and loading them at start time, it may be useful for creating
mip-maps of textures created using one of the previously mentioned techniques, or perhaps
other procedural textures.
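A sketch of the loop described above, assuming mipSurf[] holds the chain from largest (level 0) to smallest, each created as a render target, and that DrawTexturedQuad() renders a rectangle covering the whole render target; both names are illustrative:

    // Bilinear filtering does the 2x2 averaging as each level is drawn.
    pDevice->SetTextureStageState(0, D3DTSS_MINFILTER, D3DTFN_LINEAR);
    pDevice->SetTextureStageState(0, D3DTSS_MAGFILTER, D3DTFG_LINEAR);

    for (int i = 1; i < numLevels; ++i)
    {
        pDevice->SetRenderTarget(mipSurf[i], 0);   // next smaller level
        pDevice->SetTexture(0, mipSurf[i - 1]);    // previous level as the source
        pDevice->BeginScene();
        DrawTexturedQuad(pDevice);
        pDevice->EndScene();
    }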
When transitioning from one scene to the next, you could keep the last frame of a scene by rendering it to a texture, and then use it when transitioning to the next scene, in a style similar to those seen on TV or in video-editing applications. Typical transitions are ones like barn-door, vertical blind, page turn, etc.
I am certain many other techniques exist. For example, in the screenshot in Figure 6, I tried some feedback-buffer effects by rendering to one texture, and then using that as the background texture while rendering to a second texture, repeating the process while swapping the pointers to the two. By drawing some random pixels along the bottom of the texture, I tried creating a 'fire effect', and by drawing the objects in my scene with a noise texture, I created some 'smoke trails'. The effect was propagated upwards by slightly offsetting the UV coordinates of the triangles used to draw the background on each texture. The code and executable for this demo can be found in the FEEDBACK directory of the sample code.
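The ping-pong arrangement might look like the following sketch; texA and texB are two render-target textures, and DrawBackground()/DrawScene() are placeholders for the drawing described above, not names from the FEEDBACK sample:

    LPDIRECTDRAWSURFACE7 pSrc = texA, pDst = texB;

    // Each frame: draw the previous frame's texture (plus new seeds) into the
    // other texture, then swap the two for the next frame.
    pDevice->SetRenderTarget(pDst, 0);
    pDevice->SetTexture(0, pSrc);
    pDevice->BeginScene();
    DrawBackground(pDevice);   // previous frame, with UVs offset slightly downward
    DrawScene(pDevice);        // random pixels / noise-textured objects as seeds
    pDevice->EndScene();

    LPDIRECTDRAWSURFACE7 pTmp = pSrc; pSrc = pDst; pDst = pTmp;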
Scalability
One of the key problems facing PC game developers is scalability: how do you add a particular feature on systems with the performance to handle it, and scale back the content or the effect on lower-performance systems? There are a few ways in which the render-to-texture techniques can be scaled.
One technique is to use a lower resolution texture for systems where the amount of free video
memory or the fill rate of the accelerator is a concern. The resulting difference in quality is
typically within the acceptable range for many of the effects. The SHADOWTEX and FLATMIRROR demos allow you to do this to see the results.
In some cases, the dynamically rendered texture can be updated less frequently. For example,
if a camera is panning across the room and the application is running at 60fps, it may not be
too obvious if the scene in the mirror in the back of the room is only updating at 30fps. In other
cases where there are very large differences in the scenery from frame to frame, the artifacts
may be more glaringly obvious. Both the FLATMIRROR and SPHEREMAP demos allow you to see the results of doing this.
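The throttling itself is trivial; a sketch, with hypothetical names (frameCount, RenderMirrorTexture):

    // Update the dynamically rendered texture every other frame, so a 60fps
    // application refreshes the mirror at 30fps.
    if ((frameCount & 1) == 0)
        RenderMirrorTexture(pDevice);
    ++frameCount;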
You can also use a lower LOD version of the model to scale the effect down on lower end
systems. Often applications have multiple LODs resident in memory, allowing the application to
switch between them as needed. Generating an environment map or shadow with one of these
may still produce a reasonably good quality effect, while reducing the amount of geometry work
required to render the texture.
Potential Enhancements
While a number of techniques have been discussed here, as well as a number of areas for
applying scalability, some areas for potential enhancements haven't yet been discussed.
Some of the newer rendering capabilities exposed in DirectX7, combined with the render-to-texture technique, offer one of the more exciting areas for exploration. One example is the DX7 support for environment-mapped ('Envbump') bump mapping, which could be used on dynamically rendered textures to do dynamic bump mapping effects (see Figure 5, which first appeared in my June 1999 Gamasutra article on bump mapping; the code and executable for this demo can be found in the WATER directory of the sample code). This could also be used to do heat-shimmer or refraction effects, as in movies like Predator or The Matrix. Another promising example is the cubic environment mapping feature that will be supported by some upcoming graphics hardware. Used with dynamically rendered textures, this feature could be used to perform pseudo-ray-tracing techniques. (The cubic environment-mapping example in the DirectX7 SDK demonstrates this.)
Other areas that offer potential include using procedural effects on the textures after rendering
them, using alpha blending on the rendered textures over multiple frames to achieve effects
such as motion blur and depth of field, and using DirectX texture transforms to do procedural
effects.
Wrapping It Up
Being able to render to textures adds one more valuable tool to the developer's arsenal,
offering many exciting possibilities for 3D applications and extending the range of available
effects to new limits. Now that this capability is supported by a substantial installed hardware
base, and the DirectX7 API exposes the feature, developers can start actively using it in their
applications. We hope the techniques presented in this article will guide developers along the
path to taking advantage of this approach to rendering.