Computer Game Programming
Game Programming1
David M. Mount
Department of Computer Science
University of Maryland
Spring 2017
1 Copyright, David M. Mount, 2017, Dept. of Computer Science, University of Maryland, College Park, MD, 20742. These
lecture notes were prepared by David Mount for the course CMSC 425, Game Programming, at the University of Maryland.
Permission to use, copy, modify, and distribute these notes for educational purposes and without fee is hereby granted, provided
that this copyright notice appear in all copies.
The Early Days: Computer games are as old as computers. One of the oldest examples dates from 1958: a Pong-like game called Tennis for Two, developed by William Higinbotham of Brookhaven National Lab and played on an oscilloscope. Another example was Spacewar, developed in 1961 by a group of students at MIT and programmed on the PDP-1. When the Arpanet (forerunner to the Internet) was developed, this program was disseminated to a number of universities (where grad students like me would play it when their supervisors weren't watching). It is credited as the first influential computer game, but its influence was confined to academic and research institutions, not the general public.
Prior to the 1970s, arcade games were mechanical, with the most popular being the many varieties of pinball. The computer game industry could be said to have started with the first arcade computer game, Computer Space. It was developed in 1971 by Nolan Bushnell and Ted Dabney, who went on to found Atari. One of Atari's most popular arcade games was Pong. There was a boom of 2-dimensional arcade games from the mid 1970s to the early 1980s, led by well-known games such as Asteroids, Space Invaders, Galaxian, and numerous variations. In 1980, a popular game in Japan, called Puck Man, was purchased by Bally for US distribution. Recognizing the enticement for vandals to alter "Puck" into another well-known 4-letter word, they changed the name to "Pac-Man." The game became extremely successful.
The 70's and 80's: During the 1970s, computer games came into people's homes with the development of the Atari game console. Its popularity is remarkable, given that the technology of the day supported nothing more sophisticated than Pong. One of the popular features of later game consoles, like the Atari 2600, was that users could purchase additional cartridges, which looked like 8-track tapes, to load new games.
The game industry expanded rapidly throughout the 1970s and early 1980s, but took an abrupt downturn in 1983. The industry then came roaring back. One reason was the popularity of a game developed by Shigeru Miyamoto for Nintendo, called Donkey Kong, which featured a cute Italian "everyman" character, named Mario, who would jump over various obstacles to eventually save his lady from an evil kidnapping gorilla. Mario went on to become an icon of the computer game industry, and Donkey Kong generated many spin-offs involving Mario, notably Super Mario Bros. Eventually, Donkey Kong was licensed to the Coleco company for release on their home game console, and Super Mario Bros. was one of the top sellers on the Nintendo Entertainment System (NES).
Consoles, Handhelds, and MMOs: The 1990s saw the development of many game consoles with ever-increasing processing and graphical capabilities. We mentioned the Nintendo NES above with Donkey Kong. Also, the early 1990s saw the release of the Sega Genesis and its flagship title, Sonic the Hedgehog.
The Scope of this Course: At some universities, game development constitutes a series of courses on
various topics. Here, we will be able to focus on only a small part of the spectrum of relevant topics.
While most game designers make use of sophisticated software tools (for graphics, modeling, AI, physics),
it is not within the scope of this class to teach a particular set of tools (even though we will discuss
game engines for the sake of project development). As in most upper-division computer science courses,
our interest is not in how to use these tools, but rather how to build these systems. In particular, we
will discuss the theory, practice, and technology that underlies the implementation of these systems.
This semester, we will touch upon only a subset of these issues. For each, we will discuss how concepts
from computer science (and other areas such as mathematics and physics) can be applied to address
the challenging elements that underlie game implementation.
Course Overview: In this course, we will provide an overview of what might be called the science and
engineering of computer games. In particular, we will see how concepts developed in computer science
can be applied to address the aforementioned elements of computer games. These include the following:
Game Engines: The organization, structure, and overall features of a typical game engine. Intro-
duction to the Unity game engine.
Geometric Programming and Data Structures: Basic aspects of geometry and linear algebra
and their applications to game programming. Bounding volumes and efficient collision detection.
Elements of Computer Graphics: Graphics systems and the graphics pipeline, model-view trans-
formations and camera projection, lighting models, vertex and fragment shaders.
Modeling and Animation: Shape representations and meshes, level of detail, terrain modeling, ar-
ticulated models and skinning, animation, texture modeling, procedural generation and geometry
synthesis.
AI and Algorithms for Games: Agent-based systems, decision making, finite-state machines, path
planning, multiple-agent motion, flocking and emergent behavior.
Physics and Games: Newtonian dynamics, particle simulation, mass-spring models, collision detec-
tion and response, physics engines.
Networking and Games: TCP/IP, sockets programming, multiplayer gaming, latency hiding, dis-
tributed data consistency.
Security: Common methods of cheating in online games and approaches for detecting and counter-
acting them.
System: This includes low-level software for interacting with the operating system on which the game
engine runs as well as the target system on which the game executes. Target systems can include
general personal computers (running, say, Microsoft Windows, Linux, or Mac OS), game consoles
(e.g., XBox, Playstation, Wii), or mobile devices (e.g., hand-held game consoles, tablets, and
smart phones).
Third-Party SDKs and Middleware: These are libraries and software development toolkits (SDKs), usually provided by a third party. Examples include graphics (e.g., OpenGL and DirectX), physics (e.g., Havok, PhysX, and Bullet), basic algorithms and data structures (e.g., Java Class Library, C++ STL, Boost++), character animation (e.g., Granny), and networking support (e.g., Unix sockets).
Platform Independence Layer: Since most games are developed to run on many different plat-
forms, this layer provides software to translate between game-specific operations and their system-
dependent implementations.
Core System: These include basic tools necessary in any software development environment, includ-
ing assertion testing, unit testing, memory allocation/deallocation, mathematics library, debug-
ging aids, parsers and serializers (e.g., for xml-based import and export), file I/O, video playback.
Resource Manager: Large graphics programs involve accessing various resources, such as geometric
models for characters and buildings, texture images for coloring these geometric models, and maps representing the game's world. The job of the resource manager is to allow the program to
load these resources. Since resources may be compressed to save space, this may also involve
decompression.
Rendering Engine: This is one of the largest and most complex components of any real-time 3-
dimensional game. This involves all aspects of drawing, and may involve close interaction with
the graphics processing unit (GPU) for the sake of enhanced efficiency.
Low-Level Renderer: This comprises the most basic elements of producing images. Your program interacts with the GPU by asking it to render objects. Each object may be as simple as a single triangle or as complex as a detailed mesh.
[Figure: layered architecture of a typical game engine. Game-specific subsystems (cameras, AI (A* path finding), game state management, terrain rendering, audio playback) sit atop the rendering engine (low-level renderer, materials and shaders, spatial indices, occlusion culling, level-of-detail, skeletal mesh rendering), collision and physics (forces and constraints, ray/shape casting, rigid bodies, ragdoll), and human interface devices (HID). These in turn rest on the resource manager, the core systems (start-up and shut-down, assertions, unit testing, memory allocation, math library, strings, debug printing and logging, parsers, profiling, random number generation, serialization, asynchronous file I/O), third-party SDKs (DirectX, OpenGL, Havok, PhysX, ODE, Boost++, STL, Granny, AI middleware), and finally the OS and drivers.]
[Fig. 1: Architecture of a simple GPU-based graphics system: the CPU connects through bridge chips (e.g., the South Bridge) and a 6.4 GB/sec bus to memory, the graphics subsystem, and other peripherals. (Adapted from NVIDIA GeForce documentation.)]
Traditionally, GPUs are designed to perform a relatively limited fixed set of operations, but with
blazing speed and a high degree of parallelism. Modern GPUs are programmable, in that they provide
the user the ability to program various elements of the graphics process. For example, modern GPUs
support programs called vertex shaders and fragment shaders, which provide the user with the ability
to fine-tune the colors assigned to vertices and fragments.
Recently there has been a trend towards what are called general purpose GPUs (GPGPUs), which
can perform not just graphics rendering, but general scientific calculations on the GPU. Since we are
interested in graphics here, we will focus on the GPU's traditional role in the rendering process.
The Graphics Pipeline: The key concept behind all GPUs is the notion of the graphics pipeline. This is a conceptual tool, where your user program sits at one end sending graphics commands to the GPU, and the frame buffer sits at the other end. A typical command from your program might be "draw a triangle in 3-dimensional space at these coordinates." The job of the graphics system is to convert this simple request into coloring a set of pixels on your display. The process proceeds through a sequence of stages, starting from the vertex and primitive data that your program supplies:
Tessellation: Converts higher-order primitives (such as surfaces), displacement maps, and mesh patches
to 3-dimensional vertex locations and stores those locations in vertex buffers, that is, arrays of
vertex data.
Vertex processing: Vertex data is transformed from the user’s coordinate system into a coordinate
system that is more convenient to the graphics system. For the purposes of this high-level overview,
you might imagine that the transformation projects the vertices of the three-dimensional triangle
onto the 2-dimensional coordinate system of your screen, called screen space.
Geometry processing: This involves a number of tasks:
• Clipping is performed to snip off any parts of your geometry that lie outside the viewing area
of the window on your display.
• Back-face culling removes faces of your mesh that lie on the side of an object that faces away
from the camera.
• Lighting determines the colors and intensities of the vertices of your objects. Lighting is
performed by a program called a vertex shader, which you provide to the GPU.
• Rasterization converts the geometric shapes (e.g., triangles) into a collection of pixels on the
screen, called fragments.
Texture sampling: Texture images are sampled and smoothed and the resulting colors are assigned
to individual fragments.
Fragment Processing: Each fragment is then run through various computations. First, it must be
determined whether this fragment is visible, or whether it is hidden behind some other fragment.
If it is visible, it will then be subjected to coloring. This may involve applying various coloring
textures to the fragment and/or color blending from the vertices, in order to produce the effect
of smooth shading.
Fragment Rendering: Generally, there may be a number of fragments that affect the color of a
given pixel. (This typically results from translucence or other special effects like motion blur.)
The colors of these fragments are then blended together to produce the final pixel color. Fog effects may also be applied to alter the color of the fragment. The final output of this stage is the frame-buffer image.
Unity Basic Concepts: The fundamental structures that make up Unity are the same as in most game
engines. As with any system, there are elements and organizational features that are unique to this
particular system.
Project: The project contains all the elements that make up the game, including models, assets,
scripts, scenes, and so on. Projects are organized hierarchically in the same manner as a file-
system’s folder structure.
Scenes: A scene contains a collection of game objects that constitute the world that the player sees
at any time. A game generally will contain many scenes. For example, different levels of a game
would be stored as different scenes. Also, special screens (e.g., an introductory screen) would be modeled as scenes that have essentially only two-dimensional content.
Packages: A package is an aggregation of game objects and their associated meta-data. Think of a
package in the same way as library packages in Java. They are related objects (models, scripts,
materials, etc.). Here are some examples:
• a collection of shaders for rendering water effects
• particle systems for creating explosions
• models of race cars for a racing game
• models of trees and bushes to create a woodland scene
Unity provides a number of standard packages for free, and when a new project is created, you can
select the packages that you would like to have imported into your project.
Prefabs: A prefab is a template for grouping various assets under a single header. Prefabs are used
for creating multiple instances of a common object. Prefabs are used in two common ways. First,
in designing a level for your game you may have a large number of copies of a single element (e.g.,
street lights). Once designed, a street light prefab can be instantiated and placed in various parts
of the scene. If you decide to change the intensity of light for all the street lights, you
can modify the prefab, and this will cause all the instances to change. A second use is to generate
dynamic game objects. For example, you could model an explosive shell shot from a cannon as
a prefab. Each time the cannon is shot a new prefab shell would be instantiated (through one of
your scripts). In this way each new instance of the shell will inherit all the prefab's properties,
but it will have its own location and state.
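For example, a cannon that fires shells might be scripted as follows. This is a minimal sketch (the class and field names are hypothetical); the shell prefab is supplied by dragging it onto the public variable in the editor.

using UnityEngine;

public class Cannon : MonoBehaviour {
    public GameObject shellPrefab;   // set by dragging the prefab onto this field in the editor

    void Fire () {
        // each instance starts from the prefab's properties but has its own position and state
        Instantiate (shellPrefab, transform.position, transform.rotation);
    }
}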
Game Objects: The game objects are all the “things” that constitute your scene. Game objects not
only include concrete objects (e.g., a chair in a room), but also other elements that reside in space
such as light sources, audio sources, and cameras. Empty game objects are very useful, since they can serve as parent nodes in the hierarchy. Every game object (even an empty one) has a position and orientation in space. Thus, it can be moved, rotated, and scaled. (As mentioned above, whenever a transformation is applied to a parent object, it is automatically propagated to all of this object's descendants.)
Game objects can be used for invisible entities that are used to control a game’s behavior. (For
example, suppose that you want a script to be activated whenever the player enters a room.
You could create an invisible portal object covering the door to the room that triggers an event whenever the player passes through it.)
Scene View: This window shows all the elements of the current scene. (See description below for
what a scene is.) Most editing of the scene is done through the scene view, because it provides
access to low-level and hidden aspects of the objects. For example, this view will show you where
the camera and light sources are located. In contrast, the Game View, shows the game as it would
appear to the player.
Game View: This window shows the elements of the scene as they would appear to the player.
Inspector: At any time there is an active game object (which the designer selects by clicking on the
object or on its entry in the hierarchy). This window provides all the component information
associated with this object. At a minimum, this includes its position and orientation in space.
However, it also has entries for each of the components associated with this object.
Hierarchy: This window shows all the game objects that constitute the current scene. (Scenes are
discussed below.) As its name suggests, game objects are stored hierarchically in a tree structure. This makes it possible for transformations applied to a parent object to be propagated to all of its descendants. For example, if a building is organized as a collection of rooms (descended
from the building), and each room is organized as a collection of pieces of furniture (descended
from the room), then moving the building will result in all the rooms and pieces of furniture
moving along with it.
Project: The project window contains all of the assets that are available for you to use. Typically,
these are organized into folders, for example, according to the asset type (models, materials, audio,
prefabs, scripts, etc.).
Scripting in Unity: As mentioned above, scripting is used to describe how game objects behave in response
to various events, and therefore it is an essential part of the design of any game. Unity supports
three different scripting languages: C#, UnityScript (a variant of JavaScript), and Boo (a variant of
Python). (I will use C# in my examples. At a high level, C# is quite similar to Java, but there are
minor variations in syntax and semantics.) Recall that a script is an example of a component that is
associated with a game object. In general, a game object may be associated with multiple scripts. (In
Unity, this is done by selecting the game object, adding a component to it, and selecting an existing
predefined script or “New Script” as the component type.)
Geometric Elements: Unity supports a number of objects to assist with geometric processing. We will
discuss these objects in greater detail in a later lecture, but here are a few basic facts.
Vector3: This is a standard (x, y, z) vector. As with all C# objects, you need to invoke "new" when
creating a Vector3 object. The following generates a Vector3 variable u with coordinates (1, 2, −3):
Vector3 u = new Vector3(1, 2, -3);
The orientation of the axes follows Unity’s (mathematically nonstandard) convention that the
y-axis is directed upwards, the x-axis points to the viewer’s right, and the z-axis points to the
viewer’s forward direction. (Of course, as soon as the camera is repositioned, these orientations
change.)
It is noteworthy that Unity's axes form what is called a left-handed coordinate system, which
means that x × y = −z (as opposed to x × y = z, which holds in most mathematics textbooks as
well as other 3D programming systems, such as UE4 and Blender).
We will discuss how to perform ray-casting queries in Unity and how these can be applied in your
programs.
Quaternion: A quaternion is a structure that represents a rotation in 3-dimensional space. There
are many ways to provide Unity with a rotation. The two most common are Euler angles, which means specifying three rotation angles, one about the x-axis, one about the y-axis, and one about the z-axis; and an axis-angle representation, which means specifying a Vector3 as an axis of rotation together with a rotation angle about this axis. For example, the following both create the same quaternion, which performs a 30° rotation about the vertical (that is, y) axis.
Quaternion q1 = Quaternion.Euler(0, 30, 0);
Quaternion q2 = Quaternion.AngleAxis(30, Vector3.up);
Transform: Every game object in Unity is associated with an object called its transform. This object stores
the position, rotation, and scale of the object. You can use the Transform object to query the object’s
current position (transform.position) and rotation (transform.eulerAngles).
You can also modify the transform to reposition the object in space. This is usually done indirectly, through functions that translate (move) or rotate the object in space. Here are examples:
transform.Translate(new Vector3(0, 1, 0));   // move up one unit
transform.Rotate(0, 30, 0);                  // rotate 30 degrees about the y-axis
You might wonder whether these operations are performed relative to the global coordinate system or
the object’s local coordinate system. The answer is that there is an optional parameter (not shown
above) that allows you to select the coordinate system about which the operation is to be interpreted.
Recall that game objects in Unity reside within a hierarchy, or tree structure. Transformations applied to an object apply automatically to all the descendants of this object as well. The tree structure is accessible through the transform. For example, transform.parent returns the transform of the parent (and transform.parent.gameObject returns the associated Unity game object). You can set a transform's parent using transform.SetParent(t), where t is the transform of the parent object. It is also possible to enumerate the children and all the descendants of a transform.
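For example, a transform can be iterated over directly to visit its immediate children; a minimal sketch:

foreach (Transform child in transform) {     // iterates over the immediate children
    Debug.Log (child.gameObject.name);       // print each child's name
}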
Structure of a Typical Script: A game object may be associated with a number of scripts. Ideally, each
script is responsible for a particular aspect of the game object’s behavior. The basic template for a
Unity script, called MainPlayer, is given in the following code block.
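(The template below is the standard one that Unity generates for a new C# script.)

using UnityEngine;
using System.Collections;

public class MainPlayer : MonoBehaviour {

    // Use this for initialization
    void Start () {
    }

    // Update is called once per frame
    void Update () {
    }
}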
Observe a few things about this code fragment. First, the using statements at the top provide access to class objects: the first to those defined for the Unity engine, and the second to the built-in collection data structures (ArrayList, Stack, Queue, HashTable, etc.) that are part of C#. The main class is MainPlayer, and it
is a subclass of MonoBehaviour. All script objects in Unity are subclasses of MonoBehaviour.
Many script class objects will involve: (1) some sort of initialization and (2) some sort of incremental
updating just prior to each refresh cycle (when the next image is drawn to the display). The template
facilitates this by providing you with two blank functions, Start and Update, for you to fill in. Of course,
there is no requirement to do so. For example, your script may require no explicit initializations (and
thus there is no need for Start), or rather than being updated with each refresh cycle, your script may
be updated in response to specific user inputs or collisions (and so there would be no need for Update).
Awake versus Start: There are two Unity functions for running initializations for your game objects, Start
(as shown above) and Awake. Both functions are called at most once for any game object. Awake will
be called first, and is called as soon as the object has been initialized by Unity. However, the object
might not yet appear in your scene because it has not been enabled. As soon as your object is enabled, Start is called.
Let me digress a moment to discuss enabling/disabling objects. In Unity, objects can be "turned on" or "turned off" in two different ways (without actually deleting them). In particular, objects can be enabled or
disabled, and objects can be active or inactive. (Each game object in Unity has two fields, enabled and
active, which can be set to true or false.) The difference is that disabling an object stops it from being
rendered or updated, but it does not disable other components, such as colliders. In contrast, making
an object inactive stops all its components.
For example, suppose that some character in your game is spawned only after a particular event takes
place (you cast a magic spell). The object can initially be declared to be disabled, and later when the
spawn event occurs, you ask Unity to enable it. Awake will be called on this object as soon as the
game starts. Start will be called as soon as the object is enabled. If you later disable an object and re-enable it, Start will not be called again. (Both functions are called at most once.)
To make your life simple, it is almost always adequate to use just the Start function for one-time initializations. If there are any initializations that must be performed just as the game is starting, then Awake is the one to use.
Controlling Animation Times: As mentioned above, the Update function is called with each update-
cycle of your game. This typically means that every time your scene is redrawn, this function is called.
Redraw functions usually happen very quickly (e.g., 30–100 times per second), but they can happen
more slowly if the scene is very complicated. The problem that this causes is that it affects the speed
that objects appear to move. For example, suppose that you have a collection of objects that spin
around with constant speed. With each call to Update, you rotate them by 3°. Now, if Update is called twice as frequently on one platform as on another, these objects will appear to spin twice as fast. This is not good!
The fix is to determine how much time has elapsed since the last call to Update, and then to scale the rotation amount by the elapsed time. Unity has a predefined variable, Time.deltaTime, that stores the
amount of elapsed time (in seconds) since the last call to Update. Suppose that we wanted to rotate an
object at a rate of 45◦ per second about the vertical axis. The Unity function transform.Rotate will do
this for us. We provide it with a vector about which to rotate, which will usually be (0, 1, 0) for the
up-direction, and we multiply this vector times the number of degrees we wish to have in the rotation.
In order to achieve a rotation of 45◦ per second, we would take the vector (0, 45, 0) and scale it by
Time.deltaTime in our Update function. For example:
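void Update () {
    // rotate 45 degrees per second about the vertical axis, independent of the frame rate
    transform.Rotate (new Vector3 (0, 45, 0) * Time.deltaTime);
}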
By the way, it is a bit of a strain on the reader to remember which axis points in which direction. The
Vector3 class defines functions for accessing important vectors. The call Vector3.up returns the vector
(0, 1, 0). So, the above call would be equivalent to transform.Rotate (Vector3.up * 45 * Time.deltaTime).
Update versus FixedUpdate: While we are discussing timing, there is another issue to consider. Some-
times you want the timing between update calls to be predictable. This is true, for example, when
updating the physics in your game. If acceleration is changing due to gravity, you would like the effect
to be applied at regular intervals. The Update function does not guarantee this. It is called at the
refresh rate for your graphics system, which could be very high on a high-end graphics platform and
much lower for a mobile device.
Unity provides a function that is called in a predictable manner, called FixedUpdate. When dealing
with the physics system (e.g., applying forces to objects) it is better to use FixedUpdate than Update.
When using FixedUpdate, the corresponding elapsed-time variable is Time.fixedDeltaTime. (I’ve read
that Time.fixedDeltaTime is 0.02 seconds, but I wouldn’t bank on that.)
While I am discussing update functions, let me mention one more. LateUpdate() is called after all Update
functions have been called, but before redrawing the scene. This is useful for ordering script execution. For example, a follow-camera should always be updated in LateUpdate, because it tracks objects that might have moved due to other Update function calls.
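For example, here is a minimal sketch of such a follow-camera (the class name and the public target and offset fields are hypothetical):

using UnityEngine;

public class FollowCamera : MonoBehaviour {
    public Transform target;                          // the object to follow
    public Vector3 offset = new Vector3 (0, 2, -5);   // camera position relative to the target

    void LateUpdate () {                              // runs after all Update calls
        transform.position = target.position + offset;
    }
}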
Accessing Components: As mentioned earlier, each game object is associated with a number of defining
entities called its components. The most basic is the transform component, which describes where
the object is located. Most components have constant values, which can be set in the Unity editor (for example, by using the AddComponent command). However, it is often desirable to modify the values of components at run time. For example, you can alter the buoyancy of a balloon as it loses air, or change the color of an object to indicate the presence of damage.
Unity defines class types for each of the possible components, and you can access and modify this
information from within a script. First, in order to obtain a reference to a component, you use
the command GetComponent. For example, to access the rigid-body component of the game object associated with this script (recall that this component controls the object's physics properties), you could invoke the following:
Rigidbody rb = GetComponent<Rigidbody>();   // get the rigid-body component
This returns a reference rb to this object’s rigid body component, and similar calls can be made to
access any of the other components associated with a game object. (By the way, this call was not really
needed. Because the rigid body is such a common component to access, every MonoBehaviour object
has a member called rigidbody, which contains a reference to the object’s rigid body component, or null
if there is none.)
Public Variables and the Unity Editor: One of the nice things that the Unity IDE provides you with
is the ability to modify the member variables of your game objects directly within the editor. For
example, suppose that you have a moving object that has an adjustable parameter. Consider the
following code fragment that is associated with a floating ball game object. The script defines a public
member variable floatSpeed to control the speed at which a ball floats upwards.
public class BallBehavior : MonoBehaviour {
    public float floatSpeed = 1.0f;   // speed at which the ball floats upward
    public float jumpForce = 5.0f;    // force applied when the ball jumps
    // ... behavior that uses these values ...
}
[Figure: each public variable appears in the Inspector under the script name, with an editable value.]
When you are running the program in the Unity editor, you can adjust the values of floatSpeed and
jumpForce (while the program is running) until you achieve the desired results. If you like, you can then
fix their values in the script and make them private.
Note that there are three different ways that the member variable floatSpeed could be set: (1) through the default value given in the script, (2) through the value entered in the editor's Inspector window, and (3) through an assignment made at run time by a script. Note that (3) takes precedence over (2), which takes precedence over (1).
By the way, you can do this not only for simple variables as above, but you can also use this mechanism
for passing game objects through the public variables of a script. Just make the game object variable
public, then drag the game object from the hierarchy over the variable value in the editor.
Object References by Name or Tag: Since we are on the subject of how to obtain a reference from
one game object to another, let us describe another mechanism for doing this. Each game object is associated with a name, and it can also be associated with one or more tags. Think of a name as a unique identifier for the game object (although I don't think there is any requirement that this be so),
whereas a tag describes a type, which might be shared by many objects.
Both names and tags are just strings that can be associated with any object. Unity defines a number of standard tags, and you can create new tags of your own. When you create a new game object you can specify its name. In each object's inspector window, there is a pull-down menu that allows you to associate any number of tags with this object.
Here is an example of how to obtain a reference to the main camera by its name:
GameObject camera = GameObject.Find("Main Camera");
Suppose that we assign the tag "Player" to the player object and "Enemy" to the various enemy objects. We could access the object(s) through the tag using the following commands:
GameObject player = GameObject.FindWithTag("Player");
GameObject[] enemies = GameObject.FindGameObjectsWithTag("Enemy");
In the former case, we assume that there is just one object with the given tag. (If there is none, then
null is returned. If there is more than one, it returns one of them.) In the latter case, all objects
having the given tag are returned in the form of an array.
Note that there are many other commands available for accessing objects and their components, by name, by tag, or by relationship within the scene graph (parent or child); see the Unity documentation for details.
Note also that calls to GameObject.Find belong in the Start function, rather than in Update. This is because this operation is fairly slow, and ideally it should be done sparingly.
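For example, the following sketch (the class and member names are hypothetical) performs the lookup once and caches the result:

using UnityEngine;

public class CameraLocator : MonoBehaviour {
    private GameObject mainCamera;                      // cached reference

    void Start () {
        mainCamera = GameObject.Find ("Main Camera");   // slow --- do it once, not every frame
    }
}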
Colliders and Triggers: Games are naturally event driven. Some events are generated by the user (e.g., input), some occur at regular time intervals (e.g., Update()), and finally others are generated within the game itself. An important class of the latter variety are collision events. Collisions are detected by
a component called a collider. Recall that this is a shape that (approximately) encloses a given game
object.
Colliders come in two different types, colliders and triggers. Think of colliders as solid physical objects
that should not overlap, whereas a trigger is an invisible barrier that sends a signal when crossed.
For example, when a rolling ball hits a wall, this is a collider type event, since the ball should not be
allowed to pass through the wall. On the other hand, if we want to detect when a player enters a room,
we can place an (invisible) trigger over the door. When the player passes through the door, the trigger
event will be generated.
There are various event functions for detecting when an object enters, stays within, or exits a collider/trigger region. These include, for example:
• For colliders: void OnCollisionEnter(), void OnCollisionStay(), void OnCollisionExit()
• For triggers: void OnTriggerEnter(), void OnTriggerStay(), void OnTriggerExit()
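As a minimal sketch (assuming the attached collider has its "Is Trigger" box checked and the player object carries the "Player" tag), the door portal described above might be scripted as:

using UnityEngine;

public class DoorPortal : MonoBehaviour {
    void OnTriggerEnter (Collider other) {          // called when an object crosses the trigger
        if (other.CompareTag ("Player")) {
            Debug.Log ("Player entered the room");  // react to the event
        }
    }
}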
More about RigidBody: Earlier we introduced the rigid-body component. What can we do with this
component? Let’s take a short digression to discuss some aspects of rigid bodies in Unity. We can use
this reference to alter the data members of the component, for example, the body’s mass:
rb.mass = 10f;   // change this body's mass
Unity objects can either be controlled by physics forces (which cause them to move) or be controlled directly by the user. One advantage of using physics to control an object is that it will automatically respond to collisions with other objects, rather than passing through them. In order to move a body that is controlled by physics, you do not set its velocity. Rather, you apply forces to it, and these forces affect its velocity. Recall from physics that a force induces an acceleration (by Newton's law, f = m·a), and the acceleration in turn changes the velocity over time.
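For example, using the rigid-body reference rb obtained earlier, the following applies an upward force (the magnitude is arbitrary):

rb.AddForce (Vector3.up * 10f);   // apply an upward force to the body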
Sometimes it is desirable to take over control of the body’s movement yourself. To turn off the effect
of physics, set the rigid body’s type to kinematic.
rb.isKinematic = true;   // direct motion control --- bypass physics
Once a body is kinematic, you can set the body's velocity directly, for example:
rb.velocity = new Vector3(0f, 10f, 0f);   // move up 10 units / second
(If the body had not been kinematic, Unity would issue an error message if you attempted to set its
velocity in this way.) By the way, since the y-axis points up, the above statement is equivalent to
setting the velocity to Vector3.up * 10f. This latter form is, I think, more intuitive, but I just wanted
to show that these things can be done in various ways.
Kinematic and Static: In general, physics computations can be expensive, and Unity has a number of tricks for optimizing the process. As mentioned earlier, an object can be set to kinematic, which means that your scripts control the motion directly, rather than physics forces. Note that this only affects motion; kinematic objects still generate collision and trigger events.
Another optimization involves static objects. Because many objects in a scene (such as buildings)
never move, you can declare such objects to be static. (This is indicated by a check box in the upper
right corner of the object’s Inspector window.) Unity does not perform collision detection between two
static objects. (It checks for collisions/triggers between static-to-dynamic and dynamic-to-dynamic,
but not static-to-static.) This can save considerable computation time since many scenes involve
relatively few moving objects. Note that you can alter the static-dynamic property of an object,
but the documentation warns that this is somewhat risky, since the physics engine precomputes and
caches information for static objects, and this information might be erroneous if an object changes this
property.
Event Functions: Because script programming is event-driven, most of the methods that make up MonoBe-
haviour scripts are event callbacks, that is, functions that are invoked when a particular event occurs.
Examples of events include (1) initialization, (2) physics events, such as collisions, and (3) user-input
events, such as mouse or keyboard inputs.
Unity passes control to each script intermittently by calling a predetermined set of functions, called event functions. The list of available functions is very large; here are the most commonly used ones:
Initialization: Awake and Start as mentioned above.
Regular Update Events: Update functions are called regularly throughout the course of the game.
These include redraw events (Update and LateUpdate) and physics (or any other regular time events)
(FixedUpdate).
GUI Events: These events are generated in response to user-inputs regarding the GUI elements of
your game. For example, if you have a GUI element like a push-down button, you can be informed
when a mouse button has been pressed down, up, or is hovering over this element with callbacks
such as void OnMouseDown(), void OnMouseUp(), and void OnMouseOver(). They are usually processed in Update or FixedUpdate.
Physics Events: These include the collider and trigger functions mentioned earlier (such as OnColli-
sionEnter, OnTriggerEnter).
There are many things that we have not listed. For further information, see the Unity user manual.
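Coroutines: Suppose that you want some change to take place gradually, over a series of frames, such as fading an object from opaque to transparent. A naive first attempt (a sketch of the sort of loop one might be tempted to write) would simply decrease the material's alpha value in a loop:

void Fade () {                                // naive attempt --- does not work!
    for (float f = 1f; f >= 0; f -= 0.1f) {
        Color c = renderer.material.color;    // get the material's current color
        c.a = f;                              // reduce its alpha (opacity)
        renderer.material.color = c;
    }                                         // the scene is never redrawn during the loop
}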
Unfortunately, this code does not work, because we cannot just interrupt the loop in the middle to
redraw the new scene. A coroutine is like a function that has the ability to pause execution and return
(to Unity) but then to continue where it left off on the following frame. To do this, we use the yield
return construct from C#, which allows us to call a function multiple times, but each time it is called,
it starts not from the beginning, but from where it left off. Such a function has a return type of an
iterator, but we will not worry about this for this example.
IEnumerator Fade() {   // gradually fade from opaque to transparent
    for (float f = 1f; f >= 0; f -= 0.1f) {
        Color c = renderer.material.color;
        c.a = f;
        renderer.material.color = c;
        yield return null;   // return to Unity to redraw the scene
    }
}
The next time this function is called, it resumes just after the yield statement, in order to start the next iteration of the loop. (Pretty cool!)
If you want to control the timing, so that this happens, say, once every tenth of a second, you can add
a delay into the return statement, “yield return new WaitForSeconds(0.1f)”.
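To set the coroutine in motion, you invoke it through StartCoroutine, for example from within Start or in response to some event:

StartCoroutine (Fade ());   // Fade now runs incrementally over the coming frames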
Transformations: You are asked to render a twirling boomerang flying through the air. How would
you represent the boomerang’s rotation and translation over time in 3-dimensional space? How
would you compute its exact position at a particular time?
Geometric Intersections: Given the same boomerang, how would you determine whether it has hit
another object?
Orientation: You have been asked to design the AI for a non-player agent in a flight combat simulator.
You detect the presence of an enemy aircraft in a certain direction. How should you rotate your
aircraft to either attack (or escape from) this threat?
Change of coordinates: We know the position of an object on a table with respect to a coordinate
system associated with the table. We know the position of the table with respect to a coordinate
system associated with the room. What is the position of the object with respect to the coordinate
system associated with the room?
Reflection and refraction: We would like to simulate the way that light reflects off of shiny objects
and refracts through transparent objects.
Such basic geometric problems are fundamental to computer graphics, and over the next few lectures,
our goal will be to present the tools needed to answer these sorts of questions. There are various formal
geometric systems that arise naturally in game programming and computer graphics. The principal
ones are:
Affine Geometry: The geometry of simple “flat things”: points, lines, planes, line segments, trian-
gles, etc. There is no defined notion of distance, angles, or orientations, however.
Euclidean Geometry: The geometric system that is most familiar to us. It enhances affine geometry
by adding notions such as distances, angles, and orientations (such as clockwise and counterclock-
wise).
Projective Geometry: In Euclidean geometry, there is no notion of infinity (in the same way that
in standard arithmetic, you cannot divide by zero). But in graphics, we often need to deal with
infinity. (For example, two parallel lines in 3-dimensional space can meet at a common vanishing
point in a perspective rendering. Think of the point in the distance where two perfectly straight
train tracks appear to meet. Computing this vanishing point involves points at infinity.) Projective
geometry permits this.
Affine Geometry: Affine geometry is basic to all geometric processing. Its basic elements are points and (free) vectors.2
2 Unity does not distinguish between them. The data type Vector3 is used to represent both points and vectors.
Affine Combinations: Although the algebra of affine geometry has been careful to disallow point addition
and scalar multiplication of points, there is a particular combination of two points that we will consider
legal. The operation is called an affine combination.
Let’s say that we have two points p and q and want to compute their midpoint r, or more generally a
point r that subdivides the line segment pq into the proportions α and 1 − α, for some α ∈ [0, 1]. (The
case α = 1/2 is the case of the midpoint). This could be done by taking the vector q − p, scaling it by
α, and then adding the result to p. That is,
r = p + α(q − p),
(see Fig. 6(a)). Another way to think of this point r is as a weighted average of the endpoints p and
q. Thinking of r in these terms, we might be tempted to rewrite the above formula in the following
(technically illegal) manner:
r = (1 − α)p + αq,
(see Fig. 6(b)). Observe that as α ranges from 0 to 1, the point r ranges along the line segment from p to q. In fact, we may allow α to become negative, in which case r lies to the left of p, and if α > 1, then r lies to the right of q (see Fig. 6(c)). The special case where 0 ≤ α ≤ 1 is called a convex combination.
[Fig. 6: (a) r = p + (2/3)(q − p); (b) the same point expressed as the affine combination (1/3)p + (2/3)q; (c) the cases α < 0, 0 < α < 1, and α > 1 of (1 − α)p + αq; (d) affine combinations of three points, such as (1/2)p + (1/4)r + (1/4)q.]
In general, we define the following two operations for points in affine space.
Affine combination: Given a sequence of points p1 , p2 , . . . , pn , an affine combination is any sum of
the form
α1 p1 + α2 p2 + · · · + αn pn,
where α1, α2, . . . , αn are scalars satisfying ∑i αi = 1.
Convex combination: An affine combination where, in addition, αi ≥ 0 for 1 ≤ i ≤ n.
Affine and convex combinations have a number of nice uses in graphics. For example, any three
noncollinear points determine a plane. There is a 1–1 correspondence between the points on this plane
and the affine combinations of these three points. Similarly, there is a 1–1 correspondence between the points of the triangle they define and the convex combinations of these points.
Note that inner (and hence dot) product is defined only for vectors, not for points.
Using the dot product we may define a number of concepts, which are not defined in regular affine
geometry (see Fig. 7). Note that these concepts generalize to all dimensions.
Length: The length of a vector ~v is defined to be √(~v · ~v), and is denoted by ‖~v‖ (or as |~v|).
Normalization: Given any nonzero vector ~v, define the normalization to be a vector of unit length that points in the same direction as ~v, that is, ~v/‖~v‖. We will denote this by v̂.
Distance between points: dist(p, q) = kp − qk.
Angle: The angle between two nonzero vectors ~u and ~v (ranging from 0 to π) is
ang(~u, ~v) = cos⁻¹((~u · ~v)/(‖~u‖ ‖~v‖)) = cos⁻¹(û · v̂).
This is easy to derive from the law of cosines. Note that this does not provide us with a signed angle. We cannot tell whether ~u is clockwise or counterclockwise relative to ~v. We will discuss signed angles when we consider the cross product.
Orthogonality: ~u and ~v are orthogonal (or perpendicular) if ~u · ~v = 0.
Orthogonal projection: Given a vector ~u and a nonzero vector ~v , it is often convenient to decompose
~u into the sum of two vectors ~u = ~u1 + ~u2 , such that ~u1 is parallel to ~v and ~u2 is orthogonal to ~v .
~u1 ← ((~u · ~v)/(~v · ~v)) ~v,    ~u2 ← ~u − ~u1.
[Fig. 7: the angle θ between vectors ~u and ~v (left); the orthogonal projection ~u1 of ~u onto ~v and its complement ~u2 (right).]
Doing it with Unity: Unity does not distinguish between points and vectors. Both are represented using
Vector3. Unity supports many of the vector operations by overloading operators. Given vectors u, v,
and w, all of type Vector3, the following operators are supported:
u = v + w;                       // vector addition
u = v - w;                       // vector subtraction
if (u == v || u != w) { ... }    // vector comparison
u = v * 2.0f;                    // scalar multiplication
v = w / 2.0f;                    // scalar division
You can access the components of a Vector3 either by axis name, as in u.x, u.y, and u.z, or by indexing, as in u[0], u[1], and u[2].
The Vector3 class also has the following members and static functions.
float x = v.magnitude;                       // length of v
Vector3 u = v.normalized;                    // unit vector in v's direction
float a = Vector3.Angle(u, v);               // angle (degrees) between u and v
float b = Vector3.Dot(u, v);                 // dot product of u and v
Vector3 u1 = Vector3.Project(u, v);          // orthogonal projection of u onto v
Vector3 u2 = Vector3.ProjectOnPlane(u, v);   // orthogonal complement
Some of the Vector3 functions apply when the objects are interpreted as points. Let p and q be
points declared to be of type Vector3. The function Vector3.Lerp is short for linear interpolation. It is
essentially a two-point special case of a convex combination. (The combination parameter is assumed
to lie between 0 and 1.)
float b = Vector3.Distance(p, q);              // distance between p and q
Vector3 midpoint = Vector3.Lerp(p, q, 0.5f);   // convex combination (the midpoint)
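Equivalently, since Vector3 supports addition and scalar multiplication, an affine combination can be written out directly; a minimal sketch (alpha is a hypothetical combination parameter):

float alpha = 0.25f;                         // combination parameter
Vector3 r = (1 - alpha) * p + alpha * q;     // the same point as Vector3.Lerp(p, q, alpha)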
Local and Global Frames of Reference: Last time we introduced the basic elements of affine and Euclidean geometry: points and (free) vectors. However, as of yet we have no mechanism for representing these objects numerically, in a form we can manipulate in our programs.
Bases, Vectors, and Coordinates: The first question is how to represent points and vectors in affine
space. We will begin by recalling how to do this in linear algebra, and generalize from there. We
know from linear algebra that if we have two linearly independent vectors, ~u0 and ~u1, in 2-space, then we
can represent any other vector in 2-space uniquely as a linear combination of these two vectors (see
Fig. 8(a)):
~v = α0 ~u0 + α1 ~u1 ,
for some choice of scalars α0 , α1 .
[Fig. 8: (a) ~v = 2~u0 + 3~u1, so ~v[F] = (2, 3); (b) ~w = 3~e0 + 2~e1 in the standard basis, so ~w[F] = (3, 2).]
Fig. 8: Bases and linear combinations in linear algebra (a) and the standard basis (b).
Thus, given any such vectors, we can use them to represent any vector in terms of a pair of scalars (α0, α1). In general, a set of d linearly independent vectors in dimension d is called a basis. The most convenient basis to work with consists of two vectors, each of unit length, that are orthogonal to each other. Such a collection of vectors is said to be orthonormal. The standard basis, consisting of the x- and y-unit vectors, is an example of such a basis (see Fig. 8(b)).
Note that we are using the term “vector” in two different senses here, one as a geometric entity and the
other as a sequence of numbers, given in the form of a row or column. The first is the object of interest
(i.e., the abstract data type, in computer science terminology), and the latter is a representation. As
is common in object oriented programming, we should “think” in terms of the abstract object, even
though in our programming we will have to get dirty and work with the representation itself.
Coordinate Frames and Coordinates: Now let us turn from linear algebra to affine geometry. Again,
let us consider just 2-dimensional space. To define a coordinate frame for an affine space we would
like to find some way to represent any object (point or vector) as a sequence of scalars. Thus, it seems
natural to generalize the notion of a basis in linear algebra to define a basis in affine space. Note that
free vectors alone are not enough to define a point (since we cannot define a point by any combination
of vector operations). To specify position, we will designate an arbitrary point, denoted O, to serve
as the origin of our coordinate frame. Let ~u0 and ~u1 be a pair of linearly independent vectors. We already know that we can represent any vector uniquely as a linear combination of these two basis vectors. Any point p can then be expressed uniquely as
p = α0 ~u0 + α1 ~u1 + O,
for some pair of scalars α0 and α1. This suggests the following definition.
Definition: A coordinate frame for a d-dimensional affine space consists of a point (which we will
denote O), called the origin of the frame, and a set of d linearly independent basis vectors.
Given the above definition, we now have a convenient way to express both points and vectors. As with
linear algebra, the most natural type of basis is orthonormal. Given an orthonormal basis consisting
of origin O and unit vectors ~e0 and ~e1 , we can express any point p and any vector ~v as:
[Figure: relative to the frame F = (~e0, ~e1, O), the point p = 3·~e0 + 2·~e1 + 1·O has homogeneous coordinates p[F] = (3, 2, 1), and the vector ~v = 2·~e0 + 1·~e1 + 0·O has ~v[F] = (2, 1, 0).]
This suggests a nice method for expressing both points and vectors using a common notation. For the given coordinate frame F = (~e0, ~e1, O), we can express the point p and the vector ~v by listing their coefficients with respect to ~e0, ~e1, and O, where the final (homogeneous) coordinate, the coefficient of O, is 1 for a point and 0 for a vector.
Properties of homogeneous coordinates: The choice of appending a 1 for points and a 0 for vectors
may seem to be a rather arbitrary choice. Why not just reverse them or use some other scalar values?
The reason is that this particular choice has a number of nice properties with respect to geometric
operations.
~v = p − q.
If we apply the difference rule that we defined last time for points, and then convert this vector into
its coordinates relative to frame F, we find that ~v[F] = (−3, 1, 0). Thus, to compute the coordinates
of p − q we simply take the component-wise difference of the coordinate vectors for p and q. The
1-components nicely cancel out, to give a vector result.
[Figure: p[F] = (1, 4, 1) and q[F] = (4, 3, 1); the vector ~v = p − q has coordinates ~v[F] = (1 − 4, 4 − 3, 1 − 1) = (−3, 1, 0).]
In general, a nice feature of this representation is that the last coordinate behaves exactly as it should. Let u and v be either points or vectors. After a number of operations of the forms u + v or u − v or αu (when applied to the coordinates), the last coordinate of the result will be 1 if the result is geometrically a point, 0 if it is a vector, and something else if the combination is not geometrically meaningful.
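Unity echoes this distinction in its Matrix4x4 class: MultiplyPoint treats its argument as a point (homogeneous coordinate 1, so translations apply), whereas MultiplyVector treats it as a free vector (homogeneous coordinate 0, so translations are ignored). A minimal sketch:

Matrix4x4 m = Matrix4x4.TRS (new Vector3 (5, 0, 0), Quaternion.identity, Vector3.one);
Vector3 p = m.MultiplyPoint (new Vector3 (1, 2, 3));    // (6, 2, 3): the point is translated
Vector3 v = m.MultiplyVector (new Vector3 (1, 2, 3));   // (1, 2, 3): the vector is unchanged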
Cross Product: The cross product is an important vector operation in 3-space. You are given two vectors
and you want to find a third vector that is orthogonal to these two. This is handy in constructing
coordinate frames with orthogonal bases. There is a nice operator in 3-space, which does this for us,
called the cross product.
The cross product is usually defined in standard linear 3-space (since it applies to vectors, not points).
So we will ignore the homogeneous coordinate here. Given two vectors in 3-space, ~u and ~v , their cross
product is defined as follows (see Fig. 11(a)):
~u × ~v = (uy vz − uz vy ,  uz vx − ux vz ,  ux vy − uy vx).
A nice mnemonic device for remembering this formula, is to express it in terms of the following symbolic
determinant:
             | ~ex  ~ey  ~ez |
~u × ~v = det| ux   uy   uz  |.
             | vx   vy   vz  |
Here ~ex , ~ey , and ~ez are the three coordinate unit vectors for the standard basis. Note that the cross
product is only defined for a pair of free vectors and only in 3-space. Furthermore, we ignore the
homogeneous coordinate here. The cross product has the following important properties:
[Fig. 12: (a) the cross product ~u × ~v; (b) skew symmetry: ~v × ~u = −(~u × ~v).]
Skew symmetric: ~u × ~v = −(~v × ~u) (see Fig. 12(b)). It follows immediately that ~u × ~u = 0 (since it
is equal to its own negation).
Nonassociative: Unlike most other products that arise in algebra, the cross product is not associative.
That is, in general,
(~u × ~v) × ~w ≠ ~u × (~v × ~w).
Bilinear: The cross product is linear in both arguments. For example:
~u × (α~v) = α(~u × ~v),
~u × (~v + ~w) = (~u × ~v) + (~u × ~w).
Perpendicular: If ~u and ~v are not linearly dependent, then ~u × ~v is perpendicular to ~u and ~v, and is directed according to the right-hand rule.
Angle and Area: The length of the cross product vector is related to the lengths of and angle between
the vectors. In particular:
‖~u × ~v‖ = ‖~u‖ ‖~v‖ sin θ,
where θ is the angle between ~u and ~v . The cross product is usually not used for computing angles
because the dot product can be used to compute the cosine of the angle (in any dimension) and
it can be computed more efficiently. This length is also equal to the area of the parallelogram
whose sides are given by ~u and ~v . This is often useful.
The cross product is commonly used in computer graphics for generating coordinate frames. Given
two basis vectors for a frame, it is useful to generate a third vector that is orthogonal to the first two.
The cross product does exactly this. It is also useful for generating surface normals. Given two tangent vectors for a surface, the cross product generates a vector that is normal to the surface.
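In Unity, this operation is provided by Vector3.Cross. For example, given two (hypothetical) tangent vectors along a surface, a unit surface normal can be computed as:

Vector3 normal = Vector3.Cross (tangent1, tangent2).normalized;   // unit vector orthogonal to both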
Orientation: Given two real numbers p and q, there are three possible ways they may be ordered: p < q,
p = q, or p > q. We may define an orientation function, which takes on the values +1, 0, or −1 in
each of these cases. That is, Or1 (p, q) = sign(q − p), where sign(x) is either −1, 0, or +1 depending on
whether x is negative, zero, or positive, respectively. An interesting question is whether it is possible
to extend the notion of order to higher dimensions.
The answer is yes, but rather than comparing two points, in general we can define the orientation of
d + 1 points in d-space. We define the orientation to be the sign of the determinant consisting of their
homogeneous coordinates (with the homogenizing coordinate given first). For example, in the plane
and 3-space the orientation of three points p, q, r is defined to be
                       | 1   1   1  |                              | 1   1   1   1  |
Or2(p, q, r) = sign det| px  qx  rx |,   Or3(p, q, r, s) = sign det| px  qx  rx  sx |.
                       | py  qy  ry |                              | py  qy  ry  sy |
                                                                   | pz  qz  rz  sz |
You might ask why we put the homogeneous coordinate first. The answer a mathematician would give is that this is really where it belongs in the first place. If you put it last, then positively oriented things are "right-handed" in even dimensions and "left-handed" in odd dimensions. By putting it first, positively oriented things are always right-handed in orientation, which is more elegant. Putting the homogeneous coordinate last seems to be a convention that arose in engineering, and was adopted later by graphics people.
The value of the determinant itself is the area of the parallelogram defined by the vectors q − p and
r − p, and thus this determinant is also handy for computing areas and volumes. Later we will discuss
other methods.
Orientation testing is a very useful tool, but it is (surprisingly) not very widely known in the areas
of computer game programming and computer graphics. For example, suppose that we have a bullet
path, represented by a line segment pq. We want to know whether the linear extension of this segment
intersects a target triangle, △abc. We can determine this using three orientation tests. To see the connection, consider the three directed edges of the triangle, ab, bc, and ca. Suppose that we place an observer along each of these edges, facing the direction of the edge. If the line passes through the triangle, then all three observers will see the directed line pq passing in the same direction relative to
their edge (see Fig. 13). (This might take a bit of time to convince yourself of this. To make it easier,
imagine that the triangle is on the floor with a, b, and c given in counterclockwise order, and the line is
vertical with p below the floor and q above. The line hits the triangle if and only if all three observers,
when facing the direction of their respective edges, see the line on their left. If we reverse the roles of
p and q, they will all see the line as being on their right. In any case, they all agree.)
Fig. 13: The directed line pq passing through the triangle abc.
(By the way, this tests only whether the infinite line intersects the triangle. To determine whether the
segment intersects the triangle, we should also check that p and q lie on opposite sides of the triangle.
Can you see how to do this with two additional orientation tests?)
These transformations all have a number of things in common. For example, they all map lines to
lines. Note that some (translation, rotation, reflection) preserve the lengths of line segments and the
angles between segments. These are called rigid transformations. Others (like uniform scaling) preserve
angles but not lengths. Still others (like nonuniform scaling and shearing) do not preserve angles or
lengths.
Formal Definition: Formally, an affine transformation is a mapping from one affine space to another (which may be, and in fact usually is, the same space) that preserves affine combinations. This implies that given any affine transformation T , two points p and q, and any scalar α,

$$r = (1 - \alpha)p + \alpha q \quad \Longrightarrow \quad T(r) = (1 - \alpha)T(p) + \alpha T(q).$$

For example, if r is the midpoint of segment pq, then T (r) is the midpoint of the transformed line segment T (p)T (q).
Matrix Representations of Affine Transformations: The above definition is rather abstract. It is possible to represent any affine transformation T in d-dimensional space as a (d + 1) × (d + 1) matrix acting on homogeneous coordinates.
Here are a number of concrete examples of how this applies to various transformations. Rather than
considering this in the context of 2-dimensional transformations, let’s consider it in the more general
setting of 3-dimensional transformations. The two dimensional cases can be extracted by just ignoring
the rows and columns for the z-coordinates.
Translation: Translation by a fixed vector ~v maps any point p to p + ~v . Note that, since free vectors
have no position in space, they are not altered by translation (see Fig. 16(a)).
Suppose that relative to the standard frame, v[F ] = (αx , αy , αz , 0) are the homogeneous coor-
dinates of ~v . The three unit vectors are unaffected by translation, and the origin is mapped to
O + ~v , whose homogeneous coordinates are (αx , αy , αz , 1). Thus, by the rule given earlier, the
homogeneous matrix representation for this translation transformation is
$$T(\vec{v}\,) = \begin{pmatrix} 1 & 0 & 0 & \alpha_x \\ 0 & 1 & 0 & \alpha_y \\ 0 & 0 & 1 & \alpha_z \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Scaling: Uniform scaling is a transformation which is performed relative to some central fixed point.
We will assume that this point is the origin of the standard coordinate frame. (We will leave
the general case of scaling about an arbitrary point in space as an exercise.) Given a scalar
β, this transformation maps the object (point or vector) with coordinates (αx , αy , αz , αw ) to
(βαx , βαy , βαz , αw ) (see Fig. 16(b)).
In general, it is possible to specify separate scaling factors for each of the axes. This is called
nonuniform scaling. The unit vectors are each stretched by the corresponding scaling factor, and
Fig. 16: (a) translation, (b) scaling, and (c) reflection.
the origin is unmoved. Thus, the transformation matrix has the following form:
$$S(\beta_x, \beta_y, \beta_z) = \begin{pmatrix} \beta_x & 0 & 0 & 0 \\ 0 & \beta_y & 0 & 0 \\ 0 & 0 & \beta_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The cases for the other two coordinate frames are similar. Reflection about an arbitrary line (in
2-space) or a plane (in 3-space) is left as an exercise.
Rotation: In its most general form, rotation is defined to take place about some fixed point, and
around some fixed vector in space. We will consider the simplest case where the fixed point is the
origin of the coordinate frame, and the vector is one of the coordinate axes. There are three basic
rotations: about the x, y and z-axes. In each case the rotation is counterclockwise through an
angle θ (given in radians). The rotation is assumed to be in accordance with a right-hand rule: if
your right thumb is aligned with the axis of rotation, then positive rotation is indicated by the
direction in which the fingers of this hand are pointing. To produce a clockwise rotation, simply
negate the angle involved.
Consider a rotation about the z-axis. The z-unit vector and origin are unchanged. The x-unit
vector is mapped to (cos θ, sin θ, 0, 0), and the y-unit vector is mapped to (− sin θ, cos θ, 0, 0) (see
Fig. 17(a)). Thus the rotation matrix is:
$$R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
If (as with Unity) the coordinate frame is left-handed, then the directions of all the rotations are
reversed as well (clockwise, rather than counter-clockwise). Rotations about the coordinate axes
are often called Euler angles. Rotations can generally be performed around any vector, called the
axis of rotation, but the resulting transformation matrix is significantly more complex than the
above examples.
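As a sanity check, here is a quick sketch (untested, and in Unity's C# style like the other code in these notes) that builds Rz (θ) as a Matrix4x4 and applies it to a point in homogeneous coordinates:

// Build the homogeneous rotation matrix Rz(theta), theta in radians.
Matrix4x4 RotateZ(float theta) {
    float c = Mathf.Cos(theta), s = Mathf.Sin(theta);
    Matrix4x4 m = Matrix4x4.identity;          // z-axis and origin are unchanged
    m.SetColumn(0, new Vector4(c, s, 0, 0));   // image of the x-unit vector
    m.SetColumn(1, new Vector4(-s, c, 0, 0));  // image of the y-unit vector
    return m;
}
// Usage: rotate the point (1, 0, 0) by 90 degrees about the z-axis.
// Vector4 q = RotateZ(Mathf.PI / 2) * new Vector4(1, 0, 0, 1);  // approx. (0, 1, 0, 1)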
Shearing: (Optional) A shearing transformation is perhaps the hardest of the group to visualize.
Think of a shear as a transformation that maps a square into a parallelogram by sliding one side
parallel to itself while keeping the opposite side fixed. In 3-dimensional space, it maps a cube into
a parallelepiped by sliding one face parallel while keeping the opposite face fixed (see Fig. 17(b)).
We will consider the simplest form, in which we start with a unit cube whose lower left corner
coincides with the origin. Consider one of the axes, say the z-axis. The face of the cube that lies
on the xy-coordinate plane does not move. The face that lies on the plane z = 1, is translated by
a vector (hx , hy ). In general, a point p = (px , py , pz , 1) is translated by the vector pz (hx , hy , 0, 0).
This vector is orthogonal to the z-axis, and its length is proportional to the z-coordinate of p.
This is called an xy-shear. (The yz- and xz-shears are defined analogously.)
Under the xy-shear, the origin and x- and y-unit vectors are unchanged. The z-unit vector is
mapped to (hx , hy , 1, 0). Thus the matrix for this transformation is:
$$H_{xy}(h_x, h_y) = \begin{pmatrix} 1 & 0 & h_x & 0 \\ 0 & 1 & h_y & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Transformations in Unity: Recall that all game objects in Unity (in particular, MonoBehaviour objects) are associated with a member called transform, which is of type Transform. This object controls the object's position, rotation, and scale.
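For example, the three basic operations can be applied through this member from within a script (a sketch with arbitrary illustrative values):

void Update() {
    transform.position += new Vector3(0, 0, 1) * Time.deltaTime;  // translate along z
    transform.Rotate(0, 90 * Time.deltaTime, 0);                  // rotate about the y-axis
    transform.localScale = new Vector3(1, 2, 1);                  // (nonuniform) scaling
}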
Rotation and Orientation in 3-Space: One of the trickier problems in 3-d geometry is that of parameterizing rotations and the orientation of frames. We have introduced the notion of orientation before (e.g.,
clockwise or counterclockwise). Here we mean the term in a somewhat different sense, as a directional
position in space. Describing and managing rotations in 3-space is a somewhat more difficult task (at
least conceptually), compared with the relative simplicity of rotations in the plane. We will explore
two methods for dealing with rotation, Euler angles and quaternions.
Euler Angles: Leonhard Euler was a famous mathematician who lived in the 18th century. He proved many important theorems in geometry, algebra, and number theory, and he is credited as the inventor of graph theory. Among his many theorems is one that states that the composition of any number of rotations in three-space can be expressed as a single rotation in 3-space about an appropriately chosen vector. Euler also showed that any rotation in 3-space can be broken down into exactly three rotations, one about each of the coordinate axes.
Suppose that you are a pilot, such that the x-axis points to your left, the y-axis points ahead of you,
and the z-axis points up (see Fig. 18). (This is the coordinate frame that I prefer, which is also used
by the Unreal engine. Note that Unity swaps the z and y axes.) Then a rotation about the x-axis,
denoted by φ, is called the pitch. A rotation about the y-axis, denoted by θ, is called roll. A rotation
about the z-axis, denoted by ψ, is called yaw. Euler’s theorem states that any position in space can
be expressed by composing three such rotations, for an appropriate choice of (φ, θ, ψ).
The order in which the rotations are performed is significant. In Unity (using the command transform.Rotate(x, y, z)), the order is the z-axis first, x-axis second, and y-axis third. Recalling that Unity switches the roles of the z and y axes relative to the above figure, this means that it performs the operations in the order roll, then pitch, then yaw.
Fig. 18: Euler angles: pitch, roll, and yaw.
Shortcomings of Euler angles: There are some problems with Euler angles. One issue is the fact that
this representation depends on the choice of coordinate system. In the plane, a 30-degree rotation
is the same, no matter what direction the axes are pointing (as long as they are orthonormal and
right-handed). However, the result of an Euler-angle rotation depends very much on the choice of the
coordinate frame and on the order in which the axes are named. (Later, we will see that quaternions
do provide such an intrinsic system.)
Another problem with Euler angles is called gimbal lock. Whenever we rotate about one axis, it is possible that we could bring the other two axes into alignment with each other. (This happens, for example, if we rotate about x by 90◦ .) This causes problems because the other two axes no longer rotate
independently of each other, and we effectively lose one degree of freedom. Gimbal lock as induced by
one ordering of the axes can be avoided by changing the order in which the rotations are performed.
But, this is rather messy, and it would be nice to have a system that is free of this problem.
Quaternions: We will now delve into a subject which at first may seem quite unrelated, but keep the above discussion in mind, since it will reappear in a most surprising way. This story begins in the early 19th
century, when the great mathematician William Rowan Hamilton was searching for a generalization of
the complex number system.
Imaginary numbers can be thought of as linear combinations of two basis elements, 1 and i, which satisfy the multiplication rules $1^2 = 1$, $i^2 = -1$, and $1 \cdot i = i \cdot 1 = i$. (The interpretation $i = \sqrt{-1}$ arises from the second rule.) A complex number $a + bi$ can be thought of as a vector $(a, b)$ in 2-dimensional space. Two important concepts with complex numbers are the modulus, which is defined to be $\sqrt{a^2 + b^2}$, and the conjugate, which is defined to be $(a, -b)$. In vector terms, the modulus is just the length of the vector and the conjugate is just a vertical reflection about the x-axis. If a complex
number is of modulus 1, then it can be expressed as (cos θ, sin θ). Thus, there is a connection between
complex numbers and 2-dimensional rotations. Also, observe that, given such a unit modulus complex
number, its conjugate is (cos θ, − sin θ) = (cos(−θ), sin(−θ)). Thus, taking the conjugate is something
like negating the associated angle.
Hamilton was wondering whether this idea could be extended to three-dimensional space. You might reason that, to go from 2D to 3D, you need to replace the single imaginary quantity i with two imaginary quantities, say i and j. Unfortunately, this idea does not work. After many failed attempts, Hamilton finally came up with the idea of using, rather than two imaginaries, three imaginaries i, j, and k, which behave as follows:

$$i^2 = j^2 = k^2 = ijk = -1, \qquad ij = k, \quad jk = i, \quad ki = j.$$

Combining these, it follows that ji = −k, kj = −i, and ik = −j. The skew symmetry of multiplication (e.g., ij = −ji) was actually a major leap, since multiplication systems up to that time had been commutative.
Hamilton defined a quaternion to be a generalized complex number of the form

$$q = q_0 + q_1 i + q_2 j + q_3 k.$$

It is convenient to express a quaternion in scalar-vector form as the pair (s, u), where s = q₀ is its scalar part and u = (q₁, q₂, q₃) is its vector part. Consider two quaternions

$$q = (s, u) = s + u_x i + u_y j + u_z k \qquad \text{and} \qquad p = (t, v) = t + v_x i + v_y j + v_z k.$$
If we multiply these two together, we’ll get lots of cross-product terms, such as $(u_x i)(v_y j)$, but we can simplify these by using Hamilton’s rules. That is, $(u_x i)(v_y j) = u_x v_y (ij) = u_x v_y k$. If we do this, simplify, and collect common terms, we get a rather messy formula involving 16 different terms. (The derivation is left as an exercise.) The formula can be expressed somewhat succinctly in the following form:

$$q\,p = (st - (u \cdot v),\; sv + tu + u \times v).$$
Note that the above expression is in the quaternion scalar-vector form. The first term st−(u·v) evaluates
to a scalar (recalling that the dot product returns a scalar), and the second term (sv + tu + u × v) is a
sum of three vectors, and so is a vector. It can be shown that quaternion multiplication is associative,
but not commutative.
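Here is a minimal sketch of this product in Unity-style C# (my own helper for illustration; Unity's built-in Quaternion type supplies its own operator*):

// Multiply quaternions q = (s, u) and p = (t, v) via the scalar-vector formula.
void QuatMult(float s, Vector3 u, float t, Vector3 v, out float rs, out Vector3 rv) {
    rs = s * t - Vector3.Dot(u, v);             // scalar part: st - (u . v)
    rv = s * v + t * u + Vector3.Cross(u, v);   // vector part: sv + tu + (u x v)
}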
Quaternion Multiplication and 3-d Rotation: Before considering rotations, we first define a pure quaternion to be one with a 0 scalar component,

$$p = (0, v).$$

The conjugate of a quaternion q = (s, u) is defined to be q∗ = (s, −u), and its inverse is q⁻¹ = q∗/|q|², where |q| denotes q's length as a 4-element vector. (To see why this works, try multiplying qq⁻¹, and see what you get.) Observe that if q is a unit quaternion, then it follows that q⁻¹ = q∗.
As you might have guessed, our objective will be to show that there is a relation between rotating vectors and multiplying quaternions. In order to apply this insight, we need to first show how to represent
rotations as quaternions and 3-dimensional vectors as quaternions. After a bit of experimentation, the
following does the trick:
Vector: Given a vector v = (vx , vy , vz ) to be rotated, we will represent it by the pure quaternion
(0, v).
Rotation: To represent a rotation by angle θ about a unit vector u, you might think that we should use the scalar part to represent θ and the vector part to represent u. Unfortunately, this doesn’t quite work. After a bit of experimentation, you will discover that the right way to encode this rotation is with the quaternion q = (cos(θ/2), (sin(θ/2))u). (You might wonder why we use θ/2 rather than θ. The reason, as we shall see below, is that “this is what works.”) The rotation is then applied to a pure quaternion p = (0, v) by means of the rotation operator

$$R_q(p) = q\,p\,q^{-1}. \tag{1}$$
Example: Consider the 3-d “roll” rotation shown in Fig. 19. This rotation can be achieved by performing a rotation about the y-axis by θ = 90 degrees. Thus θ = π/2, the axis of rotation is û = (0, 1, 0), and so we have s = cos(θ/2) = 1/√2 and u = (sin(θ/2))û = (0, 1/√2, 0), and hence

$$q = \left(\cos\frac{\pi}{4}, \left(\sin\frac{\pi}{4}\right)(0, 1, 0)\right) = \left(\frac{1}{\sqrt{2}}, \left(0, \frac{1}{\sqrt{2}}, 0\right)\right).$$
Fig. 19: Rotating the vector v by 90◦ about the y-axis, yielding R(v).
Let us consider how the x-unit vector v = (1, 0, 0) is transformed under this rotation. To reduce this
to a quaternion operation, we encode v as a pure quaternion p = (0, v) = (0, (1, 0, 0)). Observe that
$$s^2 - (u \cdot u) = \frac{1}{2} - \frac{1}{2} = 0, \qquad (u \cdot v) = 0, \qquad (u \times v) = \left(0, 0, -\frac{1}{\sqrt{2}}\right).$$

By applying the rotation operator of Eq. (1), we have

$$\begin{aligned} R_q(p) &= (0, (s^2 - (u \cdot u))v + 2u(u \cdot v) + 2s(u \times v)) \\ &= (0, 0v + 2u \cdot 0 + 2s(0, 0, -1/\sqrt{2})) \\ &= (0, \vec{0} + \vec{0} + (2/\sqrt{2})(0, 0, -1/\sqrt{2})) \\ &= (0, (0, 0, -1)). \end{aligned}$$

Thus, the rotation maps v = (1, 0, 0) to (0, 0, −1), as expected.
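In Unity, the same computation can be checked against the built-in Quaternion type (a sketch; Unity's frame is left-handed with y up, but the result here happens to agree):

Quaternion q = Quaternion.AngleAxis(90.0f, Vector3.up);  // 90-degree rotation about y
Vector3 result = q * Vector3.right;                      // approximately (0, 0, -1)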
Why does the quaternion operator achieve a rotation? To see the connection, decompose the vector v being rotated into its components parallel and perpendicular to the rotation axis u (see Fig. 20(a)):

$$v_{\parallel} = (u \cdot v)u, \qquad v_{\perp} = v - v_{\parallel} = v - (u \cdot v)u.$$
Fig. 20: (a) decomposing v into v∥ and v⊥ , and (b) a top view of the plane of rotation spanned by v⊥ and w.
Note that v∥ is unaffected by the rotation, but v⊥ is rotated to a new position R(v⊥ ). To determine this rotated position, we first construct a vector that is orthogonal to v⊥ and lies in the plane of rotation:
$$w = u \times v_{\perp} = u \times (v - v_{\parallel}) = (u \times v) - (u \times v_{\parallel}) = u \times v.$$
The last step follows from the fact that u and v∥ are parallel, and so their cross product is zero. Clearly w is orthogonal to both v⊥ and u. Furthermore, because v⊥ is orthogonal to the unit vector u, it follows from basic properties of the cross product that w is the same length as v⊥ .
Now, consider the plane spanned by v⊥ and w (see Fig. 20(b)). Rotating v⊥ by angle θ within this plane gives

$$R(v_{\perp}) = (\cos\theta)v_{\perp} + (\sin\theta)w.$$

In summary, we have the following formula expressing the effect of the rotation of vector v by angle θ about a rotation axis u:

$$R(v) = v_{\parallel} + R(v_{\perp}) = (\cos\theta)v + (1 - \cos\theta)(u \cdot v)u + (\sin\theta)(u \times v). \tag{2}$$
This expression is the image of v under the rotation. Notice that, unlike Euler angles, this is expressed
entirely in terms of intrinsic geometric functions (such as dot and cross product), which do not depend
on the choice of coordinate frame. This is a major advantage of this approach over Euler angles.
Observe that the vector part of this quaternion is identical to the angular displacement equation for
R(v) presented in Eq. (2), implying that the quaternion rotation operator achieves the desired rotation.
Composing Rotations: (Optional) We have shown that each unit quaternion corresponds to a rotation
in 3-space. This is an elegant representation, but can we manipulate rotations through quaternion
operations? The answer is yes. In particular, the action of multiplying two unit quaternions results in
another unit quaternion. Furthermore, the resulting product quaternion corresponds to the composition
of the two rotations. In particular, given two unit quaternions q and q′, a rotation by q followed by a
rotation by q′ is equivalent to a single rotation by the product q″ = q′q. That is,

$$R_{q'}(R_q(p)) = R_{q'q}(p).$$

This follows from the associativity of quaternion multiplication and the fact that (q′q)⁻¹ = q⁻¹q′⁻¹, since

$$R_{q'}(R_q(p)) = q'(q\,p\,q^{-1})q'^{-1} = (q'q)\,p\,(q'q)^{-1} = R_{q'q}(p).$$
First, define a vector ~v to be the vector from p to t, that is, ~v ← t − p. Next, define the vector ~u to be a vector that is directed from p to q, thus, ~u ← q − p. Let us normalize these vectors to unit length, by defining û ← normalize(~u) = ~u/ℓ(u), where ℓ(u) = ‖~u‖ = √(~u · ~u). (Here we have used the property of the dot product that the dot product of a vector with itself is the vector's squared length.) We can do the same for ~v .
To determine whether q lies within the cone, we compute the angle between these vectors. Recall that we can compute the cosine of the angle between two unit vectors by taking their dot product. Since the cosine is a monotonically decreasing function (for angles in the range from 0 to 180◦ ), the angle being at most θ is equivalent to the condition û · v̂ ≥ cos θ.
Be careful! Remember that θ is given in degrees, while the cosine function assumes that its argument is given in radians. To convert from degrees to radians we multiply by π/180. So the correct expression is

$$\hat{u} \cdot \hat{v} \ge \cos\left(\theta \cdot \frac{\pi}{180}\right).$$
To solve (2), it suffices to test whether ℓ(u) ≤ r; if ℓ(u) > r, then q is too far away to be hit.
To summarize, we have the following test.
~v ← t − p;    ~u ← q − p
ℓ(v) ← ‖~v‖ = √(~v · ~v);    ℓ(u) ← ‖~u‖ = √(~u · ~u)
v̂ ← normalize(~v) = ~v/ℓ(v);    û ← normalize(~u) = ~u/ℓ(u)
c₁ ← û · v̂
c₂ ← cos(θ · π/180)
return true iff (c₁ ≥ c₂ and ℓ(u) ≤ r)
A Unity implementation of this procedure (which I haven’t tested) can be found in the following
code block.
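Testing whether q lies within distance r and angle θ of the view cone at p aimed toward t (a reconstruction in the spirit of the notes; the function name and signature are my own):

bool WithinCone(Vector3 p, Vector3 t, Vector3 q, float theta, float r) {
    Vector3 v = t - p;                                   // from p toward the aim point t
    Vector3 u = q - p;                                   // from p to the query point q
    float lu = u.magnitude;                              // distance to q
    float c1 = Vector3.Dot(u.normalized, v.normalized);  // cosine of the angle between them
    float c2 = Mathf.Cos(theta * Mathf.Deg2Rad);         // theta is given in degrees
    return (c1 >= c2) && (lu <= r);                      // within the angle and the range
}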
Projectile Shooting: Your game involves shooting an object (an arrow, rock, grenade, or other projectile) in a certain direction. Your boss wants you to write a program to determine where the projectile will land, as part of an aiming tool for inexperienced players.
Suppose that the projectile is launched from a location that is h meters above the ground. Following Unity's convention that the vertical axis is y, the projectile starts at coordinates (0, h, 0). Suppose that the projectile
is launched with a velocity given by the vector ~v0 = (v0,x , v0,y , v0,z ). Let’s assume that the arrow is
shot upwards, that is, v0,y > 0. To simplify matters, let’s assume that the projectile is shot in the
forward (z) direction. Thus v0,x = 0 and v0,z > 0. We want to determine the distance ` from the
shooter where the projectile hits the ground.
Let t = 0 denote the time at which the object is shot. After consulting a standard textbook on Physics,
we are reminded that (on Earth at least) the force of gravity results in an acceleration of g ≈ 9.8m/s2 .
Fig. 22: (a) a projectile launched from p = (0, h, 0) with velocity ~v0 , and (b) its landing point q at horizontal distance v0,z t∗ .
After consulting your physics text, you find that after t time units have elapsed, the position of the projectile is p(t) = (z(t), y(t)), where

$$z(t) = v_{0,z}\,t \qquad \text{and} \qquad y(t) = h + v_{0,y}\,t - \frac{1}{2}gt^2.$$

(Assuming no wind resistance, the projectile's velocity along the z-axis is constant at v0,z . Its motion with respect to the y-axis follows a downward parabolic arc as a function of time t.)
Time of Impact: Letting a = g/2, b = −v0,y , and c = −h, we seek the value of t such that at² + bt + c = 0. (We have intentionally negated the coefficients so that a > 0.) By the quadratic formula we have

$$t = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{v_{0,y} \pm \sqrt{v_{0,y}^2 + 2gh}}{g}.$$

Note that the quantity under the square-root sign is positive and larger than v0,y , which implies that both roots exist, one positive and one negative. Clearly, we want the positive root, so we take the “+” root from the “±” option, which yields

$$t^* = \frac{v_{0,y} + \sqrt{v_{0,y}^2 + 2gh}}{g}.$$
Location of Impact: We know that the projectile moves at a rate of v0,z units per second horizon-
tally. Therefore, at time t = t∗ it has traveled a distance of v0,z t∗ units. Since we started at the
origin, the location where we hit the ground is (x, y, z) = (0, 0, v0,z t∗ ).
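A short sketch of this computation in Unity-style C# (the helper name is my own):

// Where does a projectile launched from (0, h, 0) with velocity v0 hit the ground?
Vector3 ImpactPoint(float h, Vector3 v0) {
    const float g = 9.8f;                                            // gravity (m/s^2)
    float tStar = (v0.y + Mathf.Sqrt(v0.y * v0.y + 2f * g * h)) / g; // time of impact
    return new Vector3(0f, 0f, v0.z * tStar);                        // landing location
}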
Shooting an Arrow: Before leaving the topic of shooting the projectile, it is worth observing that the
Unity Physics engine can simulate the motion of the projectile. This will look fine if the projectile
is a ball. However, if the projectile is an arrow, the arrow will not “turn” properly in the air in the
manner that arrows do. Unity will simply translate the arrow through space, without rotating it (see
Fig. 23(a)). This raises the question, “How can we rotate the arrow to point in the direction it is
traveling?” (see Fig. 23(b))
Fig. 23: (a) translating the arrow without rotating it, and (b) rotating the arrow to point in the direction it is traveling.
This may seem to be a complicated question. Clearly, this is not a rotation about the coordinate axes, and so Euler angles would not be the right approach. We need to use quaternions. We want to specify a quaternion that will rotate the arrow toward the direction it is moving. Luckily for us, Unity provides a function that does almost exactly what we need. The Unity function Quaternion.LookRotation(Vector3 forward) generates a rotation (represented as a quaternion) that aligns the forward (z) axis with the argument. Thus, to align the arrow with the direction it is heading, we can use the following Unity commands:
Rigidbody rb = GetComponent<Rigidbody>();
transform.rotation = Quaternion.LookRotation(rb.velocity);
This will rotate the object so its orientation matches its velocity, as desired.
Fig. 24: (a) a ship at position p with view direction ~v and up-vector ~u , and (b) the vector w~ directed from the ship to the obstacle at q.
Solution: First observe that w~ ← q − p defines a vector that is directed from the space ship to the obstacle (see Fig. 24(b)). To convert this into a unit vector (since we just care about the direction), let us normalize it to unit length. (Recall that normalizing a vector involves dividing the vector by its length.) We can compute the length of a vector as the square root of its dot product with itself.
Thus, we have

$$\vec{w} \leftarrow q - p, \qquad \hat{w} \leftarrow \mathrm{normalize}(\vec{w}) = \frac{\vec{w}}{\|\vec{w}\|} = \frac{\vec{w}}{\sqrt{\vec{w} \cdot \vec{w}}} = \frac{\vec{w}}{\sqrt{w_x^2 + w_y^2 + w_z^2}}.$$
What we want to know is whether this vector is pointing to the space-ship pilot’s left or right (in
which case we will turn the opposite direction), or is above or below (in which case we will pitch
in the opposite direction).
Let’s first tackle the problem of whether to pitch up or down. We can determine this by checking the angle between the up-vector ~u and ŵ. If this angle is smaller than 90◦ , then the obstacle is above us and we should pitch downward. Otherwise, we should pitch upward. Given that both vectors have unit length, we can compute the cosine of the angle between them by the dot product. If the dot product is positive, the angle is smaller than 90◦ , thus the obstacle is above, and we pitch down. Otherwise, we pitch up. We have

ŵ · ~u ≥ 0  ⇒  (obstacle above) pitch downwards
ŵ · ~u < 0  ⇒  (obstacle below) pitch upwards.
(By the way, since we are only checking the sign of this dot product, not its magnitude, it was not really necessary to normalize w~ to unit length. We could have substituted w~ for ŵ above without affecting the correctness of the result.)
Next, let’s consider whether to turn left or right. We would like to perform a similar type of computation, but to do so, we should generate a vector that indicates left and right relative to the pilot. Such a vector can be obtained by taking the cross product of the view direction and the up-vector, ~r ← ~v × ~u , which points to the pilot's right. We then have

ŵ · ~r ≥ 0  ⇒  (obstacle to the right) yaw to the left
ŵ · ~r < 0  ⇒  (obstacle to the left) yaw to the right.
A Unity implementation of this procedure (which I haven’t tested) can be found in the following
code block.
Turning a ship at position p to avoid an obstacle at q
void Evade(Vector3 p, Vector3 v, Vector3 u, Vector3 q) {
    Vector3 w = q - p;                 // vector from pilot to obstacle
    float l = w.magnitude;             // distance to obstacle
    Vector3 ww = w.normalized;         // directional vector to obstacle
    if (Vector3.Dot(ww, u) >= 0)       // obstacle is above?
        PitchDown();
    else
        PitchUp();
    Vector3 r = Vector3.Cross(v, u);   // vector to pilot's right
    if (Vector3.Dot(ww, r) >= 0)       // obstacle is to the right?
        YawToLeft();
    else
        YawToRight();
}
We have not discussed how to perform the pitch or yaw operations. In Unity, these could be expressed
as rotations about the vectors ~r and ~u, respectively.
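For example, one possible (untested) realization using Unity's Transform.Rotate, with an arbitrarily chosen step of one degree per call (the member fields and the signs of the angles are assumptions that may need adjusting to your conventions):

Vector3 r, u;     // assumed member fields holding the pilot's right- and up-vectors
void PitchDown()  { transform.Rotate(r,  1f, Space.World); }
void PitchUp()    { transform.Rotate(r, -1f, Space.World); }
void YawToLeft()  { transform.Rotate(u, -1f, Space.World); }
void YawToRight() { transform.Rotate(u,  1f, Space.World); }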
Fig. 25: Examples of common enclosures: (a) AABB, (b) general BB, (c) sphere, (d) capsule, (e) 8-DOP.
General bounding boxes: The principal shortcoming of axis-parallel bounding boxes is that it is not
possible to rotate the object without recomputing the entire bounding box. In contrast, general
(arbitrarily-oriented) bounding boxes can be rotated without the need to recompute them (see
Fig. 25(b)).
A natural approach to represent such a box is to describe the box as an AABB but relative to
a different coordinate frame. For example, we could define a frame whose origin is one of the
corners of the box and whose axes are aligned with the box’s sides. By applying an appropriate
affine transformation, we can map the general box to an AABB.
Computing the minimum bounding box is not simple. It will be the AABB for an appropriate
rotation of the body, but determining the best rotation (especially in 3-space) is quite tricky.
Bounding spheres: These are among the most popular bounding enclosures. A sphere can be represented by a center point p and a radius r (see Fig. 25(c)). Spheres are invariant under rigid transformations, that is, under translation and rotation. Unfortunately, they are not well suited to skinny objects.
Minimum bounding spheres are tricky to compute exactly. A commonly used heuristic is to first identify (by some means) a point p that lies near the center of the body, and then set the radius just large enough so that a sphere centered at p encloses the body. Identifying the point p is the tricky part. One heuristic is to set p to be the center of gravity of the body. Another is to compute two points a and b on the body that are farthest apart from each other (this is called the diametrical pair) and to define p to be the midpoint of the segment ab.
Bounding ellipsoids: The main problem with spheres (a problem that also exists with axis-parallel bounding boxes) is that skinny objects are not well approximated by a sphere. An ellipse (or generally, an ellipsoid in higher dimensions) is just the image of a sphere under an affine transformation.
Detecting Collisions: By enclosing each object within a bounding enclosure, collision detection reduces to determining whether two such enclosures intersect each other. Note that if we support k different types of enclosure, we need to handle all possible pairs of enclosure types. Here are a few examples:
AABB-AABB: We can test whether two axis-aligned bounding boxes overlap by testing that all pairs of intervals overlap. For example, suppose that we have two boxes b and b′, where the box b extends from the lower-left corner (x₁, y₁) to the upper-right corner (x₂, y₂) and b′ extends from the lower-left corner (x₁′, y₁′) to the upper-right corner (x₂′, y₂′) (see Fig. 26(a)). These boxes overlap if and only if

$$x_1 \le x_2' \;\text{ and }\; x_1' \le x_2 \;\text{ and }\; y_1 \le y_2' \;\text{ and }\; y_1' \le y_2,$$

that is, their x-intervals overlap and their y-intervals overlap.
Fig. 26: Detecting collisions: (a) two AABBs b and b′, (b) two arbitrarily oriented boxes, (c) two spheres, and (d) two moving spheres p(t) and p′(s).
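The AABB test above translates directly into code; here is a 2-dimensional sketch (names are illustrative):

// Do the boxes [min1, max1] and [min2, max2] overlap in every coordinate?
bool AabbOverlap(Vector2 min1, Vector2 max1, Vector2 min2, Vector2 max2) {
    return min1.x <= max2.x && min2.x <= max1.x     // x-intervals overlap
        && min1.y <= max2.y && min2.y <= max1.y;    // y-intervals overlap
}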
Box-Box: Determining whether two arbitrarily oriented boxes b and b0 intersect is a nontrivial task.
If they do intersect, one of the following must happen (see Fig. 26(b)):
• A vertex of b lies within b′, or vice versa.
Hierarchies of bounding bodies: What if the above bounding volumes are not sufficiently accurate for
your purposes? A natural generalization is that of constructing a hierarchy consisting of multiple levels
of bounding bodies, where the bodies at a given level enclose a constant number of bodies at the next
lower level.
If you consider the simplest case of axis-aligned bounding boxes, the resulting data structure is called an R-tree. In Fig. 27 we give an example, where the input boxes are shown in (a), the hierarchy (allowing between 2 and 3 boxes per group) is shown in (b), and the final tree is shown in (c).
Fig. 27: A hierarchy of bounding boxes: (a) the input boxes, (b) the hierarchy of boxes, (c) the associated R-tree structure.
There are a number of interesting (and often quite complicated) technical issues in the design of R-trees. For example: What is the best way to cluster smaller boxes together to form larger boxes? How do you minimize the wasted space within each box? How do you minimize the overlap between boxes?

Fig. 28: A grid with cell size ∆; the point p lies in the cell with indices (i, j).
Computing the indices of the grid cell that contains a given point is a simple exercise in integer arithmetic. For example, if p = (px , py ), then let
$$j = \left\lfloor \frac{p_x}{\Delta} \right\rfloor \qquad \text{and} \qquad i = \left\lfloor \frac{p_y}{\Delta} \right\rfloor.$$
Then, the point p lies within the grid cell G[i, j].
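In code, this is just a pair of floor operations (a sketch; delta is the grid cell size ∆):

// Map a point p to the indices (i, j) of the grid cell containing it.
void CellIndices(Vector2 p, float delta, out int i, out int j) {
    j = (int)Mathf.Floor(p.x / delta);   // column index
    i = (int)Mathf.Floor(p.y / delta);   // row index
}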
If the diameter of most of the objects is not significantly larger than ∆, then each object will only be
associated with a constant number of grid cells. If the density of objects is not too high, then each
grid square will only need to store a constant number of pointers. Thus, if the above assumptions are
satisfied then the data structure will be relatively space efficient.
Storing a Grid: As we have seen, a grid consists of a collection of cells where each cell stores a set of
pointers to the objects that overlap this cell (or at least might overlap this cell). How do we store these
cells? Here are a few ideas.
d-dimensional array: The simplest implementation is to allocate a d-dimensional array that is suffi-
ciently large to handle all the cells of your grid. If the distribution of objects is relatively uniform,
Fig. 29: Linear allocations to improve reference locality of neighboring cells: (a) row-major order, (b) the
Hilbert curve, (c) the Morton order (or Z-order).
There is experimental evidence that shows that altering the physical allocation of cells can improve
running times moderately. Unfortunately, the code that maps an index (i, j) to the corresponding
address in physical memory becomes something of a brain teaser.
Computing the Morton Order: Between the Hilbert order and the Morton order, the Morton order is by far the more commonly used. One reason for this is that there are some nifty tricks for computing this order. To make this easier to see, let us assume that we are working in two-dimensional space and that the grid is of size 2^m × 2^m. (The trick we will show applies to any dimension. If your grid is not of this size, you can embed it within the smallest grid that has this property.)
Suppose that a cell has row and column indices i = ⟨i₁, i₂, ..., i_m⟩₂ and j = ⟨j₁, j₂, ..., j_m⟩₂, written in binary. The cell's position k along the Morton order is obtained by interleaving (shuffling) these bits:

$$k = \langle i_1, j_1, i_2, j_2, \ldots, i_m, j_m \rangle_2.$$
If you have not seen this trick before, it is rather remarkable that it works. As an example, consider the cell at index (i, j) = (2, 3), which is labeled as 13 in Fig. 29(c). Expressing i and j as 3-element bit vectors we have i = ⟨0, 1, 0⟩₂ and j = ⟨0, 1, 1⟩₂. Next, we interleave these bits to obtain

$$k = \langle 0, 0, 1, 1, 0, 1 \rangle_2 = 13,$$

just as we expected.
This may seem like a lot of bit manipulation, particularly if m is large. It is possible, however, to speed this up. For example, rather than processing one bit at a time, we could break i and j up into 8-bit bytes, and then for each byte, we could access a 256-element look-up table to convert its bit representation to one where the bits have been “spread out.” (For example, suppose that you have the 8-element bit vector ⟨b₀, b₁, ..., b₇⟩₂. The table look-up would return the 16-element bit vector ⟨b₀, 0, b₁, 0, ..., b₇, 0⟩₂.) You repeat this for each byte, applying a 16-bit shift in each case. Finally, you apply an additional right shift of the j bit vector by a single position and bitwise “or” the two spread-out bit vectors for i and j together to obtain the final shuffled bit vector. By interpreting this bit vector as an integer we obtain the desired Morton code for the pair (i, j).
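For reference, here is a compact sketch of the bit-spreading trick, using the well-known "magic number" masks rather than the table version described above; it reproduces the example (i, j) = (2, 3) ↦ 13.

// Spread the low 16 bits of x so that they occupy the even bit positions.
static uint Spread(uint x) {
    x &= 0x0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}
// Morton code: i's bits land in the odd (higher) positions, j's in the even ones.
static uint Morton(uint i, uint j) {
    return (Spread(i) << 1) | Spread(j);   // Morton(2, 3) == 13
}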
Quadtrees: Grids are fine if the density of objects is fairly regular. If there is considerable variation in the
density, a quadtree is a practical alternative. You have probably seen quadtrees in your data structures
course, so I’ll just summarize the main points, and point to a few useful tips.
First off, the term “quadtree” is officially reserved for 2-dimensional space and “octree” for three
dimensional space. However, it is too hard to figure out what the name should be when you get to
13-dimensional space, so I will just use the term “d-dimensional quadtree” for all dimensions.
Fig. 30: A quadtree: (a) the four quadrants NW, NE, SW, SE, (b) a subdivision, and (c) the associated tree.
We begin by assuming that the domain of interest has been enclosed within a large bounding square
(or generally a hypercube in d-dimensional space). Let’s call this Q0 . Let us suppose that we have
applied a uniform scaling factor so that Q0 is mapped to the d-dimensional unit hypercube [0, 1]d . A
quadtree box is defined recursively as follows:
• Q0 is a quadtree box.
• If Q is a quadtree box, then so are the 2^d boxes obtained by splitting Q through its center by hyperplanes orthogonal to the coordinate axes (in the plane, Q's four quadrants).
Binary Quadtrees: In dimension 3 and higher, having to allocate 2^d children for every internal node can be quite wasteful. Unless the points are uniformly distributed, it is often the case that only a couple of these nodes contain points. An alternative is to rely only on binary splits. First, split along the midpoint x-coordinate, then the midpoint y-coordinate, and so forth, cycling through the axes (see Fig. 31).
Fig. 31: A binary quadtree, splitting along one axis at a time.
Linear Quadtree: A very clever and succinct method for storing quadtrees for point sets involves no
tree at all! Recall the Morton order, described earlier in this lecture. A point (x, y) is mapped
to a point in a 1-dimensional space by shuffling the bits of x and y together. This maps all the
points of your set onto a space filling curve.
What does this curve have to do with quadtrees? It turns out that the curve visits the cells of the
quadtree (either the standard, binary, or compressed versions) according to an in-order traversal
of the tree (see Fig. 32).
How can you exploit this fact? It seems almost unbelievable that this would work, but you sort all
the points of your set by the Morton order and store them in an array (or any 1-dimensional data
structure). While this would seem to provide very little useful structure, it is remarkable that
many of the things that can be computed efficiently using a quadtree can (with some additional
modifications) be computed directly from this sorted list. Indeed, the sorted list can be viewed
as a highly compressed encoding of the quadtree.
The advantage of this representation is that it requires zero additional storage, just the points
themselves. Even though the access algorithms are a bit more complicated and run a bit more
slowly, this is a very good representation to use when dealing with very large data sets.
Kd-trees: While quadtrees are widely used, there are some applications where more flexibility is desired in
how the object space is partitioned. There are a number of alternative index structures that are based
on the hierarchically subdividing space into simple regions. Data structures based on such hierarchical
subdivisions are often called partition trees. One of the most widely-used partition-tree structures is
the kd-tree.
A kd-tree 3 is a partition tree based on orthogonal slicing. We start by assuming that all the points of
our space are stored within some large bounding (axis-aligned) rectangle, which is associated with the
root node of the tree. Every node of the tree is associated with a (hyper)-rectangular region, called its
cell. Each internal node of the tree is associated with an axis-aligned splitting plane, which is used to
split the cell in two. The points falling on one side are stored in one child and points on the other side
are stored in the other. Each internal node t of the kd-tree is associated with the following quantities:
• t.cut-dim — the cutting dimension
• t.cut-val — the cutting value
• t.left, t.right — the left and right subtrees
Of course, there generally may be additional information associated with each node (for example, the
number of objects lying within the node’s cell), depending on the exact application. If the cutting
dimension is i, then all points whose ith coordinate is less than or equal to t.cut-val are stored in
the left subtree, and the remaining points are stored in the right subtree (see Fig. 33). (If a point’s
coordinate is equal to the cutting value, then we may allow the point to be stored on either side.)
When a single point remains, we store it in a leaf node, whose only field t.point is this point.
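A minimal sketch of such a node in C# (field names mirror the quantities above):

// One node of a kd-tree over 3-d points.
class KdNode {
    public int cutDim;           // cutting dimension: 0 = x, 1 = y, 2 = z
    public float cutVal;         // cutting value
    public KdNode left, right;   // children (both null at a leaf)
    public Vector3 point;        // the point, if this node is a leaf
}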
There are two key decisions in the implementation of the kd-tree.
How is the cutting dimension chosen? The simplest method is to cycle through the dimensions
one by one. (This method is shown in Fig. 33.) Since the cutting dimension depends only on the
level of a node in the tree, one advantage of this rule is that the cutting dimension need not be
stored explicitly in each node; instead, we keep track of it while traversing the tree.
One disadvantage of this splitting rule is that, depending on the data distribution, this simple
cyclic rule may produce very skinny (elongated) cells, and such cells may adversely affect query
times. Another method is to select the cutting dimension to be the one along which the points
have the greatest spread, defined to be the difference between the largest and smallest coordinates.
Bentley calls the resulting tree an optimized kd-tree.
3 The terminology surrounding kd-trees has some history. The data structure was proposed originally by Jon Bentley. In
his notation, these were called “k-d trees,” short for “k-dimensional trees,” since they generalize classical binary trees for 1-
dimensional data. Thus, there are 2-d trees, 3-d trees, and so on. However, over time, the specific value of k was lost, and they
are simply called kd-trees.
How is the cutting value chosen? To guarantee that the tree is balanced, that is, it has height
O(log n), the best method is to let the cutting value be the median coordinate value along the
cutting dimension. In our example, when there are an odd number of points, the median is
associated with the left (or lower) subtree.
Note that a kd-tree is a special case of a more general class of hierarchical spatial subdivisions, called
binary space partition trees (or BSP trees) in which the splitting lines (or hyperplanes in general) may
be oriented in any direction, not just axis-aligned.
Fig. 34: (a) and (b) skeletal model and (c) the bind (or reference) pose.
Bind Pose: Before discussing animation on skeletal structures, it is useful to first say a bit about the
notion of a pose. In humanoid and animal skeletons, joints move by means of rotations4 (as opposed
say to translation, which arises with some robots). Assigning angles to the various joints of a skeleton
uniquely specifies the skeleton’s exact geometric structure, called its pose.
When a designer defines the initial layout of the model’s skin, the designer does so relative to a default
pose, which is called the reference pose or the bind pose.5 For human skeletons, the bind pose is
typically one where the character is standing upright with arms extended straight out to the left and
right (similar to Fig. 34(b) above).
Joint Internal Information: Each joint can be thought of as defining its own joint coordinate frame (see
Fig. 35(a)). Recall that in affine geometry, a coordinate frame consists of a point (the origin of the
frame) and three mutually orthogonal unit vectors (the x, y, and z axes of the frame). Given the
skeleton’s inverted tree structure (see Fig. 35(b)), rotating a joint can be achieved by applying a
suitable rotation transformation to its associated coordinate frame. Each frame of the hierarchy is
understood to be positioned relative to its parent’s frame. In this way, when the shoulder joint is
rotated, the descendants’ joints (elbow, hand, fingers, etc.) also move as a result (see Fig. 35(c)).
Change-of-Coordinates Transformation: In order to determine the motion of the various bones that
result from some joint rotation, we need to know the relationships between the various joints of the
skeleton. There is a very general and elegant way of doing this through the application of affine
geometry. Given any two coordinate frames in d-dimensional space, it is possible to convert a point (or
free vector) represented in one coordinate frame to its representation in the other frame by multiplying
the point (given as a (d+1)-dimensional vector in homogeneous coordinates) by a suitable (d+1)×
(d + 1) matrix. The resulting affine transformation is called a change-of-coordinates transformation.
Constructing such transformations is an exercise in linear algebra. For the sake of completeness, let us
consider the process in a simple 2-dimensional example. Suppose we have two coordinate frames, F
4 It is rather interesting to think about how this happens for your own joints. For example, your shoulder joint has two
degrees of freedom, since it can point your upper arm in any direction it likes. Your elbow also has two degrees of freedom.
One degree comes by flexing and extending your forearm. The other can be seen when you turn your wrist, as in turning a
door knob. Your neck has (at least) three degrees of freedom, since, like your shoulder, you can point the top of your head in
any direction, and, like your elbow, you can also turn it clockwise and counterclockwise.
5 I suspect that the name “bind pose” arises because designers attach or “bind” the skin to the model relative to this initial
pose.
Fig. 35: (a) Skeletal model, (b) inverted tree structure, and (c) rotating a frame propagates to the descen-
dants.
and G (see Fig. 36). Let F.o, F.x, and F.y denote F ’s origin point, and its two basis vectors. Define
G.o, G.x and G.y similarly.
Fig. 36: Two coordinate frames F and G and a point p. Relative to F we have G.x[F] = (2, 1, 0), G.y[F] = (−1, 2, 0), and G.o[F] = (4, 2, 1); relative to G we have F.x[G] = (2/5, −1/5, 0), F.y[G] = (1/5, 2/5, 0), and F.o[G] = (−2, 0, 1).
Given any point in space, it can be represented either with respect to F ’s coordinate system or G’s.
For any point p, define p[F ] to be p’s homogeneous coordinates relative to frame F , and define p[G]
similarly for frame G. We can do the same for any vector ~v .
In order to define the change-of-coordinates transformation, we need to know first what G’s basis
elements are relative to F . In the above example, it is easy to verify that
G.x[F ] = (2, 1, 0), G.y [F ] = (−1, 2, 0), and G.o[F ] = (4, 2, 1).
(Recall that we are using affine homogeneous coordinates, where the last component is 0 to denote a
vector or 1 to denote a point.) Also, it is easy to verify that
$$F.x_{[G]} = \left(\tfrac{2}{5}, -\tfrac{1}{5}, 0\right), \qquad F.y_{[G]} = \left(\tfrac{1}{5}, \tfrac{2}{5}, 0\right), \qquad \text{and} \qquad F.o_{[G]} = (-2, 0, 1).$$
Clearly, p[F] = (1, 3, 1) and p[G] = (−1, 1, 1). (The change-of-coordinates matrix T[F←G] has as its columns G.x[F], G.y[F], and G.o[F], and symmetrically for T[G←F].) Applying the above transformations, we obtain the expected results

$$T_{[F \leftarrow G]} \cdot p_{[G]} = \begin{pmatrix} 2 & -1 & 4 \\ 1 & 2 & 2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix} = p_{[F]},$$

and

$$T_{[G \leftarrow F]} \cdot p_{[F]} = \begin{pmatrix} 2/5 & 1/5 & -2 \\ -1/5 & 2/5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} = p_{[G]}.$$
Next, consider ~v . We have ~v[F] = (3, −1, 0) and ~v[G] = (1, −1, 0). Again, applying the above transformations, we have

$$T_{[F \leftarrow G]} \cdot \vec{v}_{[G]} = \begin{pmatrix} 2 & -1 & 4 \\ 1 & 2 & 2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \\ 0 \end{pmatrix} = \vec{v}_{[F]},$$

and

$$T_{[G \leftarrow F]} \cdot \vec{v}_{[F]} = \begin{pmatrix} 2/5 & 1/5 & -2 \\ -1/5 & 2/5 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = \vec{v}_{[G]}.$$
Abstraction Revisited: We have seen the use of homogeneous matrices before for the purpose of per-
forming affine transformations (that is, for moving objects around in space). We are using the same
mechanism here, but the meaning is quite different. Here the objects are not being moved, rather we
are simply translating the names of points and vectors from one coordinate system to another. The
geometric objects are themselves not moving.
You might wonder, “What’s the difference in how you look at it?” Recall that in affine geometry we defined points and (free) vectors as different abstract objects that employ the same representation, namely homogeneous coordinates, differing only in whether the last component is 1 (a point) or 0 (a vector).
Fig. 38: Three joints i, j, and k in a (rather nonstandard) bind pose. The point v, with v[i] = (3, 6, 1), is represented in homogeneous coordinates relative to each frame.
Consider three joints i, j, and k, where i = p(j) and j = p(k) (see Fig. 38(a)). The local-pose transformation for k, T[p(k)←k] , can be expressed more succinctly as T[j←k] . Given a point v[k] expressed relative to k's frame, we can express it relative to j's frame as

$$v_{[j]} = T_{[j \leftarrow k]} \cdot v_{[k]}.$$

Similarly, a point v[j] expressed relative to j's frame can be expressed relative to i's frame as

$$v_{[i]} = T_{[i \leftarrow j]} \cdot v_{[j]}.$$
Combining these, we can express a point in k’s frame relative to i’s frame by taking the product of
these two matrices
v[i] = T[i←j] · T[j←k] · v[k] = T[i←k] · v[k] ,
where T[i←k] = T[i←j] · T[j←k] . Clearly, by multiplying appropriate chains of the local-pose transforma-
tions and their inverses, we can walk up and down the paths of the tree allowing us to convert a point
relative to any one joint into its representation relative to any other joint.
We can apply our knowledge of rotation transformations and the local-pose transformations and their
inverses to solve this problem. For example, recall the point v in our earlier example (see Fig. 39(a)). Suppose that the elbow is rotated counterclockwise by 30◦ (see Fig. 39(b)), and then the shoulder
is rotated clockwise by 45◦ , that is, counterclockwise by −45◦ (see Fig. 39(c)). The question that we
want to consider, where is the point v mapped to as a result of these two rotations? Let v 0 be its
position after the elbow rotation and let v 00 be its position after both rotations.
Before getting to the answer, recall from our earlier lecture on affine geometry the rotation transfor-
mations in homogeneous coordinates:
$$\mathrm{Rot}(30°) = \begin{pmatrix} \cos 30° & -\sin 30° & 0 \\ \sin 30° & \cos 30° & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \sqrt{3}/2 & -1/2 & 0 \\ 1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

and

$$\mathrm{Rot}(-45°) = \begin{pmatrix} \cos 45° & \sin 45° & 0 \\ -\sin 45° & \cos 45° & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
We need to decide which frame to use as our reference frame. Let’s use the shoulder joint i, since
it is the most “global”. (Generally, we would select the root of our skeleton tree.) We saw already
how to compute v[i] . Using this as a starting point, let’s first consider the effect of the elbow rotation.
Because the elbow rotation occurs about the elbow’s coordinate frame, we first need to translate v into
its representation with respect to j’s frame (by multiplying by the inverse local-pose transformation
T[j←i] ). We then apply the 30◦ rotation about the elbow joint. Finally, we convert this representation
back to the shoulder frame (by applying the local-pose transformation T[i←j] ). Thus, we have

$$v'_{[i]} = T_{[i \leftarrow j]} \cdot \mathrm{Rot}(30°) \cdot T_{[j \leftarrow i]} \cdot v_{[i]}.$$
Top-down or Bottom-up? You might wonder why we did the elbow rotation first followed by the shoulder
transformation. Does the order really matter? The issue is that our local-pose transformations have
been built under the assumption that the model is in the bind pose, that is, none of the joints are
rotated. If we were to have performed the shoulder rotation first, and then attempted to apply the
inverse local-pose transformation T[j←i] to convert the result from the shoulder’s frame to the elbow
frame, we would discover that this transformation is no longer correct. The reason is that the entire
arm (and the elbow joint in particular) has moved into a new position, but T[j←i] was defined based
on its original position. To avoid this problem, the transformations should be applied in a bottom-up
manner, first rotating the descendant nodes (e.g., wrist) and then working up to their ancestors (e.g.,
elbow and then shoulder).
Take-Away Lesson: I must acknowledge that implementing this by hand would be a mess (especially
in 3-space), but hopefully you get the idea. By using our local-pose transformations (and possibly
their inverses), we can change to the coordinate frame where the rotation takes place, then apply the
rotation, then translate back. While it would be messy to write down all the transformations, if we
have precomputed the local pose transformations and their inverses, this can all be programmed in a
straightforward manner by traversing the tree (in postorder) and performing simple matrix multipli-
cations.
Fig. 40: Using two meta-joints (b) to simulate a single joint with two degrees of freedom (a).
Animating the Model: There are a number of ways to obtain joint angles for an animation. Here are a
few:
Motion Capture: For the common motion of humans and animals, the easiest way to obtain anima-
tion data is to capture the motion from a subject. Markers are placed on a subject, who is then
asked to perform certain actions (walking, running, jumping, etc.) By tracking the markers using
multiple cameras or other technologies, it is possible to reconstruct the positions of the joints.
From these, it is a simple exercise in linear algebra to determine the joint angles that gave rise to
these motions.
Motion capture has the advantage of producing natural motions. Of course, it might be difficult
to apply for fictitious creatures, such as flying dragons.
Key-frame Generated: A design artist can use animation modeling software to specify the joint angles. This is usually done by a process called key framing, where the artist gives a detailed layout of the model at certain “key” instants over the course of the animation, called key frames. (For example, when animating a football kicker, the artist might include the moment when the leg starts to swing forward, an intermediate point in the swing, and the point at which the foot contacts the ball.)
Representing Animation Clips: In order to specify an animation, we need to specify how the joint angles
or generally the joint frames vary with time. This can result in a huge amount of data. Each joint that
can be independently rotated defines a degree of freedom in the specification of the pose. For example,
the human body has over 200 degrees of freedom! (It’s amazing to think that our brain can control
it all!) Of course, this counts lots of fine motion that would not normally be part of an animation,
but even a crude modeling of just arms (not including fingers), legs (not including toes), torso, neck
involves over 20 degrees of freedom.
As with any digital signal processing (such as image, audio, and video processing), the standard
approach for efficiently representing animation data is to first sample the data at sufficiently small
time intervals. Then, use some form of interpolation technique to produce a smooth reconstruction
of the animation. The simplest way to interpolate values is linear interpolation. It may be desirable to produce smoother results by applying more sophisticated interpolations, such as quadratic or cubic spline interpolation. When dealing with rotational quantities, it is common to use spherical interpolation.
In Fig. 41 we give a graphical presentation of an animation clip. Let us consider a fairly general setup, in which each pose transformation (either local or global, depending on what your system prefers) is represented by a 3-element translation vector (x, y, z) indicating the joint frame's position and a 4-element quaternion vector (s, t, u, v) representing the frame's rotation. Each row of this representation is a sequence of scalar values, and is called a channel.
Fig. 41: An animation clip: each joint contributes translation channels (T: x, y, z) and rotation channels (Q: s, t, u, v), sampled at regular time intervals and reconstructed by linear interpolation; additional meta-channels carry camera motion and event triggers (e.g., left and right footsteps).

Besides the joint channels, an animation clip may carry auxiliary information of the following kinds:
Event triggers: These are discrete signals sent to other parts of the game system. For example, you
might want a certain sound playback to start with a particular event (e.g., footstep sound), a
display event (e.g., starting a particle system that shows a cloud of dust rising from the footstep),
or you may want to trigger a game event (e.g., a non-playing character ducks to avoid a punch).
Continuous information: You may want some process to adjust smoothly as a result of the ani-
mation. An example would be having the camera motion being coordinated with the animation.
Another example would be parameters that continuously modify the texture coordinates or light-
ing properties of the object. Unlike event triggers, such actions should be smoothly interpolated.
This auxiliary information can be encoded in additional streams, called meta-channels (see Fig. 41).
This information will be interpreted by the game engine.
Skinning and Vertex Binding: Now that we know how to specify the movement of the skeleton over
time, let us consider how to animate the skin that will constitute the drawing of the character. The
first question is how to represent this skin. The most convenient representation from a designer’s
perspective, and the one that we will use, is to position the skeleton in the reference pose and draw
the skin around the resulting structure (see Fig. 42(a)).
skin joint
overlap
bone
crack
(a) (b)
Fig. 42: (a) Binding skin to a skeletal model in the reference pose and (b) cracks and overlaps.
In order that the skin move smoothly along with the skeleton, we need to associate, or bind, vertices
of the mesh to joints of the system, so that when the joints move, the skin moves as well. (This is the
reason that the reference pose is called the bind pose.)
If we were to bind each vertex to a single joint, then we would observe cracks and overlaps appearing
in our skin whenever neighboring vertices are bound to two different joints that are rotated apart from
one another.
Dealing with this problem in a realistic manner will be too difficult. (The manner in which the tissues
under your skin deform is a complex anatomical process. Including clothing on top of this makes for a
tricky problem in physics as well.) Instead, our approach will be to find a heuristic solution that will
be easy to compute and (hopefully) will produce fairly realistic results.
Fig. 43: A vertex v near the elbow, bound to the shoulder and elbow joints; after bending, its blended position is v′ = (3/4)v₁′ + (1/4)v₂′ .
Now, suppose that we bend both of these joints. Let v₁′ and v₂′ denote the respective images of the points v₁ and v₂ after the rotation. (They will be in the same position relative to their respective joints, but their absolute positions in space have changed; see Fig. 43(b).) We use our weight factors to interpolate between these two positions, so the final position of the vertex is at the point v′ = (3/4)v₁′ + (1/4)v₂′ (see
Fig. 43(c)). Because of the smooth variations in weights, the vertices of the mesh will form a smooth
interpolation between the upper arm and the lower arm (see Fig. 43(d)). It may not be physically
realistic, but it is a reasonable approximation, and is very easy to compute.
To make this more formal, we assume that each vertex of the mesh is associated with a list of the joints to which it is bound, together with a weight for each such joint (the weights summing to 1).
The number of joints to which a typical vertex is bound is typically small, e.g., from two to four. Good
solid modelers provide tools to automatically assign weights to vertices, and designers can query and
adjust these weights until they produce the desired look.
The above binding information can be incorporated into the mesh information already associated with
a vertex: the (x, y, z)-coordinates of its location (with respect to the model coordinate system), the
(x, y, z)-coordinates of its normal vector (for lighting and shading computation), and its (s, t) texture
coordinates.
Moving the Joints: In order to derive the computations needed to move a vertex from its initial position to its final position, let's start by introducing some notation. First, recall that our animation system informs us at any time t of the current angle for any joint. Abstractly, we can think of this joint angle as providing a local rotation, $R_j^{(t)}$, that specifies how joint j has rotated. For example, if the joint has undergone a rotation through an angle θ about some axis, then $R_j^{(t)}$ would be represented by a homogeneous rotation matrix for that angle and axis.
To obtain the position of a vertex associated with j’s coordinate frame, we need only compose these
matrices in a chain working back up to the root of the tree. We apply a rotation, convert to the
coordinate frame of the parent joint, apply the parent rotation, convert to the coordinates of the
grandparent joint, and so on. Suppose that the path from j to the root is j = j₁ → j₂ → ... → j_m = M; then the transformation we desire is

$$T^{(t)}_{[M \leftarrow j]} = \prod_{i=1}^{m-1} T^{(t)}_{[j_{m-i+1} \leftarrow j_{m-i}]} = T^{(t)}_{[j_m \leftarrow j_{m-1}]} \cdots T^{(t)}_{[j_3 \leftarrow j_2]} T^{(t)}_{[j_2 \leftarrow j_1]} = T_{[j_m \leftarrow j_{m-1}]} R^{(t)}_{j_{m-1}} \cdots T_{[j_3 \leftarrow j_2]} R^{(t)}_{j_2} T_{[j_2 \leftarrow j_1]} R^{(t)}_{j_1}.$$
We refer to this as the current-pose transformation, since it tells where joint j is at time t relative to the model's global coordinate system. Observe that with each animation time step, all the matrices $R_j^{(t)}$ change, and therefore we need to perform a full traversal of the skeletal tree to compute $T^{(t)}_{[M \leftarrow j]}$ for all joints j. Fortunately, a typical skeleton has perhaps tens of joints, and so this does not represent a significant computational burden (in contrast to operations that need to be performed on each of the individual vertices of a skeletal mesh).
Putting it all Together: Finally, let's consider how to apply blended skinning together with the dynamic pose transformations. This will tell us where every vertex of the mesh is mapped to in the current animation. We assume that the current-pose transformation $T^{(t)}_{[M \leftarrow j]}$ has been computed for all the joints, and we assume that each vertex v is associated with a list of joints and associated weights. Let J(v) = {j₁, ..., j_k} be the joints associated with vertex v, and let W(v) = {w₁, ..., w_k} be the associated weights. Typically, k is a small number, ranging say from 1 to 4. For i running from 1 to k, our approach will be to compute the coordinates of v relative to joint jᵢ, and then apply the current-pose transformation to map this position back into model coordinates.

Let us define $K_j^{(t)} = T^{(t)}_{[M \leftarrow j]} T_{[j \leftarrow M]}$. This is called the skinning transformation for joint j. Intuitively, it tells us where vertex v is mapped to at time t of the animation, assuming that it is fixed to joint j.
Now, we can generalize this to the case of blending among a collection of joints. Recall that v has
been bound to the joints of J(v). Its blended position at time t is given by the weighted sum of the
images of v under the skinning transformations K^(t)_{ji} for each joint ji to which v is bound:

    v_M^(t) = Σ_{ji ∈ J(v)} wi K^(t)_{ji} v_M^(0).
This then is the final answer that we seek. While it looks like a lot of matrix computation, remember
that each vertex is associated with a constant number of joints, and each joint is typically at constant depth
in the skeletal tree. Once these matrices are computed, they may be stored and reused for all the
vertices of the mesh.
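For concreteness, here is a sketch of this blended computation for one vertex in C#. (The Binding
struct is hypothetical; we assume the skinning matrices K^(t)_j have already been computed and stored
in row-vector form, as in System.Numerics.)

Blending one vertex against its bound joints
using System.Numerics;

struct Binding { public int Joint; public float Weight; }  // one (joint, weight) pair

static class Skinner {
    // skin[j] holds the skinning matrix K_j^(t) for joint j.
    public static Vector3 Blend(Vector3 v0, Binding[] bindings, Matrix4x4[] skin) {
        Vector3 v = Vector3.Zero;
        foreach (var b in bindings)               // weighted sum over bound joints
            v += b.Weight * Vector3.Transform(v0, skin[b.Joint]);
        return v;                                 // weights are assumed to sum to 1
    }
}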
A simple example of this is shown in Fig. 44. In Fig. 44(a) we show the reference pose. In Fig. 44(b), we
show what might happen if every vertex is bound to a single joint. When the joint flexes, the vertices
at the boundary between the two bones crack apart from each other. In Fig. 44(c) we have made a very
small change. The vertices lying on the seam between the two pieces have been bound to both joints
j1 and j2 , each with a weight of 1/2. Each of these vertices is effectively now placed at the midpoint
of the two “cracked” copies. The result is not very smooth, but it could be made much smoother by
adding weights to the neighboring vertices as well.
Fig. 44: (a) The reference pose, (b) each vertex bound to a single joint, so that the mesh cracks apart at
the seam, and (c) the seam vertices bound to both joints j1 and j2 with weight 1/2 each.
It is worth making a few observations at this point about the storage/computational requirements of
this approach.
Matrix palette: In order to blend every vertex of the model, we need only one matrix for each joint
of the skeleton, namely the skinning matrices K^(t)_j. While a skeleton may have perhaps tens of
joints, the mesh may have many thousands of vertices, so this collection of matrices (the matrix
palette) is quite small by comparison.
From the perspective of GPU implementation, this representation is very efficient. In particular, we
need only associate a small number of scalar values with each vertex (of which there are many), and
we store a single matrix with each joint (of which there are relatively few). In spite of the apparent
tree structure of the skeleton, everything here can be represented using just simple arrays. Modern
GPUs provide support for storing matrix palettes and performing this type of blending.
Shortcomings of Blended Skinning: While the aforementioned technique is particularly well suited to
efficient GPU implementation, it is not without its shortcomings. In particular, if joints are subjected
to high rotations, either in flexing or in twisting, the effect can be to cause the skin to deform in
particularly unnatural-looking ways (see Fig. 45).
Fig. 45: Shortcomings of vertex blending in skinning: (a) Collapsing due to bending and (b) collapsing due
to twisting.
• Planning the coordinated motion of a group of agents who wish to move to a specified location
amidst many obstacles
• Planning the motion of an articulated skeletal model subject to constraints, such as maintaining
hand contact with a door handle or avoiding collisions while passing through a narrow
passageway
• Planning the motion of a character while navigating through a dense crowd of other moving people
(who have their own destinations), or planning motion either to evade or to pursue the player
• Planning ad hoc motions, like that of a mountain climber jumping over boulders or climbing up
the side of a cliff
Historically, much of the initial development of techniques in this area arose from other fields, such
as robotics, autonomous vehicle navigation, and computational geometry. Game designers have some
advantages in solving these problems, since the environment in which the NPCs move is under the
control of the game designer. This means that a game designer can simplify motion planning by
creating additional free space in the environment, thus making it easier to plan motion. (In contrast, a
robot typically has no control over the environment in which it must operate.)
Single-object motion:
From objects to points: Methods such as configuration spaces can be applied to reduce the
problem of moving a complex object (or assembly of objects) with multiple degrees of freedom
(DOFs) to the motion of a single point through a multi-dimensional space.
Discretization: Methods such as waypoints, roadmaps, and navigation meshes are used to reduce
the problem of moving a point in continuous space to computing a path in a discrete graph-like
structure.
Shortest paths: This includes efficient algorithms for computing shortest paths, updating them
as conditions change, and representing them for later access.
Multiple-object motion:
Flocking: There exist methods for planning basic flocking behavior (as with birds and herding
animals) and applications to simulating crowd motion.
Purposeful crowd motion: Techniques such as velocity obstacles are used for navigating a
single agent from an initial start point to a desired goal point through an environment of
moving agents.
Guarding and Pursuit/Evasion: These include methods for solving motion-planning tasks
where one agent is either hunting for or attempting to elude detection by the player or
another agent.
Like many of the topics we have covered this semester, we could easily devote an entire course to this
one topic, but we will instead try to sample some of the key ideas. In this lecture, we will focus on
one of the most widely used concepts from this area, called a navigation mesh. This is the principal
support feature that the Unity Engine provides for character navigation.
Navigation Meshes: A navigation mesh (or NavMesh) is a data structure used to model free-space, partic-
ularly for an agent that is moving along a two-dimensional surface. (Such a surface is formally referred
to as a two-manifold.) A navigation mesh is a spatial subdivision (more specifically, a cell-complex)
whose faces are convex polygons, usually triangles. Each face of the mesh behaves like a node in a
graph, and two nodes are joined by an edge if the associated faces share a common edge. Because the
faces are convex, any point from inside one face can be reached by a straight line from any other point
inside the same face. As with a waypoint system, there is an underlying graph which can be used for
computing paths, but by storing the cell complex, the paths computed are not constrained to follow
the waypoints.
For example, in Fig. 46(a) we show a possible workspace. In (b) we show a possible waypoint system,
and in (c) we show a possible navigation mesh. We show a possible path using each representation
between a start point s and destination t.
Because they provide a more faithful representation of the underlying free-space geometry, navigation
meshes have a number of advantages over waypoint-based methods:
• They are capable of generating shorter and more natural paths than traditional waypoint methods.
Fig. 46: (a) An environment, (b) a possible waypoint-based roadmap, and (c) a possible navigation mesh.
• Waypoint methods can generate an excessive number of points to compensate for their shortcom-
ings, and so navigation meshes can be considerably more space efficient.
• They can be used to plan the movement of multiple (spatially separate) agents, such as a group
of people walking abreast of each other. (Note that a waypoint system would need to plan their
motion in a single-file line.)
• It is easier to incorporate changes to the environment (such as the insertion, removal, or modifi-
cation of obstacles).
• A wide variety of pathfinding algorithms can be modified and optimized for use with navigation
meshes.
Of course, because they are more complicated than waypoint-based methods, there are also disadvantages
to the use of navigation meshes.
Automatic Generation of Navigation Meshes: If the environment is simple, the navigation mesh can
be added by the artist who generated the level. Of course, we cannot do this for environments that are
imported from other sources. Even if the level is quite large, it is often possible to generate a navigation
mesh fairly easily. (Consider for example the sidewalks and roads of an urban scene.) In less structured
settings, it is often desirable to generate the navigation mesh automatically. How is this done?
There are many possible approaches to building navigation meshes. We will discuss (a simplified
version of) a method due to Mikko Mononen. Let’s begin with a few assumptions. First, we assume
that the moving agents will be walking along a 2-dimensional surface. This surface need not be flat,
and it may contain architectural elements such as ramps, tunnels, and stairways. We will assume that
the input is expressed as a polygonal mesh of the world. We will also assume that our moving agent is
a walking/running humanoid, and hence can be coarsely modeled as a vertical line segment or a thin
cylinder with a vertical axis that translates along this surface.
Find the walkable surfaces: Since we assume that our agent is walking, a polygon is suitable for
walking on if (1) the polygon is roughly parallel to the ground, and (2) there is sufficient headroom
above this polygon for our agent to walk. Such a polygon is said to be walkable. We can identify the
polygons that satisfy the first condition by computing the angle between the polygon’s (outward
pointing) normal vector and the vertical unit vector (see Fig. 47(a)). This angle can be computed
through the use of the dot-product operator, as described in earlier lectures.
Fig. 47: Walkable surface (side view): (a) Identifying “flat” polygons and (b) voxel method for determining
sufficient headroom.
In order to test the second property, let h denote the height of the agent. Mononen suggests
the following very fast and simple approach. First, voxelize the 3-dimensional space using a grid of
sufficient resolution. (For example, the width of the grid should be proportional to the narrowest
gap the agent can slip through.) For each polygon that passes condition (1), we determine how
many voxels lie immediately above this triangle until hitting the next obstacle (see Fig. 47(b)).
(Note that this includes all the obstacle polygons, not just the ones that are nearly level.)
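For concreteness, here is a sketch in C# of the slope test in condition (1). (The maximum slope
angle is a tunable parameter of our own choosing; the headroom of condition (2) would then be
checked by the voxel counting just described.)

Testing whether a polygon is roughly level
using System;
using System.Numerics;

static class Walkability {
    // A polygon is roughly level if its outward unit normal is within
    // maxSlopeDeg of vertical: dot(normal, up) >= cos(maxSlopeDeg).
    public static bool IsRoughlyLevel(Vector3 unitNormal, float maxSlopeDeg) {
        float cosMax = MathF.Cos(maxSlopeDeg * MathF.PI / 180f);
        return Vector3.Dot(unitNormal, Vector3.UnitY) >= cosMax;
    }
}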
Simplify the Polygon Boundaries: Consider the boundary of the walkable surface. (This is the
boundary between walkable polygons and polygons that are not walkable.) This boundary may
generally consist of a very complex polygonal curve with many vertices. We next approximate
this curve by one that has far fewer vertices (see Fig. 48(a)).
Fig. 48: Simplifying the boundary curve from v0 to vn: (a) the original curve and its approximation, and
(b) recursively splitting the curve at the farthest vertex vk.
There is a standard algorithm for simplifying polygonal curves, called the Ramer-Douglas-Peucker6
Algorithm. Here is how the algorithm works. First, let δ denote the maximum error that we will
allow in our approximation. Suppose that the curve runs between two points v0 and vn. If the
entire curve fits within a pair of parallel lines at distance δ on either side of the line segment v0 vn,
then we replace the entire curve by this single segment. Otherwise, we find the vertex vk that lies
farthest from this segment, split the curve at vk, and recursively simplify the two subcurves (see
Fig. 48(b)).
6 The algorithm was discovered independently by Urs Ramer in 1972 and by David Douglas and Thomas Peucker in 1973.
Ramer published his result in the computer graphics community and Douglas and Peucker published theirs in the cartography
community.
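Here is a recursive sketch of the algorithm in C#. (This is a 2D version over a list of points from v0
to vn, with a simple point-to-segment distance helper.)

Ramer-Douglas-Peucker curve simplification
using System;
using System.Collections.Generic;
using System.Numerics;

static class Simplify {
    static float DistToSegment(Vector2 p, Vector2 a, Vector2 b) {
        Vector2 ab = b - a;
        float len2 = ab.LengthSquared();
        float t = len2 > 0 ? Math.Clamp(Vector2.Dot(p - a, ab) / len2, 0f, 1f) : 0f;
        return (p - (a + t * ab)).Length();      // distance to nearest segment point
    }

    public static List<Vector2> DouglasPeucker(List<Vector2> pts, float delta) {
        if (pts.Count < 3) return new List<Vector2>(pts);
        int k = 0; float worst = 0f;             // farthest vertex from chord v0 vn
        for (int i = 1; i < pts.Count - 1; i++) {
            float d = DistToSegment(pts[i], pts[0], pts[pts.Count - 1]);
            if (d > worst) { worst = d; k = i; }
        }
        if (worst <= delta)                      // whole curve fits in the delta strip
            return new List<Vector2> { pts[0], pts[pts.Count - 1] };
        var left = DouglasPeucker(pts.GetRange(0, k + 1), delta);
        var right = DouglasPeucker(pts.GetRange(k, pts.Count - k), delta);
        left.RemoveAt(left.Count - 1);           // vk appears in both halves; keep one
        left.AddRange(right);
        return left;
    }
}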
Fig. 49: (a) Chords, bridging chords, and an ear, (b) the vertices v0, . . . , vn after bridging the holes, and
(c) the triangulation obtained by repeatedly removing ears.
Before presenting the algorithm, let’s give a couple of definitions. A line segment that connects
two vertices of the polygon and that lies entirely within the interior of the polygon is called a
chord (see Fig. 49(a)). A chord that connects two holes together or that connects a hole with the
outer boundary is called a bridging chord. A chord that connects two vertices that share a common
neighboring vertex cuts off a single triangle from the polygon. This triangle is called an ear.
Here is the algorithm:
Bridge the holes: First, connect each hole of the polygon either to another hole or to the
boundary of the outer polygon using bridging chords. Repeatedly select the bridging chord
of minimum length, until all the holes are connected to the outer boundary. (If there are h
holes, this will involve exactly h bridging chords.)
By thinking of each bridging chord as consisting of two edges, one leading into the hole and
one leading out, the resulting boundary now consists of one connected component, which we
can treat as a polygon without any holes. Number the vertices v0 , . . . , vn in counterclockwise
order around this polygon (see Fig. 49(b)).
Remove Ears: If the polygon consists of only three vertices, then we are done. Otherwise,
find three consecutive vertices vi−1, vi, vi+1 such that vi−1 vi+1 is a chord. The triangle
△vi−1 vi vi+1 is an ear. Among all the possible ears, select the one whose chord is of
minimum length. Cut this ear off (ouch!) by adding the chord vi−1 vi+1. The remaining polygon
has one fewer vertex. Repeat the process recursively on this polygon, until only three vertices
remain. The union of all the removed ears is the final triangulation (see Fig. 49(c)).
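Here is a simplified C# sketch of ear removal for a simple polygon without holes. (For brevity it
clips the first ear it finds rather than the minimum-length chord described above, and it omits the
hole-bridging step.)

Triangulation by ear clipping
using System.Collections.Generic;
using System.Numerics;

static class EarClipper {
    static float Cross(Vector2 o, Vector2 a, Vector2 b) =>
        (a.X - o.X) * (b.Y - o.Y) - (a.Y - o.Y) * (b.X - o.X);

    // Is p strictly inside the counterclockwise triangle (a, b, c)?
    static bool Inside(Vector2 p, Vector2 a, Vector2 b, Vector2 c) =>
        Cross(a, b, p) > 0 && Cross(b, c, p) > 0 && Cross(c, a, p) > 0;

    // poly: a simple polygon in counterclockwise order.
    public static List<(Vector2, Vector2, Vector2)> Triangulate(List<Vector2> poly) {
        var v = new List<Vector2>(poly);
        var tris = new List<(Vector2, Vector2, Vector2)>();
        while (v.Count > 3) {
            bool clipped = false;
            for (int i = 0; i < v.Count && !clipped; i++) {
                Vector2 a = v[(i + v.Count - 1) % v.Count];
                Vector2 b = v[i];
                Vector2 c = v[(i + 1) % v.Count];
                if (Cross(a, b, c) <= 0) continue;   // reflex corner: not an ear
                bool isEar = true;                    // no other vertex may lie inside
                foreach (var p in v)
                    if (Inside(p, a, b, c)) { isEar = false; break; }
                if (!isEar) continue;
                tris.Add((a, b, c));                  // cut the ear off
                v.RemoveAt(i);
                clipped = true;
            }
            if (!clipped) break;                      // defensive: degenerate input
        }
        if (v.Count == 3) tris.Add((v[0], v[1], v[2]));
        return tris;
    }
}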
By the way, this is but one way to triangulate a polygon with holes. There are many algorithms
that are significantly more efficient than this one (from the perspective of worst-case running time).
Shorten/Smooth: Compute the shortest path in the resulting graph using any graph-based method
(see Fig. 51(a)). Identify a simple polygon (without holes) by combining all the faces of the mesh
that this path passes through (see the shaded polygon in Fig. 51(b)). Finally, apply any further
optimizations (such as shortening or smoothing) to the path as desired, subject to the constraint
that the path does not leave this shaded polygon (see Fig. 51(c)). Because this polygon has no
holes, it is much easier to perform the desired optimizations.
Fig. 51: (a) The shortest path between s and t, (b) the polygon containing the path (shaded in red), and
(c) the final smoothed path.
of the mesh are connected together. This also involves issues such as whether there are holes or cavities
within the model.
In the field of topology, such a surface patch is called a 2-manifold (see Fig. 52(a)). A defining property of
2-manifolds is that any sufficiently small local neighborhood surrounding any interior point of the
surface looks (up to stretching) like a small circular disk. (See Fig. 52(b) for examples of violations.)
We allow our manifolds to have boundaries, and they may generally contain holes. Intuitively, you can think
of a 2-manifold (with boundary) as a very thin rubber sheet, from which someone may have cut out
holes.
Fig. 52: (a) A 2-manifold, (b) violations of the 2-manifold property, (c) a cell complex, and (d) a
subdivision that is not a cell complex.
In order to represent surfaces, it is common to break them up into small polygonal elements, which
are typically triangles. (Triangles are nice, because they are always convex and always planar. In general,
a polygon in 3-dimensional space that is built using four or more vertices might fail either of these
properties.) When two triangles of the mesh are joined together, they are joined edge-to-edge (see
Fig. 52(c)). This implies, in particular, that a vertex of one triangle will not appear in the interior of
an edge or a face of another triangle (as in Fig. 52(d)). Such a decomposition is called a cell complex
or (when only triangles are involved) a simplicial complex. The DCEL data structure is used for
representing cell complexes on 2-manifold surfaces.
The DCEL: A cell complex subdivides a mesh into three types of elements, vertices (0-dimensional), edges
(1-dimensional), and faces (2-dimensional). Thus, we can encode the topological information of a mesh
as an undirected graph. For the purposes of unambiguously distinguishing left from right, it will be
convenient to represent each undirected edge by two oppositely directed edges, called half-edges. An
edge directed from u to v is said to have u as its origin and v as its destination.
For now, let us make the simplifying assumption that the faces of the mesh do not have holes inside of
them. The DCEL stores the following information for each of its vertices, faces, and half-edges:
Vertex: Each vertex stores its spatial coordinates, along with a reference to any single incident half-
edge that has this vertex as its origin, v.incident.
Face: Each face f stores a reference to a single half-edge for which this face is the incident face,
f.incident. (Such a half-edge will be directed counterclockwise about the face.)
Edge: Each half-edge (u, v) is naturally associated with two vertices, its origin u and its destination
v, its twin half-edge (v, u), and its two incident faces, one to the left and one to the right. To
distinguish left from right, let us assume that our mesh has an outward facing side and an inward
side. (This works fine for most meshes that arise in solid modeling, since they enclose solid bodies.
There are exceptions, however, such as a Mobius strip.) Consider the half-edge (u, v) and imagine
that you are standing in the middle of the edge on the outer side of the mesh with u behind you
and v in front. The face to your left is the half-edge’s left face, and the other is the right face.
Each half-edge e of the DCEL stores the following references (see Fig. 53):
• e.org: e’s origin
• e.twin: e’s oppositely directed twin half-edge
• e.left: the face on e’s left side
• e.next: the next half-edge after e in counterclockwise order about e’s left face
• e.prev: the previous half-edge to e in counterclockwise order about e’s left face (that is, the
next edge in clockwise order).
Fig. 53: A half-edge e and its references e.org, e.twin, e.next, e.prev, and e.left.
You might observe that there are a number of potentially useful references that we did not store.
This is done to save space, because they can all be computed easily from the above:
• e.dest: e’s destination vertex (e.dest ← e.twin.org)
• e.right: the face on e’s right side (e.right ← e.twin.left)
• e.onext: the next half-edge that shares e’s origin that comes after e in counterclockwise order
(e.onext ← e.prev.twin)
• e.oprev: the previous half-edge that shares e’s origin that comes before e in counterclockwise
order (e.oprev ← e.twin.next)
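For example, using just e.next we can walk counterclockwise around the boundary of a face. Here is a
sketch in C# (with hypothetical classes that mirror the fields above):

Enumerate the half-edges bounding a face
using System.Collections.Generic;

class Vertex { public Edge Incident; /* plus coordinates, normal, etc. */ }
class Face   { public Edge Incident; }
class Edge   { public Vertex Org; public Edge Twin, Next, Prev; public Face Left; }

static class Dcel {
    public static IEnumerable<Edge> FaceEdgesCCW(Face f) {
        Edge start = f.Incident, e = start;
        do { yield return e; e = e.Next; } while (e != start);
    }
}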
As another example, suppose that we want to enumerate all the vertices that are neighbors of a given
vertex v in clockwise order about this vertex. We could start at any incident edge e (which by definition
has v as its origin), output its destination vertex, and then visit the next vertex about the origin in
clockwise order.
Enumerate the neighbors of a vertex
vertexNeighborsCW(Vertex v) {
    Edge start = v.incident;
    Edge e = start;
    do {
        output e.dest;    // formally: output e.twin.org
        e = e.oprev;      // formally: e = e.twin.next
    } while (e != start);
}
Configuration Spaces: To begin, let us consider the problem of planning the motion of a single agent
among a collection of obstacles. Since the techniques that we will be discussing originated in the
field of robotics, henceforth we will usually refer to a moving agent as a “robot”. The environment in
which the agent operates is called its workspace, which consists of a collection of geometric objects,
called obstacles, which the robot must avoid. We will assume that the workspace is static, that is, the
obstacles do not move.7 We will also assume that the robot has complete knowledge of its workspace.8
The robot's placement is specified by a configuration, a vector of parameters that uniquely determines
the position of every point of the robot. For example, a rigid robot that translates and rotates in the
plane can be specified by three parameters: the (x, y)-coordinates of a reference point on the robot and
a rotation angle θ. We write R(x, y, θ) for the robot placed at this configuration (see Fig. 54(a)).
Fig. 54: Configurations of: (a) a translating and rotating robot and (b) a translating and rotating robot
with revolute joints.
In 3-dimensional space, a similar rigid object can be described by six parameters: the (x, y, z)-
coordinates of the object's reference point, and the three Euler angles (θ, φ, ψ) that define its orientation
in space.9
A more complex example would be an articulated arm consisting of a set of links, connected to one
another by a set of revolute joints. The configuration of such a robot would consist of a vector of joint
angles (see Fig. 54(b)). The geometric description would probably consist of a geometric representation
of the links. Given a sequence of joint angles, the exact shape of the robot could be derived by combining
this configuration information with its geometric description.
Free Space: Because of limitations on the robot's physical structure and the obstacles, not every point in
configuration space corresponds to a legal placement of the robot. Some configurations may be illegal,
for example, because the robot would overlap one of the obstacles (see Fig. 55(a)).
7 The assumption of a static workspace is not really reasonable for most games, since agents move and structures may
change. A common technique for dealing with dynamic environments is to separate the static objects from the dynamic ones,
plan motion with respect to the static objects, and then adjust the plan incrementally to deal with the dynamic ones.
8 The assumption of a known workspace is reasonable in computer games. Note that this is not the case in robotics, where the
world surrounding the robot is either unknown or is known only approximately based on the robot's limited sensor measurements.
9 A quaternion might be a more reasonable representation of the robot’s angular orientation in space. You might protest that
the use of a quaternion will involve four parameters rather than three. But remember that the quaternions used for representing
rotations are unit quaternions, meaning that once three of the parameters are given, the fourth one is fixed.
Fig. 55: Workspace showing free and forbidden configurations and a possible configuration space.
Such illegal configurations are called forbidden configurations. Given a robot R and workspace S,
the set of all forbidden configurations is denoted Cforb (R, S), and all other placements are called free
configurations, and the set of these configurations is denoted Cfree (R, S), or free space. These two sets
partition configuration space into two distinct regions (see Fig. 55(b)).
C-Obstacles and Paths in Configuration Space: Motion planning is the following problem: Given a
workspace S, a robot R, and initial and final configurations s, t ∈ Cfree (R, S), determine whether it
is possible to move the robot from one configuration to the other by a path R(s) → R(t) consisting
entirely of free configurations (see Fig. 56(a)).
Based on the definition of configuration space, it is easy to see that the motion planning problem
reduces to the problem of determining whether there is a path from s to t in configuration space (as
opposed to the robot’s workspace) that lies entirely within the robot’s free configuration subspace (see
Fig. 56(b)). Thus, we have reduced the task of planning the motion of a robot in its workspace to the
problem of finding a path for a single point through free configuration space.
Configuration Obstacles and Minkowski Sums: Since high-dimensional configuration spaces are diffi-
cult to visualize, let’s consider the simple case of translating a convex polygonal robot in the plane
amidst a collection of polygonal obstacles. In this case both the workspace and configuration space
are two-dimensional. We claim that, for each obstacle in the workspace, there is a corresponding
configuration obstacle (or C-obstacle) that corresponds to it in the sense that R(p) intersects the
obstacle in the workspace if and only if the point p lies within the corresponding C-obstacle.
For simplicity, let us assume that the reference point for our robot R is at the origin. Let R(p) denote
the translate of the robot so that its reference point lies at point p. Given a polygonal obstacle P, the
corresponding C-obstacle is formally defined to be the set of placements of R that intersect P, that is,

    C(P) = {p : R(p) ∩ P ≠ ∅}.
One way to visualize C(P ) is to imagine “scraping” R along the boundary of P and seeing the region
traced out by R’s reference point (see Fig. 57(a)).
Fig. 56: Motion planning: (a) workspace with obstacles and (b) configuration space and C-obstacles.
Fig. 57: (a) The C-obstacle C(P), visualized by scraping R along the boundary of P, and (b) the Minkowski
sum P ⊕ Q.
The formal analysis of C-obstacles is based on the concept of a Minkowski sum. Given two sets P and
Q in the plane, their Minkowski sum, denoted P ⊕ Q, is the set of all pairwise sums of points drawn
from each set (see Fig. 57(b)):
P ⊕ Q = {p + q : p ∈ P, q ∈ Q}.
Also, define −S = {−p : p ∈ S}. (In the plane, −S is just the 180◦ rotation of S about the origin,
but this does not hold in higher dimensions.) We introduce the shorthand notation R ⊕ p to denote
R ⊕ {p}. Observe that the translate of R by vector p is R(p) = R ⊕ p. The relevance of Minkowski
sums to C-obstacles is given in the following claim.
Claim: Given a translating robot R and an obstacle P , C(P ) = P ⊕ (−R) (see Fig. 58).
Proof: Observe that q ∈ C(P ) iff R(q) intersects P , which is true iff there exist r ∈ R and p ∈ P
such that p = r + q (see Fig. 58(a)), which is true iff there exist −r ∈ −R and p ∈ P such
that q = p + (−r) (see Fig. 58(b)), which is equivalent to saying that q ∈ P ⊕ (−R). Therefore,
q ∈ C(P ) iff q ∈ P ⊕ (−R), which means that C(P ) = P ⊕ (−R), as desired.
Fig. 58: Proof of the claim: (a) q ∈ C(P) iff there exist r ∈ R and p ∈ P such that p = r + q, and (b)
equivalently, q ∈ P ⊕ (−R).
Since it is an easy matter to compute −R in linear time (by simply negating all of its vertices), the problem of
computing the C-obstacle C(P ) reduces to the problem of computing a Minkowski sum of two convex
polygons. We’ll show next that this can be done in O(m + n) time, where m is the number of vertices
in R and n is the number of vertices in P .
Note that the above proof made no use of the convexity of R or P . It works for any shapes and in
any dimension. However, computation of the Minkowski sums is most efficient for convex polygons.
We will not present the algorithm formally here, but here is an intuitive explanation. First, compute
the vectors associated with the edges of each polygon and merge them into a single list, sorted by
angular order. Then link them together end-to-end (see Fig. 59). (It is not immediately obvious that
this works, but it can be proved to be correct.)
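Here is a sketch of this edge-merging computation in C#, following the standard two-pointer
formulation. (We assume both polygons are convex and given in counterclockwise order, starting from
their bottommost, then leftmost, vertices, so that the edge vectors appear in increasing angular order.)

Minkowski sum of two convex polygons
using System.Collections.Generic;
using System.Numerics;

static class Minkowski {
    static float Cross(Vector2 a, Vector2 b) => a.X * b.Y - a.Y * b.X;

    public static List<Vector2> Sum(List<Vector2> P, List<Vector2> Q) {
        int n = P.Count, m = Q.Count;
        var result = new List<Vector2>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            result.Add(P[i % n] + Q[j % m]);
            // Compare the angular order of the next edge of each polygon.
            Vector2 ep = P[(i + 1) % n] - P[i % n];
            Vector2 eq = Q[(j + 1) % m] - Q[j % m];
            float c = Cross(ep, eq);
            if (c >= 0 && i < n) i++;   // P's edge turns no later than Q's
            if (c <= 0 && j < m) j++;   // Q's edge turns no later than P's
        }
        return result;                   // at most n + m vertices, O(n + m) time
    }
}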
C-Obstacles for Rotating Robots: When rotation is involved, this scraping process must consider not
only translation, but all rotations that cause the robot’s boundary to touch the obstacle’s boundary.
(One way to visualize this is to fix the value of θ, rotate the robot by this angle, and then compute
the translational C-obstacle with the robot rotated at this angle. Then, stack the resulting C-obstacles
on top of one another, as θ varies through one complete revolution. The resulting “twisted column”
is the C-obstacle in 3-dimensional space.) Note that the configuration space encodes not only the
translation, but the joint angles as well. Thus, a path in configuration space generally characterizes
both the translation and the individual joint rotations. (This is insanely hard to illustrate, so I hope
you can visualize this on your own!)
When dealing with polyhedral robots and polyhedral obstacle models under translation, the C-
obstacles are all polyhedra as well. However, when revolute joints are involved, the boundaries of
the C-obstacles are curved surfaces, which require more effort to process than simply polyhedral mod-
els. Complex configuration spaces are not typically used in games, due to the complexity of processing
them. Game designers often resort to more ad hoc tricks to avoid this complexity, at the expense of
accuracy.
Fig. 60: Potential functions that attract the moving point toward the destination t while repelling it from
the obstacles.
to be the sum of all these potentials. Finally, we could include weight factors wt and wo to control the
relative strength of attraction versus repulsion (see Fig. 60(b)):
    Ψ(p) = wt · dist(p, t)² + wo · Σ_{o∈O} 1/dist(p, o)².
The high potential walls around the obstacles keep the ball from rolling into them. To induce the ball
to roll from s to t, we put s at the peak of a very tall, broad mountain (think Mt. Fuji) and we put
t at the bottom of a very deep, broad bowl. The final potential field Ψ is the sum of these various
functions.
Path Finding via Gradient Descent: Given our potential field, we can apply a physics simulator to let
our robotic marble flow “downhill” from s to t (and hope that it eventually arrives!).
How do we compute this path? A natural approach is to compute a path of steepest descent. Given a
point p = (x, y), let Ψ(x, y) denote the value of the potential field at this point. The direction of
steepest ascent is given by the gradient vector, which can be computed from the partial
derivatives of Ψ. More formally, the gradient is

    ∇Ψ = (∂Ψ/∂x, ∂Ψ/∂y)
(see Fig. 61). You might wonder why the partial derivative is used here. Observe, for example that if
the function grows very rapidly with x, but is almost flat with respect to y, then the gradient will have
a very high x-component and the y-component will be very close to zero. It takes a bit of calculus to
show that among all directions, the gradient provides the direction of most rapid change. By the way,
one reason for using squared distances in the above potential function, rather than standard Euclidean
distances, is that it is much easier to compute derivatives of polynomial functions than of functions involving
square roots.
Given any starting point p, we can compute the next point along the direction of steepest descent as
p′ ← p − δ · ∇Ψ(p),
for a suitably small step size δ. By repeatedly recomputing the gradient and taking another step, we
will eventually walk to a local minimum of the potential function.
There is some art in how the step size is determined. If the step size is too big, we may shoot past the
minimum, and if the step size is too small, we may require many iterations before converging.
Fig. 61: Gradient descent on the potential field Ψ(x, y): from the point p we step to p′ = p − δ∇Ψ(p).
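Here is a sketch of the descent loop in C#. (The gradient function is passed in as a delegate; the step
size and stopping rules are simple choices of our own, which a real implementation would tune.)

Path finding by gradient descent
using System;
using System.Collections.Generic;
using System.Numerics;

static class PotentialPath {
    // gradPsi(p) returns (dPsi/dx, dPsi/dy) evaluated at p.
    public static List<Vector2> Descend(Vector2 start, Vector2 goal,
            Func<Vector2, Vector2> gradPsi, float delta, int maxSteps) {
        var path = new List<Vector2> { start };
        Vector2 p = start;
        for (int s = 0; s < maxSteps && Vector2.Distance(p, goal) > delta; s++) {
            Vector2 g = gradPsi(p);
            if (g.Length() < 1e-6f) break;  // flat spot: stuck at a local minimum
            p -= delta * g;                  // step downhill: p' = p - delta * grad
            path.Add(p);
        }
        return path;
    }
}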
Advantages: The potential-field method is very easy to implement. Because the moving point
naturally follows a smooth, energy-minimizing path, when the method converges it tends to result
in smooth, natural-looking motion. However, it is best used for simple motions, where the desired path
doesn’t involve many twists and turns.
Disadvantages: In addition to the difficulty of determining a good step size, the most significant
disadvantage of the potential-based method for path planning is that it can get trapped in local
minima. If t is not at the bottom of this local minimum, then the algorithm simply gets stuck.
Discretizing Configuration Space: Because continuous spaces are difficult to search, it is common to
find paths by a two-step process:
Discretize: Reduce the problem to one of searching a discrete structure (either a graph or a subdivision
of space).
Search: Apply a path-finding algorithm (such as Breadth-First Search, Dijkstra’s algorithm, or A∗ ) to
compute the path in the discrete structures. (We will discuss these algorithms in future lectures.)
In the remainder of this lecture, we will discuss a number of approaches for computing the aforemen-
tioned discrete structure.
Waypoints and Road Maps: Perhaps the simplest approach for generating a navigation graph is to
scatter a large number of points throughout free space, sometimes called waypoints, and then connect
nearby waypoints to one another if the segment between them does not intersect any obstacles. (Since
this is generally happening in configuration space, the points are in configuration free space and the
segments should not intersect any C-obstacles.)
The edges of this graph can be labeled with the distance10 between the associated points. The resulting
graph is called a road map. Given the location of the start point s and the destination point t, we join
these with edges to nearby waypoints. Finally, we can invoke a shortest path algorithm to compute
the final path. If the graph is connected, then this is guaranteed to yield a valid path in configuration
space, which is then translated back to a motion plan in the robot’s workspace.
Selecting Waypoints: There are a number of methods for computing waypoints. Here are a few:
Placed by the game designer: The game designer has a notion of where it is natural for the game
agents to move, and so places waypoints along routes that he/she deems to be important. For
example, this would include points near the entrances and exits of rooms in an indoor environment.
10 In the case of translational motion, distance is an easily defined notion. When the configuration space includes rotations,
however, defining the distance between two configurations is less straightforward.
Fig. 62: A road map for a set of obstacles (a) based on waypoints generated on: (b) a grid and (c) a quadtree.
Adaptive Grid (Quadtree): One way to deal with the grid’s lack of adaptivity is to apply a hier-
archical point placement system that adapts the density of point placement to the distance to
the closest obstacle. A natural generalization of the grid approach is to place waypoints on the
vertices of a quadtree decomposition. Recall that a quadtree decomposes space into square cells
(or generally hypercubes in higher dimensional space). A quadtree cell is said to be stabbed if
the boundary of some obstacle cuts through it. Assuming that we can detect when a quadtree
cell is stabbed, we repeatedly refine any stabbed cells until the cells are deemed to be sufficiently
small. The waypoints are then placed at the vertices of the quadtree that lie in free space (see
Fig. 62(c)).
While this adds adaptivity, it still does not resolve the issue of path segments that are parallel to
the coordinate axes.
Boundary Vertices and the Visibility Graph: In order to deal with the problem of path segments
that are aligned with the coordinate axes, we would like a method of generating waypoints that
is independent of the coordinate system, and relies solely on the geometry of free space. Let us
assume for now that we are working in 2-dimensional space, and the configuration space is bounded
by line segments. If one is interested in shortest paths (assuming the Euclidean distance), it is
easy to prove that such a path will only make turns at the vertices of the obstacles. We
say that two vertices of the boundary of free space are visible if the line segment between them
lies entirely within free space. Placing a waypoint at each obstacle vertex and joining every pair
of visible vertices by an edge yields what is known as the visibility graph (see Fig. 63(b)).
Fig. 63: A road map for a set of obstacles (a) based on waypoints generated on: (b) the visibility graph of
the obstacle vertices, and (c) the medial axis.
The visibility graph is an intrinsic structure, meaning that it depends only on the object’s geom-
etry, not on the placement of the coordinate system. While it has a number of nice theoretical
properties, it also has a number of drawbacks that make it unsuitable as a general purpose solu-
tion. First, the number of edges in the visibility graph can be as high as O(n2 ), where n is the
number of vertices. If n is very large, this quadratic size may be unacceptable. This problem can
be ameliorated by pruning the graph, say by keeping only the shorter of two edges that share a
common vertex and have a very similar slope.
A second problem of the visibility graph is that the paths it generates, while of minimal length,
have the undesirable property that the moving point scrapes right along the boundary. This doesn't
generate very natural looking motion. This issue also can be ameliorated by first constructing
an artificial boundary that is slightly offset from the actual boundary, and then constructing the
visibility graph of the offset boundary.
A third problem with the visibility graph is that it guarantees shortest paths only in 2-dimensional
space. In 3-dimensional space, it is no longer the case that the shortest path between points
bends at vertices. It may bend in the interior of a boundary edge. The locations of these interior
bending points cannot be predicted in advance (since they generally depend on the locations of
the starting and ending points).
Medial-Axis Waypoints: The shortcomings of the visibility graph suggest a very different approach
to path finding. People who walk down a corridor do not usually “scrape” along the boundary.
Rather they usually seek a path near the center of the corridor, that is, they seek a path of
maximum clearance from the obstacles. How can we compute such a path?
We say that a circular disk D lying entirely in free space is maximal if there is no obstacle-free
disk of larger radius that contains D. The union of the centers of all maximal disks naturally
defines a set of points that runs along the center of the free-space domain. It is a fundamental
object in geometry, called the medial axis (see Fig. 63(c)).
By sampling waypoints on or near the medial axis, the robot will naturally move along the centers
of corridors. (Of course, you can add to this a bit of random variation.) This method of placing
waypoints is best for 2-dimensional domains (since it is messier to compute the medial axis of
higher dimensional configuration spaces).
Adaptive Randomized Placement and PRMs: Another adaptive approach to placing waypoints
that avoids dependencies on the coordinate axes is to select the waypoint placement randomly.
Here is the idea. On the ith iteration, we generate a random point p within the domain of interest.
We test to see whether p lies within free space. If not, we discard it and go on to the next iteration.
Otherwise, we add p as a new waypoint, and we add an edge between p and each nearby waypoint,
provided that the connecting segment does not intersect any obstacle (see Fig. 64).
Fig. 64: Generating a probabilistic road map (PRM) for a set of obstacles. (b) The roadmap after iteration
i − 1 and (c) the result of adding the ith point. Edges that intersect obstacles (red) are not added.
PRMs are very popular in the field of robotics. They can be applied in arbitrary dimensions. It
can be proved that, if there is a path between two configurations then with high probability the
PRM will eventually discover it. Of course if the path travels through a narrow passageway (as
seen in the middle part of the obstacle set of Fig. 64) it may take a lot of samples to discover this
connection.
Because points are randomly generated throughout the domain, it suffers from some of the same
issues as uniform grids. Wide open areas of free space will receive an excessive number of waypoints
while narrow corridors may receive too few. Unlike uniform grids, it is possible to detect that a
newly added waypoint is redundant and so it can be ignored. PRMs suffer the same problem as
all the other waypoint methods we have discussed so far. The paths do not generally travel along
natural paths, but rather they zig-zag from one waypoint to the next.
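Here is a sketch of the PRM construction loop in C#. (The random sampler and the isFree and
segmentFree tests against the C-obstacles are assumed to be supplied by the collision system.)

Building a probabilistic road map
using System;
using System.Collections.Generic;
using System.Numerics;

static class Prm {
    public static List<(Vector2, Vector2)> Build(int samples, float connectRadius,
            Func<Vector2> randomPoint, Func<Vector2, bool> isFree,
            Func<Vector2, Vector2, bool> segmentFree) {
        var nodes = new List<Vector2>();
        var edges = new List<(Vector2, Vector2)>();
        for (int i = 0; i < samples; i++) {
            Vector2 p = randomPoint();
            if (!isFree(p)) continue;             // discard points inside C-obstacles
            foreach (var q in nodes)              // link to nearby visible waypoints
                if (Vector2.Distance(p, q) <= connectRadius && segmentFree(p, q))
                    edges.Add((p, q));
            nodes.Add(p);
        }
        return edges;                              // the roadmap's edge list
    }
}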
Rapidly-expanded Random Trees (RRTs): One of the issues with PRMs is that the structure
that is generated is not necessarily connected. Another popular adaptive approach to generating
a roadmap through randomization is a process that guarantees that the generated structure
is connected, and in fact it is a spanning tree over the set of sample points. Spanning
trees are nice for navigation because (due to the fact that they are connected and acyclic) there
is a unique path between any two points. While this may not be great for computing shortest
paths, it is useful for determining the existence of any valid motion.
As with PRMs, the process begins by randomly sampling points from the domain. In this case, we
will keep every sampled point, even if it does not lie within free space. Let us assume that we
have already computed a spanning tree for the existing set of sample points (consider just the
line segment p0 p1 in Fig. 65(a)), and we are considering the addition of a new sample point p.
We compute the closest point q on the current spanning tree to p. Note that q does not need to
be a sampled point. It is allowed to lie within the interior of an edge of the spanning tree. If
so, we add the point q as a new vertex to the spanning tree. We then consider the line segment
qp. If this segment lies entirely within free space, we add it to the tree, thereby making p a new
vertex. If not, we trim the segment at the point p′ where it first leaves free space, and we add the
edge q p′ instead (see Fig. 65).
Fig. 65: Generating a roadmap through the use of rapidly-expanding random trees (RRTs).
Next to PRMs, RRTs are perhaps the most widely used method for computing connectivity
structures in configuration spaces. Notice that both PRMs and RRTs have the advantage that
they can be applied in configuration spaces of arbitrary dimensions.
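Here is a simplified C# sketch of the tree-growing loop. (Unlike the method just described, it attaches
each sample to the nearest tree vertex rather than the nearest point on an edge, and it assumes a
hypothetical freePrefix(q, p) callback that returns the farthest point along segment qp reachable from
q through free space.)

Growing a rapidly-exploring random tree
using System;
using System.Collections.Generic;
using System.Numerics;

static class Rrt {
    public static List<(Vector2, Vector2)> Grow(Vector2 root, int samples,
            Func<Vector2> randomPoint, Func<Vector2, Vector2, Vector2> freePrefix) {
        var verts = new List<Vector2> { root };
        var edges = new List<(Vector2, Vector2)>();
        for (int i = 0; i < samples; i++) {
            Vector2 p = randomPoint();
            Vector2 q = verts[0];                  // nearest existing tree vertex
            foreach (var v in verts)
                if (Vector2.Distance(v, p) < Vector2.Distance(q, p)) q = v;
            Vector2 pNew = freePrefix(q, p);       // trim the segment at obstacles
            if (pNew != q) { verts.Add(pNew); edges.Add((q, pNew)); }
        }
        return edges;
    }
}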
Particle systems are almost as old as computer games themselves. The very earliest 2-dimensional
games, such as Spacewar! and Asteroids simulated explosions through the use of a particle system.
The power of the technique, along with the term “particle system,” came about through one of the special effects
in the second Star Trek movie, where a particle system was used to simulate a fire spreading across
the surface of a planet (the genesis effect).
How Particle Systems Work: One of the appeals of particle systems is that they are extremely simple to
implement, and they are very flexible. Particles are created, live and move according to simple rules,
and then die. The process that generates new particles is called an emitter. For each frame that is
rendered, the following sequence of steps is performed:
Emission: New particles are generated and added to the system. There may be multiple sources of
emission (and particles themselves can serve as emitters).
Attributes: Each new particle is assigned its (initial) individual properties that affect how the particle
moves over time and is rendered. These may change over time and include the following:
Geometric attributes: initial position and velocity
Graphics attributes: shape, size, color, transparency
Dynamic attributes: lifetime, influence due to forces such as gravity and friction
Death: Any particles that have exceeded their prescribed lifetime are removed from the system.
Movement: Particles are moved and transformed according to their dynamic attributes, such as
gravity, wind, and the density of nearby particles.
Rendering: An image of the surviving particles is rendered. Particles are typically rendered as a
small blob.
In order to create a natural look, the process of emitting particles and the assignment of their attributes
is handled in a probabilistic manner, with properties such as the location and velocity of particles
being determined by a random number generator. Many game engines (including Unity) provide
flexible systems for generating particle systems, offering a multitude of options that can be set by the
designer.
Particle systems can be programmed to execute any set of instructions at each step, but usually the
dynamic properties of particles are very simple, and do not react in any complex manner to the
presence of other particles in the system. Because the approach is procedural, it can incorporate
any computational model that describes the appearance or dynamics of the object. For example, the
motions and transformations of particles could be tied to the solution of a system of partial differential
equations. In this manner, particles can be used to simulate physical fluid phenomena such as water,
smoke, and clouds.
Closed-form function: Every particle is represented by a parametric function of time (with coefficients
that are given by the particle's initial motion attributes). For example, given the particle's
initial position p0, its initial velocity vector ~v0, and some fixed field force ~g (representing, for
example, the effect of gravity as a vector that points downwards), the position of the particle at
time t can be expressed in closed form as:

    p(t) = p0 + ~v0 t + (1/2) ~g t².
(If you have ever taken a course in physics, this formula will be familiar to you. If not, don’t
worry how it was derived. It is a simple consequence of Newton’s basic laws of motion.) On the
positive side, this approach requires no storage of the evolving state of the particle, just its initial
state and the elapsed time. On the negative side, this approach is very limited, and does not
allow particles to respond to each other or their environment.
Discrete physical integration: In this type of system, each particle stores its current physical state.
This state consists of the particle’s current position, which is given by a point p, its current
velocity, which is given by a vector ~v , and its current acceleration, which is given by a vector ~a.
Acceleration can be thought of as the accumulated effect of all the forces acting on the particle.11
(For example, the force of gravity decreases the vertical component of the velocity. On the
other hand, the force of air resistance or friction tends to decrease the particle’s absolute velocity,
without altering its direction.) We can then update the state over a small time interval, ∆t. Think
of ∆t as the elapsed time between consecutive frames, or a fixed update time, such as 0.1 seconds.
By the basic laws of kinematics, (1) the total vector sum ~F of the forces determines the acceleration, (2)
acceleration changes the object's velocity over ∆t, and (3) velocity changes the object's position
in space over ∆t:

    (1) ~a ← ~F/m,   (2) ~v′ ← ~v + ~a · ∆t,   and   (3) p′ ← p + ~v · ∆t,

where p′ and ~v′ denote the particle's new position and velocity, respectively. (Note that, except
for time, these are 3-dimensional vector quantities.)
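Here is a minimal sketch of this update for a single particle in C#. (The gravity and drag forces are
illustrative choices of our own, not those of any particular engine.)

Discrete integration of one particle
using System.Numerics;

class Particle {
    public Vector3 Pos, Vel;
    public float Mass = 1f, Life = 2f;

    public void Step(float dt) {
        Vector3 gravity = new Vector3(0f, -9.8f, 0f) * Mass;  // pulls downward
        Vector3 drag = -0.1f * Vel;              // resists motion in any direction
        Vector3 accel = (gravity + drag) / Mass; // (1) a = F / m
        Vel += accel * dt;                       // (2) v' = v + a * dt
        Pos += Vel * dt;                         // (3) p' = p + v * dt
        Life -= dt;                              // the particle dies when Life <= 0
    }
}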
Doing this in Unity: The Unity physics engine will take care of these operations automatically,
whenever you attach to it a Rigidbody component (and assuming that isKinematic is not enabled).
When isKinematic is disabled (the object is under control of the physics engine) then the body can
be moved by applying forces.
Rigidbody rb = GetComponent<Rigidbody>();
rb.AddForce(Vector3.up * 10f);    // apply an upward force
The function AddForce has a second optional argument that controls the manner in which the force
is applied. These include
• Force: Add a continuous force to the rigidbody, using its mass
• Acceleration: Add a continuous acceleration to the rigidbody, ignoring its mass
• Impulse: Add an instant force impulse to the rigidbody, using its mass
• VelocityChange: Add an instant velocity change to the rigidbody, ignoring its mass
11 When dealing with particles, it is common to ignore the object's mass. In general, the mass m of an object is its resistance
to changing its velocity as a consequence of a force. The acceleration ~a due to a force ~F is given by ~a = ~F/m, which is derived
from the well-known formula ~F = m~a. Note that acceleration is a vector quantity, where the direction is given by the direction
of the force that is acting on the body.
Flocking Behavior: Next, let us consider the motion of a slightly smarter variety, namely flocking. We
refer to flocking behavior in the generic sense as any motion arising when a group of agents adopt a
decentralized motion strategy designed to hold the group together. Such behavior is exemplified by the
motion of groups of animals, such as birds, fish, insects, and other types of herding animals (see Fig. 67).
In contrast to full crowd simulation, where each agent may have its own agenda, in flocking it is assumed
that the agents are homogeneous, that is, they are all applying essentially the same motion update
algorithm. The only thing that distinguishes one agent from the next is their position relative to the
other agents in the system. It is quite remarkable that the complex formations formed by flocking birds
or schooling behavior in fish can arise in a system in which each creature is following (presumably) a
very simple algorithm. The apparently spontaneous generation of complex behavior from the simple
actions of a large collection of dynamic entities is called emergent behavior. While the techniques that
implement flocking behavior do not involve very sophisticated models of intelligence, variants of this
method can be applied to simple forms of crowd motion in games, such as a crowd of people milling
around in a large area or pedestrians strolling up and down a sidewalk.
Boids: One of the earliest and perhaps the best-known model for flocking behavior was given by Craig
Reynolds in 1987, who called his simulated flocking creatures boids. Each boid steers itself according
to a few simple local rules:
Separation: Each boid wishes to avoid collisions with other nearby boids. To achieve this, each boid
generates a repulsive potential field whose radius of influence extends to its immediate neighbor-
hood. Whenever another boid gets too close, the force from this field will tend to push them
apart.
Alignment: Each boid’s direction of flight is aligned with nearby boids. Thus, local clusters of boids
will tend to point in the same direction and hence will tend to fly in the same direction.
Avoidance: Each boid will avoid colliding with fixed obstacles in the scene. At the simplest level, we
might imagine that each fixed obstacle generates a repulsive potential field. As a boid approaches
the object, this repulsive field will tend to cause the boid to deflect its flight path, thus avoiding
a collision. Avoidance can also be applied to predators, which may attack the flock. (It has been
theorized that the darting behavior of fish in a school away from a shark has evolved through
natural selection, since the sudden chaotic motion of many fish can confuse the predator.)
Cohesion: Effects such as avoidance can cause the flock to break up into smaller subflocks. To
simulate the flock's tendency to regroup, there is a force that tends to draw each boid towards
the center of mass of the flock. (In accurate simulations of flocking motion, a boid cannot know
exactly where the center of mass is. In general the center of attraction will be some point that
the boid perceives as being the center of the flock.)
Boid Implementation: Next let us consider how to implement such a system. We apply the same discrete
integration approach as we used for particle systems. In particular, we assume that each boid is
associated with a state vector (p, ~v) consisting of its current position p and current velocity ~v. (We
assume that the boid is facing in the same direction as it is flying, but, if not, a vector describing the
boid's angular orientation can also be added to the state.) We think of the above rules as imposing
forces, which together act to define the boid's current acceleration. Given this acceleration vector ~a
caused by the boid forces, we apply the update rules described earlier for particle systems:

    ~v′ ← ~v + ~a · ∆t   and   p′ ← p + ~v · ∆t.

How should the accelerations suggested by the various rules be combined into a single vector ~a? Here
are two common approaches:
Prioritize and truncate: Assume that there is a fixed maximum magnitude for the acceleration
vector (based on how fast a boid can change its velocity based on what sort of animal is being
modeled). Sort the rules in priority order. (For example, predator/obstacle avoidance is typically
very high, flock cohesion is low.) The initial acceleration vector is the zero vector (meaning
that the boid will simply maintain its current velocity). As each rule is evaluated, compute the
associated acceleration vector and add it to the current acceleration vector. If the length of the
acceleration vector ever exceeds the maximum allowed acceleration, then stop and return the
current vector.
Weight and clamp: Assign weights to the various rule-induced accelerations. (Again, avoidance is
usually high and cohesion is usually low.) Take the weighted sum of these accelerations. If the
length of the resulting acceleration vector exceeds the maximum allowed acceleration, then scale
it down to that maximum.
The first method has the virtue that, subject to the constraint on the maximum acceleration, it
processes the most urgent rules first. The second has the virtue that every rule has some influence on
the final outcome. Of course, since this is just a heuristic approach, the developer typically decides
what sort of approach yields the most realistic results.
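As a concrete illustration of the second strategy, here is a small C# sketch. (The particular weights,
and the rule functions that would produce the four acceleration requests, are hypothetical.)

Weight-and-clamp combination of boid rules
using System.Numerics;

static class Steering {
    public static Vector3 Combine(Vector3 separation, Vector3 alignment,
            Vector3 avoidance, Vector3 cohesion, float maxAccel) {
        // Avoidance is weighted highest, cohesion lowest.
        Vector3 a = 4f * avoidance + 2f * separation + 1f * alignment + 0.5f * cohesion;
        if (a.Length() > maxAccel)               // clamp to the maximum acceleration
            a = Vector3.Normalize(a) * maxAccel;
        return a;                                 // feed into v' = v + a * dt
    }
}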
representation, however, since it allows you to model the fact that travel in one direction (say, uphill)
may be more expensive than travel in the reverse direction.
In the context of shortest paths, we assume that each edge (u, v) is associated with a numeric weight,
w(u, v). A path in G is any sequence of nodes ⟨u0, . . . , uk⟩ such that (ui−1, ui) is an edge of G. The
cost of a path is the sum of the weights of its edges, Σ_{i=1}^{k} w(ui−1, ui). The shortest path problem is,
given a directed graph with weighted edges, and given a start node s and destination node t, compute
a path of minimum cost from s to t (see Fig. 69(b)). Let us denote the shortest path cost from s to t
in G by δ(s, t).
In earlier courses, you have no doubt seen examples of algorithms for computing shortest paths. Here
are some of the better known algorithms.
Breadth-First Search (BFS): This algorithm is among the fastest algorithms for computing short-
est paths, but it works under the restrictive assumption that all the edges have equal weight
(which, without loss of generality, we may assume to be 1). The search starts at s, and then visits
all the nodes that are connected to s by a single edge. It labels all of these nodes as being at
distance 1 from s. It then visits each of these nodes one by one and visits all of their neighbors,
provided that they have not already been visited. It labels each of these as being at distance 2
from s. Once all the nodes at distance 1 have been visited, it then processes all the nodes at
distance 2, and so on. The nodes that are waiting to be visited are placed in a first-in, first-out
queue. If G has n nodes and m edges, then BFS runs in time O(n + m).
Dijkstra’s Algorithm: Because BFS operates under the assumption that the edge weights are all
equal, it cannot be applied to general weighted digraphs; for these we need a more general algorithm,
and Dijkstra's algorithm is the best known. It makes the (not unreasonable) assumption that all the
edge weights are nonnegative.12
We will discuss Dijkstra’s algorithm below, but intuitively, it operates in a greedy manner by
propagating distance estimates starting from the source node to the other nodes of the graph,
through an incremental process called relaxation. A straightforward implementation of Dijkstra’s
algorithm runs in O(m log n) time (and in theory even faster algorithms exist, but they are fairly
complicated).
Bellman-Ford Algorithm: Since Dijkstra’s algorithm fails if the graph has negative edge weights,
there may be a need for a more general algorithm. The Bellman-Ford algorithm generalizes
Dijkstra’s algorithm by being able to handle graphs with negative edge weights, assuming there
are no negative-cost cycles, that is, there is no cycle such that the sum of edge weights along
the cycle is strictly smaller than zero. It runs in time O(nm). (Note that the assumption that
there are no negative-cost cycles is very reasonable. If such a cycle exists, the path cost could be
made arbitrarily small by looping through this cycle an arbitrary number of times. Therefore, no
shortest path exists.)
12 Negative edge weights do not typically arise in geometric contexts, and so we will not worry about them. They can arise
in other applications. For example, in financial applications, an edge may model a transaction where money can be made or
lost. In such contexts, weights may be positive or negative. When computing shortest paths, however, it is essential that the
graph have no cycles whose total cost is negative, for otherwise the shortest path is undefined.
Other Issues: There are a number of other issues that arise in the context of computing shortest paths.
Storing Paths: How are shortest paths represented efficiently? The simplest way is through the use
of a predecessor pointer. In particular, each node (other than the source) stores a pointer to the
node that lies immediately before it on the shortest path from s. For example, if the sequence
hs, u1 , . . . , uk i is a shortest path, then pred(uk ) = uk−1 , pred(uk−1 ) = uk−2 , and so on (see
Fig. 70(a)). By following the predecessor pointer back to s, we can construct the shortest path,
but in reverse (see Fig. 70(b)). Since this involves only a constant amount of information per
node, this representation is quite efficient.
Fig. 70: (a) Predecessor pointers along the shortest paths from s, and (b) reconstructing the shortest path
to t by following them back to s.
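Here is a sketch of this reconstruction in C#. (We assume the nodes are numbered 0..n−1 and that
pred[] was filled in by BFS or Dijkstra, with pred[s] = s.)

Recovering a path from predecessor pointers
using System.Collections.Generic;

static class Paths {
    public static List<int> Recover(int s, int t, int[] pred) {
        var path = new List<int>();
        for (int u = t; u != s; u = pred[u])  // walk backwards from t to s
            path.Add(u);
        path.Add(s);
        path.Reverse();                       // restore s -> t order
        return path;
    }
}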
By the way, in the context of the all-pairs problem (Floyd-Warshall, for example) for each pair
of nodes u and v, we maintain a two-dimensional array P [u, v], which stores either null (meaning
that the shortest path from u to v is just the edge (u, v) itself), or a pointer to any node along
the shortest path from u to v. For example, if P [u, v] = x, then to chart the path from u to v, we
(recursively) compute the path from u to x, and the path from x to v, and then we concatenate
these two paths.
Single Destination: In some contexts, it is desirable to compute an escape route, that is, the shortest
path from every node to some common destination. This can easily be achieved by reversing all
the edges of the graph, and then running a shortest path algorithm. (This has the nice feature
that the predecessor links provide the escape route.)
Closest Facility: Suppose that you have a set of locations, called facilities, {f1 , . . . , fk }. For example,
these might represent safe zones, where an agent can go to in the event of danger. When an alarm
is sounded, every agent needs to move to its closest facility. We can view this as a generalization of
the single destination problem, but now there are multiple destinations, and we want to compute
a path to the closest one.
How would we solve this? Well, you could apply any algorithm for the single-destination problem
repeatedly for each of your facilities. If the number of facilities is large, this can take some
time. A more clever strategy is to reduce the problem to a single instance of an equivalent single
destination problem. In particular, create a new node, called the super destination. Connect all
your facilities to the super destination by edges of cost zero (see Fig. 71(a)). Then apply the
single destination algorithm to this instance. It is easy to see that the resulting predecessor links
will point in the direction of the closest facility (see Fig. 71(b)). Note that this only requires one
invocation of a shortest path algorithm, not k.
Fig. 71: (a) The facilities connected to the super destination by zero-cost edges, and (b) the resulting predecessor links, which point in the direction of the closest facility.
Of course, this idea can be applied to the case of multiple source points, where the goal is to find
the shortest path from any of these sources.
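To make the reduction concrete, here is a minimal Python sketch. The adjacency-list layout and the dijkstra() helper (any single-source implementation returning distance and predecessor maps would do) are assumptions, not part of these notes.

Closest Facility via a Super Destination (Python sketch)
def closest_facility(graph, facilities):
    # graph[u] = list of (v, w) pairs. First reverse every edge, so that a
    # single-source run *from* the destination gives paths *to* it.
    rev = {u: [] for u in graph}
    for u, edges in graph.items():
        for v, w in edges:
            rev[v].append((u, w))
    # Connect a new super destination to each facility by a zero-cost edge.
    SUPER = "super-destination"              # hypothetical node label
    rev[SUPER] = [(f, 0) for f in facilities]
    dist, pred = dijkstra(rev, SUPER)        # one invocation, not k
    # pred[u] is now the next node on u's escape route toward its closest facility.
    return dist, pred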
Informed Search: BFS and Dijkstra have the property that nodes are processed in increasing order of
distance from the source. This implies that if we are interested in computing just the shortest path
from s to t, we can terminate either algorithm as soon as t has been visited. Of course, in the worst
case, t might be the last node to be visited. Often, shortest paths are computed to destinations that
are relatively near the source. In such cases, it is useful to terminate the search as soon as possible.
If we are solving a single-source, single-destination problem, then it is in our interest to visit as few
nodes as possible. Can we do better than BFS and Dijkstra? The answer is yes, and the approach is
to use an algorithm based on informed search.
To understand why we might expect to do better, imagine that you are writing a program to compute
shortest paths on campus. Suppose that a request comes to compute the shortest path from the
Computer Science Building to the Art Building. The shortest path to the Art Building is 700 meters
long. If you were to run an algorithm like Dijkstra, it would visit every node of your campus road
map that lies within distance 700 meters of Computer Science before visiting the Art Building (see
Fig. 72(a)). But you know that the Art Building lies roughly to the west of Computer Science. Why
waste time visiting a location 695 meters to the east, when it is very unlikely to help you get to
the Art Building? Dijkstra’s algorithm is said to be an uninformed algorithm, because it makes use of no
external information, such as the fact that the shortest path to a building in the west is more likely to
travel towards the west than the east. So, how can we exploit this information?
Fig. 72: Search algorithms where colors indicate the order in which nodes are visited by the algorithms: (a)
uninformed search (such as Dijkstra) and (b) informed search (such as A∗ ).
Dijkstra's Algorithm
Dijkstra(G, s, t) {
    foreach (node u) {                        // initialize
        d[u] = +infinity; mark u undiscovered
    }
    d[s] = 0; mark s discovered               // distance to source is 0
    while (true) {                            // go until finding t
        let u be the discovered node that minimizes d[u]
        if (u == t) return d[t]               // arrived at the destination
        else {
            for (each unfinished node v adjacent to u) {
                d[v] = min(d[v], d[u] + w(u, v))   // update d[v]
                mark v discovered
            }
            mark u finished                   // we're done with u
        }
    }
}
Best-First Search: What sort of heuristic information could we make use of to better inform the choice
of which vertex u to process next? We want to visit the vertex that we think will most likely lead
us to t quickly. Assuming that we know the spatial coordinates of all the nodes of our graph, one
idea for a heuristic is the Euclidean distance from the node u to the destination t. Given two nodes u
and v, let dist(u, v) denote the Euclidean (straight-line) distance between u and v. Euclidean distance
disregards obstacles, but intuitively, if a node is closer to the destination in Euclidean distance, it is
likely to be closer in graph distance. Define the heuristic function h(u) = dist(u, t). Greedily selecting
the node that minimizes the heuristic function is called best-first search. Do not confuse this with
breadth-first search, even though they share the same three-letter acronym. (See the code block below,
as an example.)
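The best-first code block referenced here is essentially the Dijkstra code above with the selection key changed from d[u] to the heuristic h(u). A minimal Python sketch follows; the adjacency-list graph and the coords table of node positions are assumed layouts, not from these notes.

Best-First Search (Python sketch)
import heapq, math

def best_first(graph, coords, s, t):
    def h(u):                                   # Euclidean heuristic dist(u, t)
        (x1, y1), (x2, y2) = coords[u], coords[t]
        return math.hypot(x1 - x2, y1 - y2)
    d = {u: math.inf for u in graph}
    d[s] = 0
    finished = set()
    frontier = [(h(s), s)]                      # keyed on h(u) alone
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == t:
            return d[t]                         # may be incorrect near obstacles
        if u in finished:
            continue
        finished.add(u)
        for v, w in graph[u]:
            if v not in finished:
                d[v] = min(d[v], d[u] + w)      # a finished node's d never changes
                heapq.heappush(frontier, (h(v), v))
    return d[t]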
Unfortunately, when obstacles are present it is easy to come up with examples where best-first search
can return an incorrect answer. By using the Euclidean distance, it can be deceived into wandering
into dead-ends, which it must eventually backtrack out of. (Note that once the algorithm visits a
vertex, its d-value is fixed and never changes.)
A∗ Search: Since best-first search does not work, is there some way to use heuristic information to produce
a correct search algorithm? The answer is yes, but the trick is to be more clever in how we use the
heuristic function. Rather than just using the heuristic function h(u) = dist(u, t) alone to select the
next node to process, let us use both d[u] and h(u). In particular, d[u] represents an estimate on the
cost of getting from s to u, and h(u) represents an estimate on the cost of getting from u to t. So, how
about if we take their sum? Define
f(u) = d[u] + h(u).
We will select nodes to be processed based on the value of f(u). This leads to our third algorithm,
called A∗ -search. (See the code block below.)
A-Star Search
A-Star(G, s, t) {
    foreach (node u) {                        // initialize
        d[u] = +infinity; mark u undiscovered
    }
    d[s] = 0; mark s discovered               // distance to source is 0
    repeat forever {                          // go until finding t
        let u be the discovered node that minimizes d[u] + dist(u, t)
        if (u == t) return d[t]               // arrived at the destination
        else {
            for (each unfinished node v adjacent to u) {
                d[v] = min(d[v], d[u] + w(u, v))   // update d[v]
                mark v discovered
            }
            mark u finished                   // we're done with u
        }
    }
}
While this might appear to be little more than a “tweak” of best-first search, this small change is
exactly what we desire. In general, there are two properties that the heuristic function h(u) must
satisfy in order for the above algorithm to work.
Admissibility: The function h(u) never overestimates the graph distance from u to t, that is, h(u) ≤ δ(u, t). It is easy to see that this is true for our heuristic, since δ(u, t) must take obstacles into account, and so can never be smaller than the straight-line distance h(u) = dist(u, t). A heuristic function is said to be admissible if this is the case.
Consistency: For every edge (u, v) of the graph, h(u) ≤ w(u, v) + h(v). That is, the heuristic obeys a form of the triangle inequality with respect to the edge weights. (Our straight-line heuristic is consistent, since Euclidean distances themselves satisfy the triangle inequality.)
It turns out that admissibility alone is sufficient to show that A∗ search is correct, but, as with a graph having negative edge weights, the search algorithm is not necessarily efficient, because we might declare a node to be “finished,” but later discover a path of lower cost to this vertex, and have to move it back to the “discovered” status. (This is similar to what happens in the Bellman-Ford algorithm.)
However, if both properties are satisfied, A∗ runs in essentially the same time as Dijkstra’s algorithm
in the worst case, and may actually run faster. The key to the efficiency of the search algorithm is that
along any shortest path from s to t, the f -values are nondecreasing. To see why, consider two nodes
u′ and u″ along the shortest path, where u′ appears before u″. By applying consistency along each edge of the path from u′ to u″, we have h(u′) ≤ δ(u′, u″) + h(u″), and therefore
f(u′) = δ(s, u′) + h(u′) ≤ δ(s, u′) + δ(u′, u″) + h(u″) = δ(s, u″) + h(u″) = f(u″).
Although we will not prove this formally, this is exactly the condition used in proving the correctness
of Dijkstra’s algorithm, and so it follows as a corollary that A∗ is also correct. It is interesting to note,
by the way, that Dijkstra’s algorithm is just a special case of A∗ , where h(u) = 0. Clearly, this is an
admissible heuristic (just not a very interesting one).
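Here is a minimal Python sketch of A∗ along the lines of the pseudocode above. The heuristic h is passed in as a function, so setting it to the zero function yields Dijkstra's algorithm, as just observed; the sketch assumes a consistent heuristic, so a node's d-value is final once the node is finished.

A-Star Search (Python sketch)
import heapq, math

def a_star(graph, s, t, h):
    d = {u: math.inf for u in graph}
    d[s] = 0
    finished = set()
    frontier = [(h(s), s)]                    # priority f(u) = d[u] + h(u)
    while frontier:
        f, u = heapq.heappop(frontier)
        if u == t:
            return d[t]                       # arrived at the destination
        if u in finished:
            continue
        finished.add(u)
        for v, w in graph[u]:
            if v not in finished and d[u] + w < d[v]:
                d[v] = d[u] + w               # relax the edge (u, v)
                heapq.heappush(frontier, (d[v] + h(v), v))
    return math.inf

# Dijkstra is the special case: a_star(graph, s, t, h=lambda u: 0).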
Examples: Let us consider the execution of each of these algorithms on a common example. The input
graph is shown in Fig. 73. For Best-First and A∗ we need to define the heuristic h(u). To save us
from dealing with square roots, we will use a different notion of geometric distance. Define the L1
(or Manhattan) distance between two points to be the sum of the absolute values of the difference
of the x and y coordinates. For example, in the figure the L1 distance between nodes f and t is
dist1(f, t) = 3 + 6 = 9. For both best-first and A∗, define the heuristic value for each node u to be the L1
distance from u to t. For example, h(f ) = 9. (Because the edge weights have been chosen to match
the L1 length of the edge, it is easy to verify that h(·) is an admissible heuristic.)
Fig. 73: The input graph for the examples. The edge weights equal the L1 lengths of the edges; s is the start node, t is the destination, and, for example, dist1(f, t) = 3 + 6 = 9.
An intuitive way to think about how Dijkstra’s algorithm operates is to imagine fluid flooding out
from the source node s along the edges simultaneously. The first node that is hit by this fluid is
processed, and begins propagating the fluid along its edges as well. This idea is loosely illustrated
in Fig. 74, where the different colors indicate stages of the flooding algorithm. For example, first
red fluid is flooded out of s. The first vertex to be hit is c (at distance 2). The next phase of
flooding is indicated by dark blue. It floods out along c’s edges and continues to flood along s’s
edges. The first vertex to be hit by the blue flood is vertex d, which is processed at a distance of
3 units. While Dijkstra’s algorithm does not explicitly track these flows, it processes nodes in the
order in which the flooding fluid reaches the nodes.
Fig. 74: The flooding interpretation of Dijkstra’s algorithm. The nodes are processed in the order s, c, d, b, e, f, g, h, t, at distances 0, 2, 3, 4, 5, 6, 7, 10, 15, respectively.
Best-First Search: The table below shows the trace of best-first search. For each discovered node
we show the value d[u] : h(u). At each stage, the discovered node with the smallest h-value
(underlined) is chosen to be the next to be processed. Once processed, a node’s d-value never
changes (as indicated by the down arrow).
Note that best-first determines that d[t] = 21, which is incorrect (it should be 15).
A∗ Search: The table below shows the trace of the A∗ algorithm. For each discovered node we show
the value d[u] : h(u). At each stage, the discovered node with the smallest value of d[u] + h(u)
(underlined) is chosen to be the next to be processed. Once processed, a node’s d-value never
changes (as indicated by the down arrow). Note that at Stages 3, 4, and 5 we have a choice of
nodes to process next, since there are multiple nodes with the same d[u] + h(u) values.
Finally, consider running A∗ with an inadmissible heuristic that greatly overestimates distances. Observe that such a heuristic boosts the h-values so high that they dominate when computing the
f -values. As a result, the algorithm effectively determines which nodes to process based on the
h-values alone, and so the algorithm behaves in essentially the same manner as best-first search,
and computes the same incorrect result.
At an extreme level of abstraction, you might wonder what the big deal is. After all, a decision-making
process quite simply is a (possibly randomized) algorithm that maps a set of input conditions into a
resulting action.
Fig. 75: A simple decision tree, branching on binary conditions such as “Is enemy visible?”
In the example the decisions were all binary, but this does not need to be the case. For example, as
with switch statements in Java, it would be possible to have a node with multiple children, where each
child corresponds to one of the possible values of the condition.
Variations on a Theme: While our example showed just simple boolean conditions, you might wonder
whether it is possible to express more complex conditions using decision trees. The short answer is
yes, but it takes a little work. In fact, any decision making algorithm based on a finite sequence of
discrete conditions can be expressed in this manner. For example, suppose you have two boolean tests
A and B, and you want Action 1 to be performed if both A and B are satisfied, and otherwise you
want Action 2 to be performed. This could be encoded using the decision tree shown in Fig. 77(a).
(Note that encoding a boolean or condition is equally easy. Try it.)
Observe that in order to achieve the more complex boolean-and condition, we needed to make two
copies of the “Action 2” node. Replicating leaf nodes is not a major issue, since each such node
would presumably just contain a pointer to a function that implements this action. On the other
hand, if we wanted to share entire subtree structures, replicating these subtrees (especially if it is done
recursively) can quickly add up to a lot of space. Furthermore, copying is an error-prone process, since
any amendment to one subtree would require making the same change in all the others (assuming you
want all of them to implement the same decision-making procedure.) One way to avoid the issue of
copying subtrees is to allow subtrees to be shared (see Fig. 77(b)). In spite of the name “decision trees,”
the resulting “decision DAGs” (directed acyclic graphs) are just as easy to search.
Fig. 77: (a) Complex boolean conditions and (b) subtree sharing.
Another variation on decision trees is to introduce randomization. For example, we might have a
decision point that says “flip a coin” (or, more generally, generate a random integer over some finite
range with a given probability distribution). Based on the result of this random choice, we could then
branch among various actions. This would allow us to add more variation to the behavior of our agents.
Implementing Decision Trees: Decision trees can be implemented in a number of ways. If the tree is
small (and it is a tree, as opposed to a directed-acyclic graph) you can translate the tree into an
appropriate layered if-then-else statement in your favorite programming/scripting language. More
generally, you can express the tree as a graph-based data structure, where internal nodes hold pointers
to predicate functions and leaf nodes hold pointers to action functions.
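For concreteness, the following is a minimal Python sketch of such a graph-based structure (all names are illustrative). Internal nodes store a predicate that selects a child; leaves store an action. Because children are held by reference, subtrees (such as the shared Action-2 leaf below) can be shared, giving the decision-DAG variant discussed above.

Decision Tree Nodes (Python sketch)
class Leaf:
    def __init__(self, action):
        self.action = action                  # a function of the game context
    def run(self, ctx):
        return self.action(ctx)

class Decision:
    def __init__(self, predicate, children):
        self.predicate = predicate            # maps game context -> child key
        self.children = children              # dict: value -> subtree (may be shared)
    def run(self, ctx):
        return self.children[self.predicate(ctx)].run(ctx)

# Example: "if A and B then Action 1 else Action 2", sharing the Action-2 leaf.
action2 = Leaf(lambda ctx: "action 2")
tree = Decision(lambda ctx: ctx["A"], {
    False: action2,
    True:  Decision(lambda ctx: ctx["B"],
                    {False: action2,          # shared leaf, not a copy
                     True:  Leaf(lambda ctx: "action 1")}),
})
print(tree.run({"A": True, "B": False}))      # -> action 2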
Finite State Machines: Decision trees are really too simple to be used for most interesting decision-
making processes. The next step up in complexity is to add a notion of state to the character, and
then make decisions a function of both the current conditions and the character’s current state.
For example, a character may behave more aggressively when it is healthy and less aggressively when
it is injured. As another example, a designer may wish to have a character transition between various
states (patrolling, chasing, fighting, retreating) in sequence or when triggered by game events. In each
state, the character’s behavior may be quite different.
A finite state machine (FSM) can be modeled as a directed graph, where each node of the graph
corresponds to a state, and each directed edge corresponds to an event that triggers a change of state
and optionally some associated action. The associated actions may include things like starting an
animation, playing a sound, or modifying the current game state.
As an example, consider the programming of a warrior bot NPC in a first-person shooter. Suppose
that as the designer you decide to implement the following type of behavior: the bot normally stands
on guard; when a small enemy appears it fights; when a big enemy appears, or when it is losing a fight,
it runs away; and once it has escaped, it returns to guarding.
We can encode this behavior in the form of the FSM shown in Fig. 78(a).
FSMs are a popular method of defining behavior in games. They are easy to implement, easy to design
(if they are not too big), and they are easy to reason about. For example, based on the visual layout of
the FSM, it is easy to see the conditions under which certain state transitions can occur and whether
there are missing transitions (e.g., getting stuck in some state forever).
Fig. 78: (a) An FSM for the warrior bot and (b) an array-based implementation of its transition function.
Implementing State Machines: How are FSMs implemented? A natural implementation is to use a two-
dimensional array, where the row index is an encoding of the current state and the column index is
an encoding of the possible events that may trigger a transition. Each entry of the array is labeled
with the state to which the transition takes place (see Fig. 78(b)). The array will also contain further
information, such as what actions and animations to trigger as part of the action.
As can be seen from the example shown in the figure, many state-event pairs result in no action or
transition. If this is the case, then the array-based implementation can be space inefficient. A more
efficient alternative would be to use the nonempty state-event pairs as keys into a hash table. Assuming
a good hash-table implementation, the hash table’s size would generally be proportional to the number
of nonempty entries.
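The following minimal Python sketch illustrates the hash-table representation, using the states and events of the warrior-bot example (a Python dict serves as the hash table; the action strings are placeholders for animation or sound triggers).

FSM via a Hash Table (Python sketch)
transitions = {
    ("on_guard", "small_enemy"): ("fight",    "play_fight_anim"),
    ("on_guard", "big_enemy"):   ("run_away", "play_flee_anim"),
    ("fight",    "losing"):      ("run_away", "play_flee_anim"),
    ("run_away", "escaped"):     ("on_guard", None),
}

def handle_event(state, event):
    """Return the new state, triggering any associated action."""
    entry = transitions.get((state, event))
    if entry is None:
        return state                     # empty table entry: no transition
    new_state, action = entry
    if action is not None:
        print("triggering:", action)     # e.g., start an animation or sound
    return new_state

state = "on_guard"
state = handle_event(state, "big_enemy")  # -> run_away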
Note that the FSM we have showed is deterministic, meaning that there is only a single transition
that can be applied at any time. More variation can be introduced by allowing multiple transitions
per event, and then using randomization to select among them (again, possibly with weights so that
some transitions are more likely than others).
Hierarchical State Machines: One of the principal shortcomings of FSMs is that the number of states
can explode as the designer dreams up more complex behavior, thus requiring more states, more events,
and hence the need to consider a potentially quadratic number of mappings from all possible states to
all possible events.
For example, suppose that you wanted to model multiple conditions simultaneously. A character might
be healthy/injured, wandering/chasing/attacking, aggressive/defensive/neutral. If any combination of
these qualities is possible, then we would require 2 · 3 · 3 = 18 distinct states. This would also result
in a number of repeated transitions. (For example, all nine of the states in which the character is
“healthy” would need to provide transitions to the corresponding “injured” states if something bad
happens to us. Requiring this much redundancy can lead to errors, since a designer may update some
of the transitions, but not the others.)
One way to avoid the explosion of states and events is to design the FSM in a hierarchical manner.
First, there are a number of high-level states, corresponding to very broad contexts of the character’s
behavior. Then within each high-level state, we could have many sub-states, which would be used for
modeling more refined behaviors within this state. The resulting system is called a hierarchical finite
state machine (HFSM).
For example, suppose we add an additional property to our warrior bot, namely that he/she gets
hungry from time to time. When this event takes place, the bot runs to his/her favorite restaurant to
eat. When the bot is full, he/she returns to the same state. Since this event can occur from within
any state, we would need to add these transitions from all the existing states (see Fig. 79).
Of course, this would get to be very tedious if we had a very large FSM. The solution is to encapsulate
most of the guarding behaviors within one FSM, and then treat this as a single super-state in a
hierarchical FSM (see Fig. 80). The hungry transition would cause us to save the current state
(e.g., by pushing it onto a stack) and then perform the transition. On returning, we would pop
the stack and then resume our behavior as before.
Fig. 80: The warrior bot’s FSM (On Guard, Fight on a small enemy, Run Away on a big enemy or when losing, returning to On Guard once escaped) encapsulated as a “Guarding” super-state with its own start state, with Hungry/Full transitions between it and a Get-food state.
Note that we create a start state within the super-state. This is to handle the first time that we enter
the state. After this, however, we always return to the same state that we left from.
This could be implemented, for example, by storing the state on a stack, where the highest-level state
descriptor is pushed first, then successively more local states.
The process of looking up state transitions would proceed hierarchically as well. First, we would check
whether the lowest level sub-state has any transition for handling the given event. If not, we could
check its parent state in the stack, and so on, until we find a level of the FSM hierarchy where this
event is to be handled.
The advantage of this hierarchical approach is that it naturally adds modularity to the design process.
Because the number of local sub-states is likely to be fairly small, it simplifies the design of the FSM.
In particular, we can support even a huge number of states because each sub-state level need only focus
on the relatively few events that can cause transitions at this level.
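Here is a minimal Python sketch of this hierarchical lookup (all names hypothetical). The current state is kept as a stack with one transition table per level; a fuller version would also save discarded sub-states so that, as with the Hungry/Full example, they can be restored on return.

Hierarchical Event Dispatch (Python sketch)
def hfsm_handle(stack, tables, event):
    """stack[0] is the top-level state; deeper entries are local sub-states.
    tables[d] maps (state, event) -> new state at level d."""
    # Try the most local level first, then fall back to enclosing levels.
    for depth in range(len(stack) - 1, -1, -1):
        new_state = tables[depth].get((stack[depth], event))
        if new_state is not None:
            stack[depth] = new_state
            del stack[depth + 1:]    # leaving a level discards its sub-states
            return stack
    return stack                     # no level handles this event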
FSM-Based Adventure Games: Very early text-based adventure games were based on finite-state au-
tomata. The earliest example was Colossal Cave Adventure by Will Crowther in the 1970’s. It was
implemented in Fortran and ran on a PDP-10 computer. The game-play involves a series of short de-
scriptions, after which the player could enter simple commands. The objective was to navigate through
the environment to find the treasure. Here is a short example:
You are standing at the end of a road before a small brick building. Around you is
a forest. A small stream flows out of the building and down a gully.
You are inside a building, a well house for a large spring. There are some keys on
the ground here. There is a shiny brass lamp nearby. There is tasty food here.
There is a bottle of water here.
Taken
> go south
You are in a valley in the forest beside a stream tumbling along a rocky bed.
It is not hard to see how this could be implemented using an FSM. The player’s current position is
modeled as the FSM state (with auxiliary information for the player’s inventory), commands are
mapped to state transitions, and each state is associated with a short description.
Behavior Trees: While FSMs are general, they are not that easy to design. We would like a system that
is more general than FSMs, more structured than programs, and lighter weight than general-purpose
planners. Behavior trees were developed by Geoff Dromey in the mid-2000s in the field of software
engineering as a modular way to define software in terms of actions and preconditions.
They were first used in games in Halo 2 and were adopted by a number of other games, such as Spore.
Let us consider the modeling of a guard dog in an FPS game. The guard dog’s range of behaviors
can be defined hierarchically. At the topmost level, the dog has behaviors for major tasks, such as
patrolling, investigating, and attacking (see Fig. 81(a)). Each of these high-level behaviors could then
be broken down further into lower-level behaviors. For example, the patrol task may include a subtask
for moving. The investigate task might include a subtask for looking around, and the attack task may
include a subtask for bite (ouch!).
Fig. 81: (a) The guard dog’s high-level behaviors and their subtasks.
The leaves of the tree are where the AI system interacts with the game state. Leaves provide a way to
gather information from the system through conditions, and a way to affect the progress of the game
through actions. In the case of our guard dog, conditions might involve issues such as the dog’s state
(is the dog hungry or injured) or geometric queries (is there another dog nearby, and is there a line of
sight to this dog?). Conditions are read-only. Actions make changes to the world state. This might
involve performing an animation, playing a sound, picking up an object, or biting someone (which
would presumably alter this other object’s state). Conditions can be thought of as filters that indicate
which actions are to be performed.
Sequences: A sequence task performs a series of tasks sequentially, one after the other (see Fig. 82(a)).
As each child in the sequence succeeds, we proceed to the next one. Whenever a child task fails,
we terminate the sequence and bail out (see Fig. 82(b)). If all succeed, the sequence returns
success.
Fig. 82: (a) A sequence task evaluates its children sequentially until one fails; (b) when a child fails, the whole sequence fails.
Selector: A selector task performs at most one of a collection of child tasks. A selector starts by
selecting the first of its child tasks and attempts to execute it. If the child succeeds, then the
selector terminates successfully. If the child fails, then it attempts to execute the next child, and
so on, until one succeeds (see Fig. 83(b)). If none succeed, then the selector returns failure.
Fig. 83: (a) A selector task tests its children sequentially until one succeeds; (b) when a child succeeds, the selector succeeds.
An example of a behavior tree is presented in Fig. 84 for an enemy trying to enter a room. If the
door is open, the enemy moves directly into the room (left child of the root). Otherwise, the enemy
approaches the door and tries the knob. If it is unlocked, it opens the door. If locked, it breaks the
door down. After this, it enters the room.
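To make this concrete, here is a minimal Python sketch of sequence and selector tasks, together with a rendering of the enter-the-room tree. The condition and action names are illustrative only; in a real game the leaves would call into the game state.

Behavior Tree Tasks (Python sketch)
class Task:                  # a leaf: a condition (read-only) or an action
    def __init__(self, fn): self.fn = fn
    def run(self, ctx): return self.fn(ctx)

class Sequence:              # succeed only if every child succeeds, in order
    def __init__(self, *children): self.children = children
    def run(self, ctx):
        return all(c.run(ctx) for c in self.children)  # bails on first failure

class Selector:              # try children in order until one succeeds
    def __init__(self, *children): self.children = children
    def run(self, ctx):
        return any(c.run(ctx) for c in self.children)  # stops on first success

def action(name):            # an action stub that always succeeds
    return Task(lambda ctx: print(name) or True)

enter_room = Selector(
    Sequence(Task(lambda c: c["door_open"]), action("move into room")),
    Sequence(action("approach door"),
             Selector(Sequence(Task(lambda c: c["unlocked"]), action("open door")),
                      action("break down door")),
             action("enter room")))

enter_room.run({"door_open": False, "unlocked": True})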
Sequences and selectors provide some of the control flow that FSMs lack, while preserving the natural
structural interface offered by hierarchical finite state machines. Sequences and selectors can be
combined to achieve sophisticated combinations of behaviors. For example, a behavior might involve a
sequence of tasks, each of which is based on making a selection from a list of possible subtasks. Thus,
they provide building blocks for constructing more complex behaviors.
Fig. 84: Example of a behavior tree for an enemy agent trying to enter a room.
From a software-engineering perspective, behavior trees give a programmer a more structured context
in which to design behaviors. The behavior-tree structure forces the developer to think about the
handling of success and failure, rather than doing so in an ad hoc manner, as would be the case when
expressing behaviors using a scripting language. Note that the nodes of the tree, conditions and tasks,
are simply links to bits of code that execute the desired test or perform the desired action. The behavior
tree provides the structure within which to organize these modules.
Fig. 85: (a) A terrain generated by terragen and (b) a scene with trees generated by speedtree.
Procedural model generation is a useful tool in developing open-world games. For example, the game
No Man’s Sky uses procedural generation to generate a universe of (so it is estimated) 1.8 × 10^19 different
planets, all with distinct ecosystems, including terrains, flora, fauna, and climates (see Fig. 86). The
structure of each planet is not stored on a server. Instead, each is generated deterministically by a
64-bit seed.
Before discussing methods for generating such interesting structures, we need to begin with some
background, which is interesting in its own right. The question is how to construct random noise that has
nice structural properties. In the 1980’s, Ken Perlin came up with a powerful and general method for
doing this (for which he won an Academy Award!). The technique is now widely referred to as Perlin
Noise.
Perlin Noise: Natural phenomena derive their richness from random variations. In computer science,
pseudo-random number generators are used to produce number sequences that appear to be random.
These sequences are designed to behave in a totally random manner, so that it is virtually impossible
to predict the next value based on the sequence of preceding values. Nature, however, does not work
this way. While there are variations, for example, in the elevations of a mountain or the curves in a
river, there is also a great deal of structure present as well.
One of the key elements to the variations we see in natural phenomena is that the magnitude of random
variations depends on the scale (or size) at which we perceive these phenomena. Consider, for example,
the textures shown in Fig. 87. By varying the frequency of the noise we can obtain significantly different
textures.
The tendency to see repeating patterns arising at different scales is called self similarity and it is
fundamental to many phenomena in science and nature. Such structures are studied in mathematics
under the name of fractals. Perlin noise can be viewed as a type of random noise that is self similar at
different scales, and hence it is one way of modeling random fractal objects.
Noise Functions: Let us begin by considering how to take the output of a pseudo-random number generator
and convert it into a smooth (but random looking) function. To start, let us consider a sequence of
random numbers in the interval [0, 1] produced by a random number generator (see Fig. 88(a)). Let
Y = ⟨y0, . . . , yn⟩ denote the sequence of random values, and let us plot them at the uniformly placed
points X = ⟨0, . . . , n⟩.
Fig. 88: (a) Random points, (b) connected by linear interpolation, and (c) connected by cosine interpolation.
Next, to map these points to a continuous function, we could apply linear interpolation between
pairs of points (also called piecewise linear interpolation). As we have seen earlier this semester, in order
to interpolate linearly between two values yi and yi+1, we define a parameter α that varies between 0
and 1, and the interpolated value is
(1 − α)yi + α yi+1.
To make this work in a piecewise setting we need to set α to the fractional part of the x-value that
lies between i and i + 1. In particular, if we define x mod 1 = x − ⌊x⌋ to be the fractional part of x,
and let i = ⌊x⌋ and α = x mod 1, we can define the linear interpolation function to be
f(x) = (1 − α)yi + α yi+1.
Fig. 89: The blending functions used for (a) linear interpolation and (b) cosine interpolation.
Define g(α) = (1 − cos(πα))/2. The cosine interpolation between two points yi and yi+1 is defined by using g(α) in place of α:
f(x) = (1 − g(α))yi + g(α)yi+1.
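Here is a minimal Python sketch of this one-dimensional noise function, combining random values at the integer points with the cosine blending function g(α). The value of n and the use of random.random() are arbitrary choices for illustration.

1-D Cosine-Interpolated Noise (Python sketch)
import math, random

n = 16
Y = [random.random() for _ in range(n + 1)]    # random values y_0 .. y_n
Y[n] = Y[0]                                    # so f(0) = f(n); lets f repeat

def g(alpha):
    return (1 - math.cos(math.pi * alpha)) / 2 # cosine blending function

def f(x):                                      # defined for 0 <= x <= n
    i = int(math.floor(x))
    alpha = x - i                              # x mod 1, the fractional part
    return (1 - g(alpha)) * Y[i] + g(alpha) * Y[min(i + 1, n)]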
Layering Noise: Our noise function is continuous, but there is no self-similarity to its structure. To achieve
this, we will need to combine the noise function in various ways. Our approach will be similar to the
approach used in the harmonic analysis of functions.
Recall that when we have a periodic function, like sin t or cos t, we define (see Fig. 90)
Wavelength: The distance between successive wave crests
Frequency: The number of crests per unit distance, that is, the reciprocal of the wavelength
Amplitude: The height of the crests
Fig. 90: The wavelength and amplitude of a periodic function.
If we want to decrease the wavelength (equivalently increase the frequency) we can scale up the ar-
gument. For example sin t has a wavelength of 2π, sin(2t) has a wavelength of π, and sin(4t) has a
wavelength of π/2. (By increasing the value of the argument we are increasing the function’s frequency,
which decreases the wavelength.) To decrease the function’s amplitude, we apply a scale factor that is
smaller than 1 to the value of the function. Thus, for any positive reals ω and α, the function α · sin(ωt)
has a wavelength of 2π/ω and an amplitude of α.
Now, let’s consider doing this to our noise function. Let f (x) be the noise function as defined in the
previous section. Let us assume that 0 ≤ x ≤ n and that the function repeats so that f (0) = f (n) and
let us assume further that the derivatives match at x = 0 and x = n. We can convert f into a periodic
function for all t ∈ R, which we call noise(t), by defining
noise(t) = f(t mod n).
(Again we are using the mod function in the context of real numbers. Formally, we define x mod n =
x − n · ⌊x/n⌋.) For example, the top graph of Fig. 91 shows three wavelengths of noise(t).
In order to achieve self-similarity, we will sum together this noise function, but using different fre-
quencies and with different amplitudes. First, we will consider the noise function with exponentially
increasing frequencies: noise(t), noise(2t), noise(4t), . . . , noise(2^i t) (see Fig. 92). Note that we have not
changed the underlying function, we have merely modified its frequency. In the jargon of Perlin noise,
these are called octaves, because like musical octaves, the frequency doubles.13 Because frequencies
double with each octave, you do not need very many octaves, because there is nothing to be gained by
considering wavelengths that are larger than the entire screen nor smaller than a single pixel. Thus,
the logarithm of the window size is a natural upper bound on the number of octaves.
13 In general, it is possible to use factors other than 2. Such a factor is called the lacunarity of the Perlin noise function. For
Fig. 91: The noise function at successively doubled frequencies: noise(t), noise(2t), and noise(4t).
High frequency noise tends to be of lower amplitude. If we were in a purely self-similar situation, when
we double the frequency, we should halve the amplitude. In order to provide the designer with more
control, Perlin noise allows the designer to specify a separate amplitude for each frequency. A common
way in which to do this is to define a parameter, called persistence, that specifies how rapidly the
amplitudes decrease. Persistence is a number between 0 and 1. The larger the persistence value, the
more noticeable are the higher frequency components. (That is, the more “jagged” the noise appears.)
In particular, given a persistence of p, we define the amplitude at the ith stage to be pi . The final
noise value is the sum, over all the octaves, of the persistence-scaled noise functions. In summary, we
have
perlin(t) = Σ_{i=0}^{k} p^i · noise(2^i · t).
Fig. 92: Dampened noise functions and the Perlin noise function (with persistence p = 1/2).
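Continuing the previous sketch, the layered construction is just a few more lines of Python: noise(t) makes f periodic, and perlin(t) sums k + 1 octaves with persistence p.

Layered Perlin Noise in 1-D (Python sketch)
def noise(t):
    return f(t % n)                 # periodic extension of f, period n

def perlin(t, k=4, p=0.5):
    # Sum the octaves: amplitude p^i at frequency 2^i.
    return sum(p**i * noise(2**i * t) for i in range(k + 1))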
Noise from Random Gradients: Before explaining the concept of gradients, let’s recall some basics from
differential calculus. Given a continuous function f (x) of a single variable x, we know that the derivative
of the function df /dx yields the tangent slope at the point (x, f (x)) on the function. If we instead
consider a function f (x, y) of two variables, we can visualize the function values (x, y, f (x, y)) as defining
the height of a point on a two-dimensional terrain. If f is smooth, then each point of the terrain can be
associated with tangent plane. The “slope” of the tangent plane passing through such a point is defined
by the partial derivatives of the function, namely ∂f /∂x and ∂f /∂y. The vector (∂f /∂x, ∂f /∂y) is a
vector in the (x, y)-plane that points in the direction of steepest ascent for the function f . This vector
changes from point to point, depending on f . It is called the gradient of f , and is often denoted ∇f .
Perlin’s approach to producing a noisy 2-dimensional terrain involves computing a random 2-dimensional
gradient vector at each vertex of the grid with the eventual aim that the smoothed noise function have
this gradient value. Since these vectors are random, the resulting noisy terrain will appear to behave
very differently from one vertex of the grid to the next. At one vertex the terrain may be sloping up
to the northeast, and at a neighboring vertex it may be sloping to the south-southwest. The random
variations in slope result in a very complex terrain. But how do we define a smooth function that
has this behavior? In the one-dimensional case we used cosine interpolation. Let’s consider how to
generalize this to a two-dimensional setting.
Consider a single square of the grid, with corners (x0 , y0 ), (x1 , y0 ), (x1 , y1 ), (x0 , y1 ). Let g[0,0] , g[1,0] ,
g[1,1] , and g[0,1] denote the corresponding randomly generated 2-dimensional gradient vectors (see
Fig. 93(c)). Now, for each point (x, y) in the interior of this grid square, we need to blend the effects
of the gradients at the corners. To do this, for each corner we will compute a vector from the corner
to the point (x, y), and take its dot product with the corner’s gradient vector. In particular, assuming
the square has been scaled to the unit square [0, 1]^2, define the scalar displacement values
δ[0,0] = g[0,0] · (x, y),  δ[1,0] = g[1,0] · (x − 1, y),  δ[0,1] = g[0,1] · (x, y − 1),  δ[1,1] = g[1,1] · (x − 1, y − 1).
Fading: The problem with these scalar displacement values is that they are affected by all the corners of
the square, and in fact, as we get farther from the associated corner point the displacement gets larger.
We want the gradient effect to apply close to the vertex, and then have it drop off quickly as we get
closer to another vertex. That is, we want the gradient effect of this vertex to fade as we get farther
from the vertex. To do this, Perlin defines the following fade function. This is a function of t that
will start at 0 when t = 0 (no fading) and will approach 1 when t = 1 (full fading). Perlin originally
settled on a cubic function to do this, ϕ(t) = 3t^2 − 2t^3. (Notice that this has the desired properties,
and further its derivative is zero at t = 0 and t = 1, so it will smoothly interpolate with neighboring
squares.) Later, Perlin observed that this function has nonzero second derivatives at 0 and 1, and so
he settled on the following improved fade function:
ψ(t) = 6t^5 − 15t^4 + 10t^3
(see Fig. 94). Observe again that ψ(0) = 0 and ψ(1) = 1, and the first and second derivatives are both
zero at these endpoints.
Fig. 94: The fade function.
Because we want the effects to fade as a function of both x and y, we define the joint fade function to
be the product of the fade functions along x and y:
Ψ(x, y) = ψ(x)ψ(y).
The final noise value at the point (x, y), arises by taking the weighted average of gradient displacements,
where each displacement is weighted according to the fade function.
We need to apply the joint fade function differently for each vertex. For example, consider the fading
for the displacement δ[1,0] of the lower right corner vertex. We want the influence of this vertex to
increase as x approaches 1, which will be achieved by using a weight of ψ(x). Similarly, we want the
influence of this vertex to increase as y approaches 0, which will be achieved by using a weight of
ψ(1 − y). Therefore, to achieve both of these effects, we will use the joint weight function Ψ(x, 1 − y).
By applying this reasoning to the other corner vertices, we obtain the following 2-dimensional noise
function.
noise(x, y) = Ψ(1 − x, 1 − y)δ[0,0] + Ψ(x, 1 − y)δ[1,0] + Ψ(1 − x, y)δ[0,1] + Ψ(x, y)δ[1,1] .
(As before, recall that the factor 2^i can be replaced by ℓ^i, for some parameter ℓ > 1.) This applies
to each square individually. We need to perform the usual “modding” to generalize this to any square
of the grid. (An example of the final result is shown in Fig. 95(a) and a terrain resulting from applying
this is shown in Fig. 95(b). Note that the terrain does not look as realistic as the terragen from
Fig. 93(a). There are other processes, such as erosion and geological forces that need to be modeled
to achieve highly realistic terrains.)
Fig. 95: (a) Two-dimensional Perlin noise and (b) a terrain generated by Perlin noise.
Source Code: While the mathematical concepts that we have discussed are quite involved, it is remarkable
that Perlin noise has a very simple implementation. The entire implementation (only a few dozen lines)
can be obtained from Perlin’s web page.
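Perlin's actual reference code is compact but heavily optimized (hashing, permutation tables), and it is not reproduced here. The following Python sketch instead implements the formulas above directly, for a single unit grid square with random unit gradient vectors at the four corners.

2-D Gradient Noise for One Grid Square (Python sketch)
import math, random

def random_gradient():
    theta = random.uniform(0, 2 * math.pi)
    return (math.cos(theta), math.sin(theta))     # random unit vector

# One gradient per corner of the unit square [0,1] x [0,1].
G = {(i, j): random_gradient() for i in (0, 1) for j in (0, 1)}

def fade(t):
    return 6*t**5 - 15*t**4 + 10*t**3             # psi(t), the improved fade

def noise2(x, y):
    total = 0.0
    for (i, j), g in G.items():
        delta = g[0] * (x - i) + g[1] * (y - j)   # scalar displacement delta[i,j]
        weight = fade(1 - abs(x - i)) * fade(1 - abs(y - j))  # joint fade Psi
        total += weight * delta
    return total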
If you have taken a course in formal language theory, the concept of an L-system is very similar to
the concept of a context-free grammar. We start with an alphabet, which is a finite set of characters,
called symbols or variables. There is one special symbol, called the start symbol (or axiom in L-system
terminology). In L-systems, symbols are categorized into two types. First, variables are symbols that
can be replaced with other symbols. Second, constants are symbols that are fixed and cannot be
replaced. Finally, there is a finite set of production rules. Each production replaces a single variable
with a string of zero or more symbols (which may be variables or constants). Such a rule is expressed
in the following form:
⟨variable⟩ → ⟨string⟩.
To get a better grasp on this, let us consider a simple example, developed by Lindenmayer himself to
describe the growth of algae.
variables : {A, B}
constants : ∅ (none)
start : A
rules : A → AB; B→A
An L-system works as follows. Starting with the start symbol, we repeatedly replace each variable
according to its rule. In this case, each occurrence of A is mapped to AB and each occurrence of
B is mapped to A. This is repeated for some number of levels. (Note that this is a major difference
between L-systems and context-free grammars. In context-free grammars, one rule is applied at a time.
In L-systems, all the applicable rules are applied in parallel.) The above grammar produces the following
sequence of strings (for the first several levels of application):
n=0 : A
n=1 : AB
n=2 : ABA
n=3 : ABAAB
n=4 : ABAABABA
n=5 : ABAABABAABAAB
n=6 : ABAABABAABAABABAABABA
n=7 : ABAABABAABAABABAABABAABAABABAABAAB
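This parallel rewriting is only a few lines of code; the following Python sketch reproduces the table above.

L-System Expansion (Python sketch)
rules = {"A": "AB", "B": "A"}

def expand(axiom, levels):
    s = axiom
    for _ in range(levels):
        # Apply all applicable rules in parallel; constants map to themselves.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

for i in range(5):
    print(i, expand("A", i))    # A, AB, ABA, ABAAB, ABAABABA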
There is nothing particularly pictorial about this, but now let’s assign some drawing instructions. Let
us assume that we associate a push-down stack with the drawing process. Define
• “0”: Draw a unit-length line segment with a leaf on the end
• “1”: Draw a unit length line segment
• “[”: Push the current position/scale and angle on the stack, turn CCW 45°, and scale by 1/2
• “]”: Pop the current position/scale and angle off the stack, turn CW 45°, and scale by 1/2
(An L-system that uses [ and ] is sometimes referred to as an L-system with brackets.)
Now, if we use the above directions as the basis for generating a turtle-geometry drawing, we obtain
the drawing shown in Fig. 97.
Of course, this is far from being a realistic tree, but it is not hard to enhance this basic system with
more parameters and some randomness in order to generate something that looks quite like a tree.
Randomization and Stochastic L-Systems: As described, L-systems generate objects that are too reg-
ular to model natural objects. However, it is an easy matter to add randomization to the process.
The first way of introducing randomness is to randomize the graphical/geometric operations. For ex-
ample, rather than mapping terminal symbols into fixed actions (e.g., draw a unit-length line segment),
we could add some variation (e.g., draw a line segment whose length is a random value between 0.9 and
1.1). Examples include variations in drawing lengths, variations in branching angles, and variations in
thickness and/or texture (see Fig. 96).
While the above modifications alter the geometric properties of the generated objects, the underlying
structure is still the same. We can modify L-systems to generate random structures by associating
each production rule with a probability, and apply the rules randomly according to these probabilities.
For example consider the following two rules:
a −→[0.4] a[b]
a −→[0.6] b[a]b
Fig. 97: The tree drawing generated by interpreting the expanded string with the turtle-geometry rules above.
The interpretation is that the first rule is to be applied 40% of the time and the second rule 60% of
the time.
Visibility-based Pursuit Evasion: We are given a continuous domain (e.g., a simple polygon) in the
plane. Two moving points, the pursuer p and the evader e, can move in time but must stay within
the domain. Let p(t) and e(t) denote their positions at some time t ≥ 0. The pursuer has caught
the evader if at any time t1 > 0, p(t1) can see e(t1), meaning that the line segment p(t1)e(t1) lies
entirely within the domain.
Fig. 98: Visibility-based pursuit evasion. In the case of polygon (a), the pursuer has a winning strategy, but
in (c) the evader can always evade the pursuer.
For example, consider the domain shown in Fig. 98. For the domain shown in (a), the pursuer
wins the game by following the path shown in (b). No matter where the evader attempts to hide,
he will eventually be seen and further he cannot sneak from one hiding place to another without
being seen by the pursuer. However, if we add four small knobs at the end of each of the four bays,
the evader now wins the game (assuming he knows the pursuer’s strategy in advance).
Seeing why the evader can elude detection involves an analysis of the various cases of the sequence
of bays visited by the pursuer. For example, suppose that the pursuer visits the northeast (NE)
bay first, then the southeast (SE), then the northwest (NW), and then comes back to the SE bay,
and suppose that the evader starts in the NW bay. The evader could reason as follows. After
the pursuer leaves NE and moves down to look into SE, the evader zips from NW to NE. When
the pursuer traverses the central horizontal corridor and starts moving up to the NW bay, the evader is free
to move up into the NE bay. Finally, when the pursuer returns to SE, the evader has left it. After
a bit of thinking, you should be able to convince yourself that by looking ahead to the pursuer’s
next move, the evader can always identify a bay in which to hide now and escape from later.
To see whether you understand this, consider the four domains shown in Fig. 99. For which of
these does the pursuer have a winning strategy, and for which does the evader?
Fig. 99: Visibility-based pursuit evasion. In which of these does the pursuer win? (Hint: The evader wins
in only one of them.)
Given a simple polygon, it is possible to determine whether there exists a pursuit path (that is,
a winning solution for the pursuer). However, the algorithm is quite complex. It runs in time
O(n^2), where n is the number of vertices in the polygon. The success of the evader depends on its
complete knowledge of the pursuer’s path. If the pursuer is allowed to use randomization (coin
flips), the evader can no longer rely on such knowledge.
Crowd Motion: Today, we will discuss motion simulation involving a number of intelligent autonomous
agents, as arises in crowds of pedestrians. Unlike flocking systems, in which it is assumed that the
agents behave homogeneously, in a crowd it is assumed that each agent has its own agenda to pursue
(see Fig. 101). For example, we might imagine a group of students walking through a crowded campus
on their way to their next classes. Since such agents are acting within a social system, however, it is
assumed that they will tend to behave in a manner that is consistent with social conventions. (I don’t
want you to bump into me, and so I will act in a manner to avoid bumping into you as well.)
Crowd simulation is actually a very broad area of study, spanning work in game programming
and computer graphics, artificial intelligence, social psychology, and urban architecture (e.g., planning
evacuation routes). In order to operate in the context of a computer game, such a system needs to
be decentralized, where each member of the crowd determines its own action based on the perceived
actions of other nearby agents. The problem with applying a simple boid-like flocking behavior is
that, whereas flocking rules such as alignment naturally produce systems that avoid collisions between
agents, the diverse agendas of agents in crowds naturally brings them directly into collisions with other
agents (as in pedestrians traversing a crosswalk from both sides). In order to produce more realistic
motion, the agents should anticipate where nearby agents are moving, and then plan their future
motion accordingly. We discuss two models of crowd behavior, social-force dynamics and (reciprocal)
velocity obstacles.
Social-Force Dynamics: In our presentation of flocking behavior with Boids, we presented a model in
which the motion of each artificial animal is determined by a collection of simple local forces based on
the other agents in the system. In much the same manner that a physicist would simulate the motion
of a collection of particles in computational fluid dynamics, we can simulate the fluid-like motion of
animals in the flock. This can be applied to human crowd behavior as well. At an individual level,
human behavior is quite chaotic, and it is not easy to predict future motion based on past motion.
However, for many common situations the aggregate behavior of a large group of people can be quite
predictable. Examples include people walking along corridors or sidewalks, moving into or evacuating
from a building, milling around in a large area (such as people at a party or on the floor of a convention).
Social-force dynamics attempts to model the motion of large crowds of humans in terms of simple local
forces that incrementally affect their individual motions.
Recalling our earlier lecture on agents in AI, each person in a crowd experiences sensory stimuli, which
cause a behavioral reaction that depends on the agent’s personal aims and is chosen from a set of
behavioral alternatives. The choice depends on a utility maximization, which varies by individual.
While each individual may have a (highly unpredictable) utility function, the utility functions of a
large group can be modeled by a probability distribution. (E.g., 10% of people are just focused on
getting as quickly as possible to their destination, 20% are in no hurry or are strolling together with
a friend, and 70% are staring at their mobile devices and they are not paying attention to anything
else.)
The physics-based model associates with each agent i and each time instant t a current position, denoted
pi(t), and a current velocity vector, denoted ~vi(t). Based on the agent’s desired path (which resulted from
some path planning procedure) each agent has a target velocity vector, denoted ~vi0 (t). As described
below, various forces will be evaluated at each time step. Each force will be represented as a vector,
and the sum of these forces will result in an aggregate force vector, denoted Fi (t). (For example, the
aggregate force will tend to push the agent away from other agents in the crowd and obstacles in the
environment, and will also nudge it back towards the target velocity.)
At each time step ∆, we modify the current velocity vector based on the aggregate force vector, and
then move the agent with this updated velocity:
~vi(t + ∆) = ~vi(t) + ∆ · Fi(t)    and    pi(t + ∆) = pi(t) + ∆ · ~vi(t + ∆).
Of course, after this movement new forces will come into existence, and the process repeats.
Now that we know how to update an agent, the next question is how to compute the forces.
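As a minimal sketch, one simulation step might look as follows in Python. Unit mass is assumed, and aggregate_force is a hypothetical callback that sums the social forces acting on agent i; all names are illustrative.

Social-Force Update Step (Python sketch)
def step(positions, velocities, aggregate_force, dt):
    for i in range(len(positions)):
        fx, fy = aggregate_force(i)                # F_i(t), summed over all forces
        vx, vy = velocities[i]
        vx, vy = vx + dt * fx, vy + dt * fy        # v_i(t+dt) = v_i(t) + dt*F_i(t)
        velocities[i] = (vx, vy)
        x, y = positions[i]
        positions[i] = (x + dt * vx, y + dt * vy)  # p_i(t+dt) = p_i(t) + dt*v_i(t+dt)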
We would like to express the velocities va that satisfy the above criterion as lying within a certain
“forbidden region” of space. To do this, define B(p, r) to be the open Euclidean ball of radius r
centered at point p. That is
B(p, r) = {q : ‖q − p‖ < r}.
Fig. 102: (a) Objects a and b and the ball B(pb − pa, ra + rb); (b) the induced cone of forbidden velocities in velocity space; (c) the truncated velocity obstacle.
Suppose for the moment that b is stationary. Treating a as a point and expanding b’s radius to ra + rb,
object a (moving with velocity va) will collide with b at some time t ≥ 0 if and only if
t · va ∈ B(pb − pa, ra + rb).
As t varies from 0 to +∞, the vector t · va travels along a ray that starts at the origin and travels in
the direction of va . Therefore, the set of forbidden velocities are those that lie within a cone that is
centered at the origin and encloses the ball B(pb − pa , ra + rb ).
We define the velocity obstacle of a induced by b, denoted VOa|b, to be the set of velocities of a that
will result in a collision with the stationary object b. For a finite time horizon τ, we also define the
truncated velocity obstacle, denoted VOτa|b, to be the set of velocities that result in a collision at
some time t ≤ τ.
This is a subset of the (unbounded) velocity obstacle that eliminates very small velocities (since
collisions farther in the future result when a is moving more slowly). The truncated velocity ob-
stacle is a truncated cone, where the truncation occurs at the boundary of the (1/τ)-scaled ball
B((pb − pa )/τ, (ra + rb )/τ ) (see Fig. 102(c)). Observe that there is an obvious symmetry here. Moving
a with velocity v will result in a collision with (the stationary) b if and only if moving b with velocity
−v will result in an intersection with (the stationary) a. Therefore, we have
VOτa|b = −VOτb|a .
Collision-Avoiding Velocities: Next, let us consider how the velocity obstacle changes if b is moving. If
we assume that b is moving with velocity vb, then a velocity va will generate a collision precisely if the
relative velocity va − vb lies within VOτa|b; equivalently, va lies within the translated region VOτa|b + vb
(see Fig. 103(a)).
Fig. 103: Velocity obstacles where (a) object b is moving at velocity vb and (b) object b is moving at any
velocity in the set Vb .
We can further generalize this. We usually do not know another object’s exact velocity, but we can
often put bounds on it. Suppose that rather than knowing b’s exact velocity, we know that b is moving
at some velocity vb that is selected from a region Vb of possible velocities. (For example, Vb might be
square or circular region in space, based on the uncertainty in its motion estimate.)
Let us consider the velocities of a that might result in a collision, assuming that vb is chosen from Vb .
To define this set, we first define the Minkowski sum of two sets of vectors X and Y to be the set
consisting of the pairwise sums of vectors from X and Y , that is,
X ⊕ Y = {x + y : x ∈ X and y ∈ Y }.
Then, clearly a might collide with b if a’s velocity is chosen from VOτa|b ⊕ Vb . Therefore, if we want
to avoid collisions with b altogether, then a should select a velocity from outside this region. More
formally, we define the set of collision-avoiding velocities for a, given that b selects its velocity from Vb,
to be
CAτa|b(Vb) = {v : v ∉ VOτa|b ⊕ Vb}
(see Fig. 103(b)).
Just to recap, if a selects its velocity vector from anywhere outside VOτa|b ⊕ Vb (that is, anywhere inside
CAτa|b (Vb )), then no matter what velocity b selects from Vb , a is guaranteed not to collide with b within
the time interval [0, τ ].
This now provides us with a strategy for selecting the velocities of the agents in our system:
• Compute velocity bounds Vb for all nearby agents
• Compute the intersection of all collision-avoiding velocities for these objects, that is,
CAτa = ⋂_b CAτa|b(Vb)
Any velocity chosen from this set is guaranteed to avoid collisions from now until time τ .
Fig. 104: Oscillation that can result from standard velocity-obstacle motion planning.
Although even humans sometimes engage in this sort of brief oscillation when meeting each other
head-on, repeated oscillation is very unnatural, and is due to the fact that both agents are
acting without consideration for what the other agent might reasonably do. The question then is how
to fix this.
Reciprocal Velocity Obstacles: The intuition behind fixing the oscillation issue is to share responsibility.
We assume that whenever a collision is possible between two agents, both agents perceive the danger
and (since they are running the same algorithm) they both know how to avoid it. Rather than having
one agent bear the entire burden of avoiding the collision, each agent takes on part of the responsibility
by restricting its velocity to a candidate set, Va for agent a and Vb for agent b, chosen so that all pairs
of choices are mutually collision-free.
This implies a very harmonious set of candidate velocities, since for any choice va ∈ Va and vb ∈ Vb,
we can be assured that these two agents will not collide.
Note that there is a complementary relationship between these two candidate sets. As we increase the
possible velocities in Va, we reduce the possible set of velocities that b can use to avoid a collision, and
vice versa. Of course, we would like to be as generous as we can, by giving each agent as much flexibility
as possible. We say that two such candidate velocity sets are reciprocally maximal if neither set can be
enlarged without permitting some pair of velocities that results in a collision.
Note that we face a tradeoff here, since we could make Va very large, but at the expense of making Vb
very small, and vice versa. There are infinitely many reciprocally maximal collision avoiding sets. So
what should guide our search for the best combination of candidate sets? Recall that each agent has
its preferred velocity, va∗ and vb∗ . It would seem natural to generate these sets in a manner that gives
each agent the greatest number of options that are close to its preferred velocity. We seek a pair of
candidate velocity sets that are optimal in the sense that they provide each agent the greatest number
of velocities that are close to the agent’s preferred velocity.
There are a number of ways of making this concept more formal. Here is one. Consider two pairs (Va, Vb)
and (Va′, Vb′) of reciprocally maximal collision avoiding sets. For any radius r, B(va∗, r) denotes the set
of velocities that are within distance r of a’s preferred velocity and B(vb∗ , r) denotes the set of velocities
that are within distance r of b’s preferred velocity. The quantity area(Va ∩B(va∗ , r)) can be thought of as
the “number” (more accurately the measure) of candidate velocities for a that are close (within distance
r) of its preferred velocity. Ideally, we would like both area(Va ∩ B(va∗ , r)) and area(Vb ∩ B(vb∗ , r)) to be
large, so that both agents have access to a large number of preferred directions. One way to guarantee
that two numbers are large is to guarantee that their minimum is large. Also, we would like the pair
(Va, Vb) to be fair to both agents, in the sense that area(Va ∩ B(va∗, r)) = area(Vb ∩ B(vb∗, r)). This
means that both agents have access to the same “number” of nearby velocities.
Combining the concepts of fairness and maximality, we say that a pair (Va , Vb ) of reciprocally maximal
collision avoiding sets is optimal if, for all radii r > 0, we have
Fair: area(Va ∩ B(va∗ , r)) = area(Vb ∩ B(vb∗ , r))
Maximal: For any other reciprocally maximal collision avoiding pair (Va′, Vb′),
min(area(Va ∩ B(va∗, r)), area(Vb ∩ B(vb∗, r))) ≥ min(area(Va′ ∩ B(va∗, r)), area(Vb′ ∩ B(vb∗, r))).
Now that we have defined this concept, it is only natural to ask whether we have any hope of computing
a pair of sets satisfying such lofty requirements. The remarkable answer is yes, and in fact, it is not
that hard to do! The solution is described in a paper by J. van den Berg, M. C. Lin, D. Manocha
(see the readings at the start of these notes). They define an optimal reciprocal collision avoiding pair
as follows. Consider the preferred relative velocity va∗ − vb∗, and suppose that it lies inside VOτa|b
(so a collision is anticipated). Let u denote the vector from va∗ − vb∗ to the closest point on the
boundary of VOτa|b (see Fig. 105(a)).
Fig. 105: Computing an optimal reciprocal collision avoiding pair of candidate velocities.
Intuitively, u reflects the amount of relative velocity diversion needed to just barely escape from the
collision zone. That is, together a’s diversion plus b’s diversion (negated) must sum to u. We could
split the responsibility however we like. As we had discussed earlier, for the sake of reciprocity, we
would prefer that each agent divert by exactly half of the full amount. That is, a will divert by u/2
and b will divert by −u/2. (To see why this works, suppose that va′ = va∗ + u/2 and vb′ = vb∗ − u/2.
The resulting relative velocity is va′ − vb′ = va∗ − vb∗ + u, which is a collision-free velocity.)
In general, there are a number of choices that a and b could make to avoid a collision. Let n denote
a vector of unit length that points in the same direction as u. We would like a to change its velocity
from va∗ to a velocity whose orthogonal projection onto the vector n is of length at least ‖u/2‖. The
set of allowable diversions defines a halfspace (that is, the set of points lying to one side of a line),
and is given by the following formula:

    ORCAτa|b = { v : (v − (va∗ + u/2)) · n ≥ 0 },
(where the · denotes the dot product of vectors). This formula defines a halfspace whose boundary is
orthogonal to u and lies at distance ‖u/2‖ from va∗ (see Fig. 105(b)). Define ORCAτb|a symmetrically,
but using −u/2 instead (see Fig. 105(c)). In their paper, van den Berg, Lin, and Manocha show that the
resulting pair of sets (ORCAτa|b , ORCAτb|a ) is an optimal reciprocally maximal collision avoiding pair.
In other words, if a selects any velocity from ORCAτa|b and b selects any velocity from ORCAτb|a , then
no collision will occur (within the time horizon τ ); moreover, these two sets are fair and provide the
greatest number of velocities that are close to the agents' preferred velocities.
This suggests a solution to the problem of planning the motion of n bodies. Let B = {b1 , . . . , bn }
denote the set of bodies other than a. Compute the ORCA sets for a relative to all the other agents
in the system, and form their intersection

    ORCAτa|b1 ∩ ORCAτa|b2 ∩ · · · ∩ ORCAτa|bn .

Since each of these regions is a halfplane, their intersection is a convex region. Agent a then selects,
from within this intersection, the velocity va′ that lies closest to its preferred velocity va∗ . (Finding
the closest point to va∗ within an intersection of halfplanes can be done efficiently by incremental
linear programming.)
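To make this concrete, here is a small Python sketch of the velocity-selection step. This is our own
illustrative code, not the authors' implementation; the function name and the incremental scheme are
our choices. Each halfplane is represented by a pair (p, n), where p is a point on its boundary (for
neighbor bi we would take p = va∗ + ui /2) and n is its unit normal; a velocity v is allowed if
(v − p) · n ≥ 0. The function returns the allowed velocity closest to va∗ , or None if the intersection
is empty.

    import numpy as np

    def closest_in_halfplanes(v_pref, halfplanes):
        # v_pref: preferred velocity (2D NumPy array).
        # halfplanes: list of (p, n) pairs of 2D NumPy arrays, n unit length;
        # a velocity v is feasible iff (v - p) . n >= 0.
        v = np.asarray(v_pref, dtype=float)
        v_pref = v.copy()
        for i, (p_i, n_i) in enumerate(halfplanes):
            if np.dot(v - p_i, n_i) >= 0.0:
                continue                          # constraint i already holds
            # The new optimum must lie on the boundary line p_i + t * d_i.
            d_i = np.array([-n_i[1], n_i[0]])
            t_lo, t_hi = -np.inf, np.inf
            for p_j, n_j in halfplanes[:i]:       # re-impose earlier constraints
                denom = np.dot(d_i, n_j)
                dist = np.dot(p_j - p_i, n_j)     # need t * denom >= dist
                if abs(denom) < 1e-12:
                    if dist > 0.0:
                        return None               # parallel and incompatible
                elif denom > 0.0:
                    t_lo = max(t_lo, dist / denom)
                else:
                    t_hi = min(t_hi, dist / denom)
            if t_lo > t_hi:
                return None                       # intersection is empty
            # Closest point on the line to v_pref, clamped to the feasible range.
            t = float(np.clip(np.dot(v_pref - p_i, d_i), t_lo, t_hi))
            v = p_i + t * d_i
        return v

The incremental scheme exploits the fact that when the current optimum violates a new constraint,
the new optimum must lie on that constraint's boundary line, which reduces the problem to a simple
one-dimensional one.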
There are two shortcomings with this approach. First, if the agents are very close to one another, it
may be that the intersection of the collision-free regions is empty. In this case, we may need to find an
alternate strategy for computing a's velocity (or simply accept the possibility of a collision).
The other shortcoming is that it requires every agent to know the preferred velocity vbi∗ of each of the
other objects in the system. While the simulator may know this, it is not reasonable to assume that
every agent knows this information. A reasonable alternative is to estimate each agent's current
velocity from its observed motion, and to use this estimate instead. The theory is that, most of the
time, objects will tend to move in their preferred direction.
Networked Multiplayer Games: We next consider games played over a network against other human
players. Playing against people, rather than an AI, offers a number of advantages:
• People are “better” (less predictable, more complex, more interesting) at strategy than AI systems
• Playing with people provides a social element to the game, allowing players to communicate
verbally and engage in other social activities
• Provides larger environments to play in with more characters, resulting in a richer experience
• Some online games support an economy, where players can buy and sell game resources
Transient Games: These games do not maintain a persistent state. Instead, players engage in ad
hoc, short-lived sessions. Examples include games like Doom, which provided either head-to-head
(one-on-one) or death-match (multiple player) formats. They are characterized as being fast-paced
and providing intense interaction/combat. Because of their light-weight nature, any client can
act as a server.
Performance Issues: The most challenging aspects of the design of multiplayer networked games involve
achieving good performance given a shared resource (the network).
Bandwidth: This refers to the amount of data that can be sent through the network in steady-state.
Latency: In games where real-time response is important, a more important issue than bandwidth is
the responsiveness of the network to sudden changes in the state. Latency refers to the time it
takes for a change in state to be transmitted through the network.
Reliability: Network communication occurs over physical media that are subject to errors, either due
to physical problems (interference in wireless signals) or exceeding the network’s capacity (packet
losses due to congestion).
Security: Network communications can be intercepted by unauthorized users (for the purpose of
stealing passwords or credit-card numbers) or modified (for the sake of cheating). Since cheating
can harm the experience of legitimate users, it is important to detect and minimize the negative
effects of cheaters.
Of course, all of these considerations interact, and trade-offs must be made. For example, enhancing
security or reliability may require more complex communication protocols, which can have the effect
of reducing the usable bandwidth or increasing latency.
Network Structure: Networks are complex entities to engineer. To bring structure to this topic,
networks are often described as a series of layers, called the Open Systems Interconnection (OSI)
model. Here are the layers of the model, from the bottom (physical) to the top (applications).
Physical: This is the physical medium that carries the data (e.g., copper wire, optical fiber, wireless,
etc.)
Data Link: Deals with low-level transmission of data between machines on the network. Issues at
this level include things like packet structure, basic error control, and machine (MAC) addresses.
Network: This controls end-to-end delivery of individual packets. It is responsible for routing and
balancing network flow. This is the layer where the Internet Protocol (IP) and IP addresses are
defined.
Transport: This layer is responsible for transparent end-to-end transfer of data (not just individual
packets) between two hosts. This layer defines two important protocols, TCP (transmission control
protocol) and UDP (user datagram protocol). This layer defines the notion of a net address, which
consists of an IP address and a port number. Different port numbers can be used to partition
communication between different functions (http, https, smtp, ftp, etc.). (A short socket
sketch illustrating these notions follows this list.)
Session: This layer is responsible for establishing, managing, and terminating long-term connections
between local and remote applications (e.g., logging in/out, creating and terminating communi-
cation sockets).
Presentation: Provides for conversion between incompatible data representations arising from differ-
ences in system or platform, such as character encoding (e.g., ASCII versus Unicode) and byte
ordering (highest-order byte first or lowest-order byte first), as well as other issues such as encryption
and compression.
Application: This is the layer where end-user applications reside (e.g., email (smtp), data transfer
(ftp, sftp), web browsers (http, https)).
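As promised in the Transport item above, here is a short Python fragment (our own illustration; the
host and port are placeholders) showing how an application selects between the two transport protocols
and how a net address combines an IP address with a port number.

    import socket

    # A net address at the transport layer: an IP address plus a port.
    address = ("127.0.0.1", 9999)       # placeholder host and port

    # TCP: connection-based, reliable, ordered byte stream.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # UDP: connectionless datagrams with no delivery guarantees.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # A TCP client must connect before sending; a UDP client just sends:
    #   tcp.connect(address); tcp.sendall(b"hello")
    #   udp.sendto(b"hello", address)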
Protocol Design Issues: In designing the communication protocol for a networked game, a number of
issues must be addressed, including the following:
Packet size/format: Are packets of fixed or variable size? How is data to be laid out within each
packet?
Handshaking: This involves the communication exchange to ascertain how data will be transmitted
(format, speed, etc.)
Acknowledgments: When data is received, should its reception be acknowledged and, if so, how?
Error checking/correction: If data packets have not been received or if their contents have been
corrupted, some form of corrective action must be taken.
Compression: Because of limited bandwidth, it may be necessary to reduce the size of the data being
transmitted (either with or without loss of fidelity).
Encryption: Sensitive data may need to be protected from eavesdroppers.
The Problem of Latency: Recall that latency is the time between when the user acts and when the result
is perceived (either by the user or by the other players). Because most computer games involve rapid
and often unpredictable action and response, latency is arguably the most important challenge in the
design of real-time online games. Too much latency makes the game-play harder to understand because
the player cannot associate cause with effect. Latency also makes it harder to target objects, because
they are not where you predict them to be.
Note that latency is a very different issue from bandwidth. For example, your cable provider may be
able to stream a high-definition movie to your television after a 5 second start-up delay. You would not
be bothered if the movie starts after such a delay, but you would be very annoyed if your game were
to impose this sort of delay on you every time you manipulated the knobs on your game controller.
The amount of latency that can be tolerated depends on the type of game. For example, in a Real-
Time Strategy (RTS) game, below 250ms (that is, 1/4 of a second) would be ideal, 250–500ms would
be playable, and over 500ms would be noticeable. In a typical First-Person Shooter (FPS), the latency
should be smaller, say 150ms. In a car racing game, or any other game that involves
fast (twitch) movements, latencies below 100ms would be required; latencies in excess of 500ms would
make it impossible to control the car. Note that the average latency for the simplest transmission (a
“ping”) on the internet to a geographically nearby server is typically much smaller than these numbers,
say on the order of 10–100ms.
There are a number of sources of latency in online games:
Frame rate latency: Data is sent to/received from the network layer once per frame, and user in-
teraction is only sampled once per frame.
Network protocol latency: It takes time for the operating system to put data onto the physical
network, and time to get it off a physical network and to an application.
Coping with Latency: Since you cannot eliminate latency, you can try to conceal it. Of course, any
approach that you take will introduce errors in some form. The trick is to create the illusion for your
users that they are experiencing no latency.
Sacrifice accuracy: Given that the locations and actions of other players may not be known to you,
you can attempt to render them approximately. One approach is to ignore the time lag and show
a given player information that is known to be out of date. The other is to attempt to estimate
(based on recent behavior) where the other player is at the present time and what this player is
doing. Both approaches suffer from problems, since a player may make decisions based on either
old or erroneous information.
Sacrifice game-play: Deliberately introduce lag into the local player’s experience, so that you have
enough time to deal with the network. For example, a sword thrust does not occur instantaneously,
but after a short wind-up. Although the wind-up may only take a fraction of a second, it provides
enough time for the information that the sword thrust is coming to be sent through the network.
Dealing with Latency through Dead Reckoning: One trick for coping with latency from the client’s
side is to attempt to estimate another player’s current position based on its recent history of motion.
Each player knows that the information it receives from the server is out of date, and so we (or
actually our game) will attempt to extrapolate the player's current position from its past motion. If our
estimate is good, this can help compensate for the lag caused by latency. Of course, we must worry
about how to patch things up when our predictions turn out to be erroneous.
A typical dead-reckoning scheme works as follows:
• Each client maintains precise state for some objects (e.g., the local player).
• Each client receives periodic updates of the positions of everyone else, along with their current
velocity information, and possibly the acceleration.
• On each frame, the non-local objects are updated by extrapolating their most recent position using
the available information.
• With a client-server model, each player runs their own version of the game, while the server
maintains absolute authority.
Inevitably, inconsistencies will be detected between the extrapolated position of the other player and its
actual position. Reconciling these inconsistencies is a challenging problem. There are two obvious op-
tions. First, you could just have the player’s avatar jump instantaneously to its most recently reported
position. Of course, this will not appear to be realistic. The alternative is to smoothly interpolate
between the player’s hypothesized (but incorrect) position and its newly extrapolated position.
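Here is a minimal sketch of these two operations (our own illustrative code; the blending factor alpha
is an arbitrary choice). Positions, velocities, and accelerations are NumPy arrays.

    import numpy as np

    def extrapolate(pos, vel, acc, dt):
        # Dead-reckoned estimate of a remote player's position dt seconds
        # after its last reported position, velocity, and acceleration.
        return pos + vel * dt + 0.5 * acc * dt * dt

    def smooth_correct(predicted, reported, alpha=0.2):
        # When a new update reveals an error, blend toward the reported
        # position over several frames rather than snapping to it.
        return (1.0 - alpha) * predicted + alpha * reported

Each frame, a client renders the non-local player at extrapolate(last_pos, last_vel, last_acc,
now - t_last); when a fresh update arrives, smooth_correct reconciles the prediction with the report.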
Dealing with Latency through Lag Compensation: As mentioned above, dead reckoning relies on ex-
trapolation, that is, producing estimates of future state based on past state. An alternative approach,
called lag compensation, is based on interpolation. Lag compensation is a server-side technique, which
attempts to determine a player’s intention.
Here is the idea. Players are subject to latency, which delays their perception of the world, and
so their decisions are based on information that is slightly out of date relative to the current world
state. However, since we can estimate the delay that each player is experiencing, we can roll the world
state back to the moment of the player's decision, so that we can see exactly what the player saw. We
can then apply the player's action (say, firing a shot) to this rolled-back state, credit any hit, and
restore the world to the present.
The idea is that, if a user was aiming accurately based on the information that he/she was seeing, then
the system can determine this (assuming it has a good estimate of each player’s latency), and credit
the player appropriately.
Note that in the step where we move the player backwards in time, this might actually require forcing
additional state information backwards, too (for example, whether the player was alive or dead or
whether the player was ducking). The end result of lag compensation is that each local client is able
to directly aim at other players without having to worry about leading his or her target in order to
score a hit. Of course, this behavior is a game design tradeoff.
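Here is a server-side sketch of the rewinding step (our own simplified version; the class and method
names are illustrative). The server records a snapshot of player positions on each tick; when a shot
arrives from a player whose estimated latency is L, the hit test is run against the snapshot from time
now − L.

    import bisect

    class LagCompensator:
        def __init__(self, horizon=1.0):
            self.times = []         # snapshot timestamps, increasing
            self.snaps = []         # {player_id: position} per timestamp
            self.horizon = horizon  # seconds of history to retain

        def record(self, t, positions):
            # Called once per server tick with every player's position.
            self.times.append(t)
            self.snaps.append(dict(positions))
            while self.times and self.times[0] < t - self.horizon:
                self.times.pop(0)
                self.snaps.pop(0)

        def rewind(self, t):
            # Return the snapshot nearest to time t (the world roughly as
            # the shooter saw it), or None if t predates our history.
            if not self.times or t < self.times[0]:
                return None
            i = bisect.bisect_left(self.times, t)
            if i == len(self.times) or \
               (i > 0 and t - self.times[i - 1] < self.times[i] - t):
                i -= 1
            return self.snaps[i]

A hit is credited if the shooter's aim intersects a target in rewind(now - L); as noted above, a full
implementation would also rewind auxiliary state (such as whether the target was alive or ducking).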
Reliability: Let us move on from latency to another important networking issue, reliability. As we men-
tioned before, in packet-switched networks, data are broken up into packets and then may be sent by
various routes. Packets may arrive out of order, they may be corrupted, or they may fail to arrive at
all (or after such a long delay that the receiver gives up on them). Some network protocols (TCP in
particular) attempt to ensure that every packet is delivered and that packets arrive in order. (For example, if
you are sending an email message, you would expect the entire message to arrive as sent.)
As we shall see, achieving such a high level of reliability comes with associated costs. For example,
suppose the sender transmits packets and the receiver acknowledges the receipt of each packet. If a
packet's receipt is not acknowledged (within some time limit), the sender resends the packet. The
additional communication required for sending, receiving, and processing acknowledgments can
increase latency and use more of the available
bandwidth.
In many online games, however, we may be less concerned that every packet arrives on time or in order.
Consider for example a series of packets, each of which tells us where an enemy player is located. If
one of these packets does not arrive (or arrives late) the information is now out of date anyway, and
there is no point in having the sender resend the packet. Of course, some information is of a much
more important nature. Information about payments or certain changes to the discrete state of the
game (player X is no longer in the game), must be communicated reliably. In short, not all information
in a game is of equal importance with respect to reliability.
Communication reliability is handled by protocols at the transport level of the OSI model. The two
most common protocols are TCP (transmission control protocol) and UDP (user datagram protocol).
Transmission Control Protocol: TCP is a connection-based protocol that guarantees reliable, ordered
delivery of data.
Advantages:
• Guaranteed packet delivery
• Ordered packet delivery
• Packet check-sum checking (basic error detection)
• Transmission flow control
Disadvantages:
• Point-to-point transport (as opposed to more general forms, like multi-cast)
• Bandwidth and latency overhead
• Packets may be delayed to preserve order
TCP is used in applications where data must be reliably sent and/or maintained in order. Since it is
a reliable protocol, it can be used in games where latency is not a major concern.
User Datagram Protocol: UDP is a very light-weight protocol, lacking the error control and flow control
features of TCP. It is a connectionless protocol, which provides no guarantees of delivery. The sender
merely sends packets, with no expectation of any acknowledgment. As a result, the overhead is much
smaller than for TCP.
Advantages:
• Packet based, so it fits naturally with the internet
• Lower overhead than TCP in terms of both bandwidth and latency
• Immediate delivery: as soon as a packet arrives, it is passed to the client
Disadvantages:
• Point to point connectivity (as with TCP)
• No reliability guarantees
• No ordering guarantees
• Packets can be corrupted
• Can cause problems with some firewalls
UDP is popular in games, since much state information is nonessential and quickly goes out of date.
Note that although the UDP protocol has no built-in mechanisms for error checking or packet acknowl-
edgments, the application can add these to the protocol. For example, if some packets are non-critical,
they can be sent by the standard UDP protocol. Certain critical packets can be flagged by your appli-
cation, and as part of the packet payload, it can insert its own sequence numbers and/or check-sums.
Thus, although UDP does not automatically support TCP's features, there is nothing preventing your
application from adding them for a small subset of important packets.
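For example, here is a sketch of such an application-level scheme (the wire layout is our own choice,
not part of UDP): each payload is prefixed with a 32-bit sequence number, and the whole body with a
CRC-32 checksum, so the receiver can discard packets that are corrupted, duplicated, or stale.

    import struct
    import zlib

    def encode(seq, payload):
        # Prefix a 32-bit sequence number, then a CRC-32 over the rest.
        body = struct.pack("!I", seq) + payload
        return struct.pack("!I", zlib.crc32(body)) + body

    def decode(packet, last_seq):
        # Return (seq, payload), or None if the packet should be dropped.
        if len(packet) < 8:
            return None
        (crc,) = struct.unpack("!I", packet[:4])
        body = packet[4:]
        if zlib.crc32(body) != crc:
            return None              # corrupted in transit
        (seq,) = struct.unpack("!I", body[:4])
        if seq <= last_seq:
            return None              # duplicate or out-of-date update
        return seq, body[4:]

These would wrap the usual calls, e.g., sock.sendto(encode(seq, data), address) on the sender's
side; acknowledgment and retransmission of the critical packets would be layered on top in the same way.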
Cheating: Cheating is a serious problem in online games. In a well-known article, Matt Pritchard
identifies a number of common cheating attacks and discusses how to counter them. His list includes
the following:
Information Exposure: This method of cheating involves the cheater gaining access to information that
they are not entitled to, such as their opponent's health, weapons, resources, or troops. This cheat is
possible because developers often incorrectly assume that the client software can be trusted not to reveal
secrets. Secret information is revealed either by modifying the client or by running another program that
extracts it from memory.
Another approach is to modify elements of the graphics model. For example, suppose that
you have a shooter game in which enemies may hide behind walls or bushes, or may rely on atmospheric
effects like smoke or fog. The cheater modifies the parameters that control these obscuring
elements, say by making walls transparent, removing the foliage from bushes, or changing the smoke
parameters so it effectively disappears. The cheater now has an unobscured view
of the battlefield.
This is sometimes called an infrastructure-level cheat, since it usually involves accessing or modifying
elements of the infrastructure in which the program runs. In a client-server setting, this can be dealt
with using a technique called on-demand loading (ODL). Using this technique, a trusted third party
(the server) stores all secret information and only transmits it to a client when that client is entitled to it.
Therefore, the client never holds secret information that could be exposed. Another approach for
avoiding information exposure is to encrypt all secret information. This makes it difficult to determine
where the information resides in memory and how to interpret it.
Protocol-level cheats: Because most multiplayer games involve communication through a network, many
cheats are based on interfering with the manner in which network packets are processed. Packets may
be inserted, destroyed, duplicated, or modified by an attacker. Many of these cheats are dependent on
the architecture used by the game (client-server or peer-to-peer). Below we describe some protocol-level
cheats.
Fixed delay: Fixed delay cheating involves introducing a fixed amount of delay to all outgoing packets.
This results in the local player receiving updates quickly, while delaying information to opponents.
For fast-paced games this additional delay can have a dramatic impact on the outcome. This cheat
typically arises in peer-to-peer games in which one peer is elevated to act as the server, since that
peer can then add delay to the updates it sends to all the other peers.
One way to prevent this cheat in peer-to-peer games is to use distributed event-ordering and consistency
protocols, which avoid elevating one peer above the rest. (Note that the fixed-delay cheat only delays
updates, in contrast to the suppressed-update cheat, in which updates are dropped entirely.)
Another solution is to force all players to use a protocol that divides game time into rounds and requires
that every player in the game submit their move for that round before the next round is allowed to
begin. (One such protocol is called lockstep.) To prevent cheating, all players first commit to a move,
and once all players have committed, each player reveals their move. A player commits to a move by
transmitting either a hash of the move or an encrypted copy of the move, and the move is revealed by
sending either the move itself or the encryption key, respectively. Lockstep is provably secure against
these and other protocol-level cheats. Unfortunately, this approach is unacceptably slow for many
fast-paced games,
since it forces all players to wait on the slowest one.
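The hash-based commitment used by lockstep is easy to sketch (a minimal version of our own; a real
protocol would also hash in the round number and player ID to prevent replaying commitments):

    import hashlib
    import os

    def commit(move):
        # Commit to a move (bytes) by hashing it with a random nonce.
        # Broadcast the digest now; keep (move, nonce) secret until reveal.
        nonce = os.urandom(16)
        digest = hashlib.sha256(nonce + move).hexdigest()
        return digest, nonce

    def verify(digest, move, nonce):
        # Once all commitments are in, players reveal (move, nonce), and
        # everyone checks the reveal against the earlier digest.
        return hashlib.sha256(nonce + move).hexdigest() == digest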
Another example of a protocol to prevent packet suppression/delaying is called sliding pipeline (SP).
SP works by constantly monitoring the delay between players to determine the maximum allowable
delay for an update without permitting time-stamp cheating (see below). SP does not lock all players
into a fixed time step, and so can be applied to faster-paced games. Unfortunately, SP cannot always
differentiate between players suffering delay and cheaters (false positives).
More Protocol-Level Cheats: The above (suppressed update and fixed delay) are just two examples of
protocol-level cheats. There are many others, which we will just summarize briefly here.
Inconsistency: A cheater induces inconsistency amongst players by sending different game updates to
different opponents. An honest player attacked by this cheat may have their game state corrupted,
and hence be removed from the game, when the cheater sends them a different update than was
sent to all the other players. To prevent this cheat, updates sent between players must be verified
either by a trusted authority or by a group of peers.
Time-stamp: This cheat is enabled in games where an untrusted client is allowed to time-stamp
their updates for event ordering. This allows cheaters to time-stamp their updates in the past,