Game Physics Pearls
Edited by Gino van den Bergen and Dirk Gregorius

The common theme in this practical, hands-on collection is experience. The
contributors write based on their knowledge and skill in developing tools and
runtime libraries either in game companies or middleware houses that produce
physics software for games on PCs and consoles. Each article describes not only
a specific topic, but provides an in-the-trenches discussion of the practical
problems and solutions when implementing the algorithms, whether for a physics
engine or game application.

The chapters cover topics such as collision detection, particle-based
simulations, constraint solving, and soft-body simulation. Several of the
topics are about nonsequential programming, whether multicore or for game
consoles, which is important given the evolution of modern computing hardware
toward multiprocessing and multithreading.

A K Peters, Ltd.
Cover image © Wayne Johnson, 2010. Used under license from Shutterstock.com.
Game Physics Pearls

Edited by
Gino van den Bergen
and
Dirk Gregorius

A K Peters, Ltd.
Natick, Massachusetts

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2010 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20140703

International Standard Book Number-13: 978-1-4398-6555-2 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reason-
able efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.
copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organiza-
tion that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Contents

Foreword xi

Preface xiii

I Game Physics 101 1

1 Mathematical Background 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Vectors and Points . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Lines and Planes . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Matrices and Transformations . . . . . . . . . . . . . . . . . 9
1.5 Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6 Rigid-Body Dynamics . . . . . . . . . . . . . . . . . . . . . 15
1.7 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 26
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2 Understanding Game Physics Artifacts 29


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Discretization and Linearization . . . . . . . . . . . . . . . . 29
2.3 Time Stepping and the Well of Despair . . . . . . . . . . . . . 31
2.4 The Curse of Rotations . . . . . . . . . . . . . . . . . . . . . 32
2.5 Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 Collision Detection . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Joints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.8 Direct Animation . . . . . . . . . . . . . . . . . . . . . . . . 42
2.9 Artifact Reference . . . . . . . . . . . . . . . . . . . . . . . . 43

II Collision Detection 45

3 Broad Phase and Constraint Optimization for PlayStation® 3 47
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Overview of Cell/BE . . . . . . . . . . . . . . . . . . . . . . 47


3.3 Optimization of the Broad Phase . . . . . . . . . . . . . . . . 51


3.4 Optimization of the Constraint Solver . . . . . . . . . . . . . 57
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4 SAT in Narrow Phase and Contact-Manifold Generation 63


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Contact Manifold . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Physics Engine Pipeline . . . . . . . . . . . . . . . . . . . . . 65
4.4 SAT Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5 Intuitive Gauss Map . . . . . . . . . . . . . . . . . . . . . . . 75
4.6 Computing Full Contact Manifolds . . . . . . . . . . . . . . . 77
4.7 SAT Optimizations . . . . . . . . . . . . . . . . . . . . . . . 89
4.8 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 96
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5 Smooth Mesh Contacts with GJK 99


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Configuration Space . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Support Mappings . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4 Overview of GJK . . . . . . . . . . . . . . . . . . . . . . . . 105
5.5 Johnson’s Algorithm . . . . . . . . . . . . . . . . . . . . . . 106
5.6 Continuous Collision Detection . . . . . . . . . . . . . . . . . 110
5.7 Contacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

III Particles 125

6 Optimized SPH 127


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2 The SPH Equations . . . . . . . . . . . . . . . . . . . . . . . 128
6.3 An Algorithm for SPH Simulation . . . . . . . . . . . . . . . 131
6.4 The Choice of Data Structure . . . . . . . . . . . . . . . . . . 132
6.5 Collapsing the SPH Algorithm . . . . . . . . . . . . . . . . . 139
6.6 Stability and Behavior . . . . . . . . . . . . . . . . . . . . . 143
6.7 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.9 Appendix: Scaling the Pressure Force . . . . . . . . . . . . . 150
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


7 Parallelizing Particle-Based Simulation on Multiple Processors 155


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2 Dividing Computation . . . . . . . . . . . . . . . . . . . . . 156
7.3 Data Management without Duplication . . . . . . . . . . . . . 159
7.4 Choosing an Acceleration Structure . . . . . . . . . . . . . . 162
7.5 Data Transfer Using Grids . . . . . . . . . . . . . . . . . . . 173
7.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

IV Constraint Solving 177

8 Ropes as Constraints 179


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.2 Free-Hanging Ropes . . . . . . . . . . . . . . . . . . . . . . 181
8.3 Strained Ropes . . . . . . . . . . . . . . . . . . . . . . . . . 184
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

9 Quaternion-Based Constraints 195


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.2 Notation and Definitions . . . . . . . . . . . . . . . . . . . . 195
9.3 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.4 Constraint Definitions . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Matrix-Based Quaternion Algebra . . . . . . . . . . . . . . . 201
9.6 A New Take on Quaternion-Based Constraints . . . . . . . . . 203
9.7 Why It Works . . . . . . . . . . . . . . . . . . . . . . . . . . 203
9.8 More General Frames . . . . . . . . . . . . . . . . . . . . . . 204
9.9 Limits and Drivers . . . . . . . . . . . . . . . . . . . . . . . 206
9.10 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
9.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.12 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 213
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

V Soft Body 215

10 Soft Bodies Using Finite Elements 217


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.2 Continuum Mechanics . . . . . . . . . . . . . . . . . . . . . 218
10.3 Linear FEM . . . . . . . . . . . . . . . . . . . . . . . . . . . 223


10.4 Solving the Linear System . . . . . . . . . . . . . . . . . . . 241


10.5 Surface-Mesh Update . . . . . . . . . . . . . . . . . . . . . . 246
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

11 Particle-Based Simulation Using Verlet Integration 251


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.2 Techniques for Numerical Integration . . . . . . . . . . . . . 252
11.3 Using Relaxation to Solve Systems of Equations . . . . . . . . 256
11.4 Rigid Bodies . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.5 Articulated Bodies . . . . . . . . . . . . . . . . . . . . . . . 264
11.6 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

12 Keep Yer Shirt On 271


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.2 Stable Real-Time Cloth . . . . . . . . . . . . . . . . . . . . . 271
12.3 Modeling Real Fabrics . . . . . . . . . . . . . . . . . . . . . 273
12.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
12.5 Order of Cloth Update Stages . . . . . . . . . . . . . . . . . . 278
12.6 Conclusion, Results, and Future . . . . . . . . . . . . . . . . 279
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

VI Skinning 281

13 Layered Skin Simulation 283


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
13.2 Layered Deformation Architecture . . . . . . . . . . . . . . . 283
13.3 Smooth Skinning . . . . . . . . . . . . . . . . . . . . . . . . 287
13.4 Anatomical Collisions . . . . . . . . . . . . . . . . . . . . . 291
13.5 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
13.6 Jiggle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
13.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

14 Dynamic Secondary Skin Deformations 305


14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
14.2 The Interaction Model . . . . . . . . . . . . . . . . . . . . . 307
14.3 Neighborhood Interaction . . . . . . . . . . . . . . . . . . . . 311
14.4 Volumetric Effects . . . . . . . . . . . . . . . . . . . . . . . 318


14.5 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 327


Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

Index 341


Foreword

I am not a fan of gems-style books. Typically, they are assembled and glued
together as a collection of loosely related articles, and no attempt is made to unify
them by emphasizing common themes and ideas. When I was asked to write the
foreword for this book, my initial reaction was to decline politely, thinking this
was yet another such book. However, knowing the editors and their reputations
in the physics engine industry, I agreed to read the book in hopes that there might
be a few articles that make the book a worthwhile purchase.
I am delighted to say that this book is much more than I imagined. Those few
articles I hoped to find interesting turned out to be all the articles! I congratulate
the editors and the authors for producing the finest collection of game physics
articles I have seen to date. The common theme is experience. Each author de-
scribes not only a topic of interest, but provides an in-the-trenches discussion of
the practical problems and solutions when implementing the algorithms, whether
for a physics engine or game application. Moreover, I found it comforting that
the authors were consistent in their findings, giving me hope that writing a fast
and robust physics engine actually can be a scientific process rather than an en-
deavor that combines art, hacks, and voodoo. Also of importance is that several of
the topics are about nonsequential programming, whether multicore or for game
consoles, which is important given the evolution of modern computing hardware
towards multiprocessing and multithreading.
This book is a must-have if you plan on exploring the world of physics pro-
gramming. And I hope the editors and authors have plans on producing more
books of the same great quality.

—Dave Eberly


Preface

It took some time before I considered myself a physics programmer. Like most
game programmers, I started out toying with physics in hobby game projects.
These early attempts at getting physical behavior out of an 8-bit home computer
did involve concepts such as velocity, mass, and force, but in my head they were
far from “real” physics. In the following years at the university I learned how to
program properly and got proficient in linear algebra, geometric algorithms, and
computer graphics. I took courses in theoretical mechanics and numerical anal-
ysis, expecting that after overcoming these hurdles, developing a physics engine
would be easy.
It never did get easy. In the coming years I was struggling to get even the
simplest of rigid body simulations stable on computers that were a thousand times
more powerful than the 8-bit home computer from my junior years. It would take
a considerable number of hacks to stop my “resting” contacts from oscillating and
from bouncing all over the place. And even then, the hacks would work only for
objects within certain ranges of masses and sizes. In the end, most of the hacks
that seemed to work would make Sir Isaac Newton turn in his grave. My inner
physicist was nagging at me, telling me that what I was doing was not “real”
physics. I was failing to truly capture classical mechanics as it was taught to me
in the code of a real-time physics engine. Surely, anyone who needs to resort to
the use of cheap hacks to get things working could never be considered a genuine
physics programmer.
After spending a couple of years in the game industry I learned that an under-
standing of classical mechanics and the ability to apply it in code are not the prime
skills of a physics programmer. Of course, any physics programmer should feel
comfortable with the science and mathematics behind physics, but being too con-
cerned about the science can become a burden as well. Games that involve physics
should primarily be targeted at playability and robustness rather than at
showcasing maximum realism. I had to overcome some hesitation before I willingly
started breaking the laws of physics and came up with hacks that created “un-
natural” behavior that fixed some game design issues. For example, in an arcade
racing game, cars should drift nicely, should rarely tip over, and if they do, should
always land back on their wheels—but most of all, they should never get stuck in
parts of the scenery. A game physics programmer can start with a realistic driving
behavior and then add helper forces and impulses to govern down force, balance,
turn rate, and what not, in order to get just the right behavior. It takes creativity
and a lot of experience to make a game that relies heavily on physics and is fun.


This book is written by and targeted at game physics programmers. We seek


to provide experience and proven techniques from experts in the field and focus on
what is actually used in games rather than on how to achieve maximum realism.
You will find a lot of hacks here, but they should not be regarded as “cheap.” They
are the result of many years of hard work balancing playability, robustness, and
visual appeal. Such information was previously found only on internet forums
and at game developers conferences. This is the first gems-type book that collects
articles on tricks of the trade in game physics written by people in the trade, and
as such, seeks to fill a gap in game technology literature.
It was not easy to set this book in motion. There were two main forces working
against us during production. Firstly, in the game industry developers usually do
not have nine-to-five jobs. Dedicating the little spare time that one has to a book
article is not a light decision for many people. Secondly, physics programmers
tend to be quite modest about their work and need some encouragement to make
them share their ideas. Perhaps many of us are plagued by the same inner physicist
who nags about our disregard for the laws of physics. Nevertheless, once the
project gained momentum, great stuff came out of the gang of contributors we
managed to lure in.
I very much enjoyed editing for this book; it’s great to see a coherent book
taking form when each of the authors is adding a piece to the puzzle. I would like
to thank everyone who contributed to this book. My gratitude goes to the authors,
the staff at A K Peters, all external reviewers, copy editors, the cover designer,
and last but not least to Dirk, my fellow co-editor and physics buddy.

—Gino van den Bergen


June 16, 2010

My initial contact with game physics programming was totally accidental. I had
just finished my studies in civil engineering and was sitting in a cafe talking to
an old girlfriend I hadn’t seen for a while. When she asked me what I would do next,
I replied that I might be interested in game development. As it turned out,
her husband (who had just returned from GDC) was a veteran in the game industry,
and he invited me for an interview. In this interview I learned that his company
was working on a release title for the PS3 and was currently looking for a physics
programmer. I had no idea what this meant, but I happily accepted.
When I started my work, I was overwhelmed by the huge amount of books,
papers, and especially rumors that were around. People on public forums had


many ideas and were gladly sharing them, but sadly these ideas often worked
reliably only in very specific situations. I quickly learned that it was very hard
to get accurate information that was actually usable in a game. At this point I
wished for a collection of proven algorithms that actually were used in a shipped
title, but sadly no such source existed at that time.
As Gino mentioned his idea of such a book, I was immediately excited and felt
flattered to support him as editor. It is my very strong belief that game physics pro-
gramming is about choosing the right algorithms rather than inventing everything
yourself. Having a collection of proven techniques is a great help in architecting
a solution for the specific needs of any game.
It was a great experience editing this book, and I enjoyed every minute work-
ing with every author. They all showed a great enthusiasm for contributing to this
book. I would like to thank all the authors, the staff at A K Peters, all the external
reviewers, the copy editors, the cover designer, and especially Gino for getting
me on board with this project.
—Dirk Gregorius
June 18, 2010


-I-
Game Physics 101


-1-
Mathematical Background
James M. Van Verth

1.1 Introduction
It has been said that, at its core, all physics is mathematics. While that statement
may be debatable, it is certainly true that a background in mathematics is indis-
pensable in studying physics, and game physics is no exception. As such, a single
chapter cannot possibly cover all that is useful in such a broad and interesting
field. However, the following should provide an essential review of the mathe-
matics needed for the remainder of this book. Further references are provided at
the end of the chapter for those who wish to study further.

1.2 Vectors and Points


1.2.1 Definitions and Relations
The core elements of any three-dimensional system are points and vectors. Points
represent positions in space and are represented graphically as dots. Vectors rep-
resent direction or rate of change—the amount of change indicated by the length,
or magnitude, of the vector—and are presented graphically as arrows. Figure 1.1

Figure 1.1. Relationship between points and vectors.


Figure 1.2. Vector scaling and addition.

shows the relationship between points and vectors—in this case, the vector is
acting as the difference between two points. Algebraically, this is

v = x1 − x0

or
x1 = x0 + v.

In general, vectors can be scaled and added. Scaling (multiplying by a single


factor, or scalar) changes the length of a vector. If the scalar is negative, it can
also change the direction of the vector. Adding two vectors together creates a new
vector that points from the tail of one to the head of another (see Figure 1.2).
Scaling and adding together an arbitrary number of vectors is called a linear
combination:

$$\mathbf{v} = \sum_i a_i \mathbf{v}_i.$$

A set of vectors S is linearly dependent if one of the vectors in S can be repre-


sented as the linear combination of other members in S. Otherwise, it is a linearly
independent set.
Points cannot be generally scaled or added. They can only be subtracted to
create a vector or combined in a linear combination, where

$$\sum_i a_i = 1.$$

This is known as an affine combination. We can express an affine combination as
follows:

$$\mathbf{x} = \left(1 - \sum_{i=1}^{n-1} a_i\right)\mathbf{x}_n + \sum_{i=1}^{n-1} a_i \mathbf{x}_i$$
$$= \mathbf{x}_n - \sum_{i=1}^{n-1} a_i \mathbf{x}_n + \sum_{i=1}^{n-1} a_i \mathbf{x}_i$$
$$= \mathbf{x}_n + \sum_{i=1}^{n-1} a_i (\mathbf{x}_i - \mathbf{x}_n)$$
$$= \mathbf{x}_n + \sum_{i=1}^{n-1} a_i \mathbf{v}_i.$$

So an affine combination can be thought of as a point plus a linear combination


of vectors.
We represent points and vectors relative to a given coordinate frame. In three
dimensions, or R3 , this consists of three linearly independent vectors e1 , e2 , and
e3 (known as a basis) and a point o (known as an origin). Any vector in this space
can be constructed using a linear combination of the basis vectors:

v = xe1 + ye2 + ze3 .

In practice, we represent a vector in the computer by using the scale factors


(x, y, z) in an ordered list.
Similarly, we can represent a point as an affine combination of the basis vec-
tors and the origin:
x = o + xe1 + ye2 + ze3 .

Another way to think of this is that we construct a vector and add it to the origin.
This provides a one-to-one mapping between points and vectors.

1.2.2 Magnitude and Distance


As mentioned, one of the quantities of a vector $\mathbf{v}$ is its magnitude, represented by
$\|\mathbf{v}\|$. In $\mathbb{R}^3$, this is
$$\|\mathbf{v}\| = \sqrt{x^2 + y^2 + z^2}.$$
We can use this to calculate the distance between two points $\mathbf{p}_1$ and $\mathbf{p}_2$ by taking
$\|\mathbf{p}_1 - \mathbf{p}_2\|$, or
$$\mathrm{dist}(\mathbf{p}_1, \mathbf{p}_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}.$$
If we scale a vector $\mathbf{v}$ by $1/\|\mathbf{v}\|$, we end up with a vector of magnitude 1, or a
unit vector. This is often represented by $\hat{\mathbf{v}}$.


Figure 1.3. Projection of one vector onto another.

1.2.3 Dot Product


The dot product of two vectors a and b is defined as

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta, \tag{1.1}$$

where θ is the angle between a and b.


For two vectors using a standard Euclidean basis, this can be represented as

a · b = ax b x + ay b y + az b z .

There are two uses of this that are of particular interest to game physics de-
velopers. First of all, it can be used to do simple tests of the angle between two
vectors. If a · b > 0, then θ < π/2; if a · b < 0, then θ > π/2; and if a · b = 0,
then θ = π/2. In the latter case, we also say that the two vectors are orthogonal.
The other main use of the dot product is for projecting one vector onto another.
If we have two vectors a and b, we can break a into two pieces a|| and a⊥ such
that a|| + a⊥ = a and a|| points along the same direction as, or is parallel to, b
(see Figure 1.3). The length $\|\mathbf{a}_\parallel\|$ is also known as the scalar projection of $\mathbf{a}$ onto $\mathbf{b}$.
From Equation (1.1), if $\|\mathbf{b}\| = 1$, then $\mathbf{a} \cdot \mathbf{b}$ is simply $\|\mathbf{a}\|\cos\theta$, which we
can see from Figure 1.3 is the length of the projection of $\mathbf{a}$ onto $\mathbf{b}$. The projected
vector itself can be computed as

a|| = (a · b)b.

The remaining, or orthogonal portion of a can be computed as

a⊥ = a − a|| .
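
To make this concrete, here is a small C++ sketch of the decomposition above. The Vector3 type and the function names are ours for illustration; they are not taken from any particular library, and the code assumes b is nonzero.

// Minimal sketch of the parallel/perpendicular decomposition described above.
// The Vector3 type and helper names are illustrative, not from the book's code.
struct Vector3 { float x, y, z; };

inline Vector3 operator+(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 operator-(Vector3 a, Vector3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vector3 operator*(float s, Vector3 v)   { return {s * v.x, s * v.y, s * v.z}; }
inline float   Dot(Vector3 a, Vector3 b)       { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Decompose a into a component parallel to b and a component orthogonal to b.
// b does not need to be unit length here; we divide by its squared magnitude.
inline void Decompose(Vector3 a, Vector3 b, Vector3& aPar, Vector3& aPerp)
{
    float bb = Dot(b, b);              // |b|^2, assumed nonzero
    aPar  = (Dot(a, b) / bb) * b;      // projection of a onto b
    aPerp = a - aPar;                  // remaining orthogonal part
}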

1.2.4 Cross Product


The cross product of two vectors a and b is defined as

a × b = (ay bz − az by , az bx − ax bz , ax by − ay bx ).


This produces a vector orthogonal to both a and b. The magnitude of the cross
product is
$$\|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{a}\|\,\|\mathbf{b}\|\sin\theta,$$
where θ is the angle between a and b. The direction of the cross product is
determined by the right-hand rule: taking your right hand, point the first finger in
the direction of a and the middle finger along b. Your extended thumb will point
along the cross product.
Two useful identities to be aware of are the anticommutativity and bilinearity
of the cross product:
a×b = −b × a,
a × (sb + tc) = s(a × b) + t(a × c).

1.2.5 Triple Product


There are two possible triple products for vectors. The first uses both the dot
product and the cross product and produces a scalar result. Hence it is known as
the scalar triple product:
s = a · (b × c).
The scalar triple product measures the signed volume of the parallelepiped bounded
by the three vectors a, b, and c. Thus, the following identity holds:
a · (b × c) = b · (c × a) = c · (a × b).
The second triple product uses only the cross product and produces a vector result.
It is known as the vector triple product:
v = a × (b × c).
The vector triple product is useful for creating an orthogonal basis from linearly
independent vectors. One example basis is b, b × c, and b × (b × c).
The following relationship between the vector triple product and dot product
is also helpful in derivations for rigid-body dynamics and geometric algorithms:
a × (b × c) = (a · c)b − (a · b)c.
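
The following C++ sketch computes the cross product and both triple products directly from the definitions above; the types and names are illustrative only, not from any particular engine.

// Cross product and the two triple products, as defined in the text.
// Types and names are illustrative.
struct Vector3 { float x, y, z; };

inline float   Dot(Vector3 a, Vector3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Scalar triple product: signed volume of the parallelepiped spanned by a, b, c.
inline float ScalarTriple(Vector3 a, Vector3 b, Vector3 c) { return Dot(a, Cross(b, c)); }

// Vector triple product a x (b x c); useful, e.g., for building the orthogonal
// basis b, b x c, b x (b x c) mentioned above.
inline Vector3 VectorTriple(Vector3 a, Vector3 b, Vector3 c)
{
    return Cross(a, Cross(b, c));
}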

1.2.6 Derivatives
We mentioned that vectors can act to represent rate of change. In particular, a
vector-valued function is the derivative of a point-valued function. If we take the
standard equation for a derivative of a function as in
$$\mathbf{x}'(t) = \lim_{h \to 0} \frac{\mathbf{x}(t+h) - \mathbf{x}(t)}{h},$$


we can see that the result $\mathbf{x}'(t)$ will be a vector-valued function, as we are sub-
tracting two points and then scaling by 1/h. It can be similarly shown that the
derivative of a vector-valued function is a vector-valued function. Note that we
often write such a time derivative as simply ẋ.

1.3 Lines and Planes


1.3.1 Definitions
If we parameterize an affine combination, we can create new entities: lines and
planes. A line can be represented as a point plus a parameterized vector:
l(t) = x + tv.
Similarly, a plane in R3 can be represented as a point plus two parameterized
vectors:
p(s, t) = x + su + tv.
An alternative definition of a plane is to take a vector n and a point p0 and
state that for any given point p on the plane,
0 = n · (p − p0 ).
If we set (a, b, c) = n, (x, y, z) = p, and d = −n · (p0 − o), we can rewrite this
as
0 = ax + by + cz + d, (1.2)
which should be a familiar formula for a plane.
For an arbitrary point p, we can substitute p into Equation (1.2) to test whether
it is on one side or another of the plane. If the result is greater than zero, we know
the point is on one side, if less than zero, it is on the other. And if the result is
close to zero, we know that the point is close to the plane.
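
As a small illustration, the following C++ sketch evaluates Equation (1.2) for an arbitrary point; the Plane type and function name here are hypothetical, not from the book.

// Sketch of the plane side test from Equation (1.2); names are illustrative.
struct Vector3 { float x, y, z; };

// Plane stored as 0 = ax + by + cz + d, with (a, b, c) the plane normal n
// and d = -n . p0 for a point p0 on the plane.
struct Plane { float a, b, c, d; };

// Positive result: p lies on the normal side; negative: on the other side;
// near zero: p is close to the plane. If (a, b, c) is unit length, this is
// the signed distance from p to the plane.
inline float TestPoint(const Plane& pl, Vector3 p)
{
    return pl.a * p.x + pl.b * p.y + pl.c * p.z + pl.d;
}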
We can further restrict our affine combinations to create half-infinite or fully
finite entities. For example, in our line equation, if we constrain t ≥ 0, we get a
ray. If we restrict t to lie between 0 and 1, then we have a line segment. We can
rewrite the line equation in an alternate form to make it clearer:
S(t) = (1 − t)x0 + tx1 .
In this case, x0 and x1 are the two endpoints of the line segment.
We can perform a similar operation with three points to create a triangle:
T(s, t) = (1 − s − t)x0 + sx1 + tx2 ,
where, again, s and t are constrained to lie between 0 and 1.


1.4 Matrices and Transformations


1.4.1 Definition
A matrix is an m × n array of components with m rows and n columns. These
components could be complex numbers, vectors, or even other matrices, but most
of the time when we refer to a matrix, its components are real numbers. An
example of a 2 × 3 matrix is
$$\begin{bmatrix} 5 & -1 & 0 \\ 12 & 0 & -10 \end{bmatrix}.$$
We refer to a single element in the ith row and jth column of the matrix A as aij .
Those elements where i = j are the diagonal of the matrix.
A matrix whose elements below and to the left of the diagonal (i.e., those
where i > j) are 0 is called an upper triangular matrix. Similarly, a matrix
whose elements above and to the right of the diagonal (i.e., those where i < j)
are 0 is called a lower triangular matrix. And those where all the non-diagonal
elements are 0 are called diagonal matrices.
A matrix is called symmetric if, for all i and j, the elements aij = aji , i.e., it
is mirrored across the diagonal. A matrix is skew symmetric if for all i and j the
elements aij = −aji . Clearly, the diagonal elements must be 0 in this case.

1.4.2 Basic Operations


Matrices can be added and scaled like vectors:

C = A + B,
D = kA.

In the first case, each element cij = aij + bij , and in the second, dij = kaij .
Matrices can be transposed by swapping elements across the diagonal, i.e.,
a matrix G is the transpose of matrix A if for all i and j, gij = aji . This is
represented as
G = AT .
Finally, matrices can be multiplied:

H = AB.

Here, for a given element hij , we take the corresponding row i from A and cor-
responding column j from B, multiply them component-wise, and take the sum,
or
$$h_{ij} = \sum_k a_{ik} b_{kj}.$$


Note also that matrix multiplication is noncommutative. That is, we cannot


say in general that AB = BA.

1.4.3 Vector Representation and Transformation


We can represent a vector as a matrix with one column, e.g.,
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

or with one row, e.g.,
$$\mathbf{b}^T = \begin{bmatrix} b_1 & b_2 & \cdots & b_m \end{bmatrix}.$$

In this book, we will be using column matrices to represent vectors. Should


we want to represent a row matrix, we shall use the transpose, as above. Using
this notation, we can also represent a matrix as its component columns:
$$A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}.$$

A linear transformation T is a mapping that preserves the linear properties of


scale and addition; that is, for two vectors x and y,

aT (x) + T (y) = T (ax + y).

We can use matrices to represent linear transformations. Multiplying a vector


x by an appropriately sized matrix A, and expanding the terms, we get
$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

This represents a linear transformation T from an n-dimensional space to an m-


dimensional space. If we assume that both spaces use the standard Euclidean
bases e1 , e2 , . . . , en and e1 , e2 , . . . , em , respectively, then the column vectors in
matrix A are the transformed basis vectors T (e1 ), T (e2 ), . . . , T (en ).
Multiplying transformation matrices together creates a single matrix that rep-
resents the composition of the matrices’ respective transformations. In this way,
we can represent a composition of linear transformations in a single matrix.


1.4.4 Inverse and Identity


Just as we can multiply a scalar by 1 to no effect, there is an identity transforma-
tion that produces the original vector. This is represented by the matrix E, which
is a square diagonal matrix, sized appropriately to perform the multiplication on
the vector and with all 1s on the diagonal. For example, the following will work
for vectors in $\mathbb{R}^3$:
$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Intuitively, this makes sense. If we examine the columns, we will see they are just
e1 , e2 , and e3 , thereby transforming the basis vectors to themselves.
Note that the identity matrix is often represented in other texts as I. We are
using E to distinguish it from the inertial tensor, as discussed below.
The equivalent to standard division is the inverse. The inverse reverses the
effect of a given transformation, as represented by the following:

x = A−1 Ax.

However, just as we can’t divide by 0, we can’t always find an inverse for a


transformation. First, only transformations from an n-dimensional space to an
n-dimensional space have inverses. And of those, not all of them can be inverted.
For example, the transformation T (x) = 0 has no inverse.
Discussing how to invert matrices in a general manner is out of the scope of
this chapter; it is recommended that the reader see [Anton and Rorres 94], [Golub
and Van Loan 93], or [Press et al. 93] for more information.

1.4.5 Affine Transformations


An affine transformation on a point x performs the basic operation

z = Ax + y,

where A and y are a matrix and vector, respectively, of the appropriate sizes to
perform the operation. We can also represent this as a matrix calculation:
$$\begin{bmatrix} \mathbf{z} \\ 1 \end{bmatrix} = \begin{bmatrix} A & \mathbf{y} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}.$$

In general, in physical simulations, we are concerned with two affine transfor-


mations: translation (changing position) and rotation (changing orientation). (See
Figure 1.4.)


Figure 1.4. Translation and rotation.

The affine transformation will end up adding the vector y to any point we
apply it to, so y achieves translation for us. Rotation is stored in the matrix A.
Because it is convenient for us to keep them separate, we will use the first form
more often. So in three dimensions, translation will be stored as a 3-vector t and
rotation as a 3 × 3 matrix, which we will call R.
The following equation, also known as the Rodrigues formula, performs a
general rotation of a point p by θ radians around a rotation axis r̂:

cos θp + [1 − cos θ](r̂ · p)r̂ + sin θ(r̂ × p). (1.3)

This can be represented as a matrix by
$$R_{\hat{\mathbf{r}}\theta} = \begin{bmatrix} tx^2 + c & txy - sz & txz + sy \\ txy + sz & ty^2 + c & tyz - sx \\ txz - sy & tyz + sx & tz^2 + c \end{bmatrix},$$
where
$$\hat{\mathbf{r}} = (x, y, z), \quad c = \cos\theta, \quad s = \sin\theta, \quad t = 1 - \cos\theta.$$

Both translation and rotation are invertible transformations. To invert a trans-


lation, simply add −y. To invert a rotation, take the transpose of the matrix.
One useful property of rotation is its interaction with the cross product:

R(a × b) = Ra × Rb.

Note that this does not hold true for all linear transformations.
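
As a rough illustration, the following C++ sketch builds the rotation matrix form of Equation (1.3) from an axis and angle and applies the affine transformation z = Rx + t. The types and names are ours, and the axis is assumed to be normalized.

// Sketch: build the rotation matrix from an axis and angle (the matrix form of
// the Rodrigues formula above) and apply the affine transform z = R x + t.
// Types and names are illustrative, not from the book.
#include <cmath>

struct Vector3 { float x, y, z; };
struct Matrix3 { float m[3][3]; };   // row-major 3x3

// Assumes the axis r is already normalized.
inline Matrix3 FromAxisAngle(Vector3 r, float theta)
{
    float c = std::cos(theta), s = std::sin(theta), t = 1.0f - c;
    float x = r.x, y = r.y, z = r.z;
    return {{{ t*x*x + c,   t*x*y - s*z, t*x*z + s*y },
             { t*x*y + s*z, t*y*y + c,   t*y*z - s*x },
             { t*x*z - s*y, t*y*z + s*x, t*z*z + c   }}};
}

// z = R x + t : rotate, then translate.
inline Vector3 Transform(const Matrix3& R, Vector3 t, Vector3 x)
{
    return { R.m[0][0]*x.x + R.m[0][1]*x.y + R.m[0][2]*x.z + t.x,
             R.m[1][0]*x.x + R.m[1][1]*x.y + R.m[1][2]*x.z + t.y,
             R.m[2][0]*x.x + R.m[2][1]*x.y + R.m[2][2]*x.z + t.z };
}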


1.5 Quaternions
1.5.1 Definition
Another useful rotation representation is the quaternion. In their most general
form, quaternions are an extension of complex numbers. Recall that a complex
number can be represented as
c = a + bi,
where i2 = −1.
We can extend this to a quaternion by creating two more imaginary terms, or

q = w + xi + yj + zk,

where i2 = j 2 = k 2 = ijk = −1. All of a quaternion’s properties follow from


this definition. Since i, j, and k are constant, we can also write this as an ordered
4-tuple, much as we do vectors:

q = (w, x, y, z).

Due to the properties of xi + yj + zk, the imaginary part of a quaternion is


often referred to as a vector in the following notation:

q = (w, v).

Using the vector form makes manipulating quaternions easier for those who are
familiar with vector operations.
Note that most software packages store a quaternion as (x, y, z, w), which
matches the standard layout for vertex positions in graphics.

1.5.2 Basic Operations


Like vectors, quaternions can be scaled and added, as follows:

$$a\mathbf{q} = (aw, a\mathbf{v}),$$
$$\mathbf{q}_0 + \mathbf{q}_1 = (w_0 + w_1, \mathbf{v}_0 + \mathbf{v}_1).$$

There is only one quaternion multiplication operation. In vector form, this is


represented as

q0 q1 = (w0 w1 − v0 · v1 , w0 v1 + w1 v0 + v0 × v1 ).

Note that due to the cross product, quaternion multiplication is noncommutative.
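
The following C++ sketch implements this multiplication and the unit-quaternion inverse in the (w, v) form used above; the types and names are purely illustrative.

// Quaternion multiplication and the unit-quaternion inverse in (w, v) form.
// Types and names are illustrative, not from the book's code.
struct Vector3 { float x, y, z; };
struct Quat    { float w; Vector3 v; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }
inline float   Dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// q0 q1 = (w0 w1 - v0 . v1, w0 v1 + w1 v0 + v0 x v1); order matters.
inline Quat Mul(Quat q0, Quat q1)
{
    Quat r;
    r.w = q0.w * q1.w - Dot(q0.v, q1.v);
    r.v = Add(Add(Scale(q0.w, q1.v), Scale(q1.w, q0.v)), Cross(q0.v, q1.v));
    return r;
}

// The inverse of a *unit* quaternion is its conjugate (w, -v).
inline Quat UnitInverse(Quat q) { return { q.w, Scale(-1.0f, q.v) }; }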


Quaternions, like vectors, have a magnitude:
$$\|\mathbf{q}\| = \sqrt{w^2 + \mathbf{v} \cdot \mathbf{v}} = \sqrt{w^2 + x^2 + y^2 + z^2}.$$

Quaternions of magnitude 1, or unit quaternions, have properties that make them


useful for representing rotations.
Like matrices, quaternions have a multiplicative identity, which is (1, 0). There
is also the notion of a multiplicative inverse. For a unit quaternion (w, v), the in-
verse is equal to (w, −v). We can think of this as rotating around the opposing
axis to produce the opposite rotation. In general, the quaternion inverse is

$$\mathbf{q}^{-1} = \frac{1}{w^2 + x^2 + y^2 + z^2}(w, -\mathbf{v}).$$

1.5.3 Vector Rotation


If we consider a rotation of angle θ around an axis r, we can write this as a
quaternion:
q = (cos(θ/2), sin(θ/2)r̂).
It can be shown that this is, in fact, a unit quaternion.
We can use a quaternion of this form to rotate a vector p around r̂ by θ by
using the formulation
prot = qpq−1 .
Note that in order to perform this multiplication, we need to rewrite p as a quater-
nion with a zero-valued w term, or (0, p).
This multiplication can be expanded out and simplified as

prot = cos θp + [1 − cos θ](r̂ · p)r̂ + sin θ(r̂ × p),

which as we see is the same as Equation (1.3) and demonstrates that quaternions
can be used for rotation.

1.5.4 Matrix Conversion


It is often useful to convert a quaternion to a rotation matrix, e.g., so it can be
used with the graphics pipeline. Again, assuming a unit rotation quaternion, the
following is the corresponding matrix:
$$R_q = \begin{bmatrix} 1 - 2y^2 - 2z^2 & 2xy - 2wz & 2xz + 2wy \\ 2xy + 2wz & 1 - 2x^2 - 2z^2 & 2yz - 2wx \\ 2xz - 2wy & 2yz + 2wx & 1 - 2x^2 - 2y^2 \end{bmatrix}.$$
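
To tie these pieces together, here is an illustrative C++ sketch that builds a unit rotation quaternion from an axis and angle (as in Section 1.5.3) and converts it to the matrix above; the types and names are ours, and the axis is assumed normalized.

// Sketch: axis-angle to unit quaternion, and unit quaternion to rotation matrix.
// This Quat stores the components (w, x, y, z) directly; names are illustrative.
#include <cmath>

struct Vector3 { float x, y, z; };
struct Quat    { float w, x, y, z; };
struct Matrix3 { float m[3][3]; };

// Assumes axis is normalized: q = (cos(theta/2), sin(theta/2) * axis).
inline Quat FromAxisAngle(Vector3 axis, float theta)
{
    float s = std::sin(0.5f * theta);
    return { std::cos(0.5f * theta), s * axis.x, s * axis.y, s * axis.z };
}

// Assumes q is a unit quaternion.
inline Matrix3 ToMatrix(Quat q)
{
    float w = q.w, x = q.x, y = q.y, z = q.z;
    return {{{ 1 - 2*y*y - 2*z*z, 2*x*y - 2*w*z,     2*x*z + 2*w*y     },
             { 2*x*y + 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z - 2*w*x     },
             { 2*x*z - 2*w*y,     2*y*z + 2*w*x,     1 - 2*x*x - 2*y*y }}};
}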


Figure 1.5. Space curve with position and velocity at time t.

1.6 Rigid-Body Dynamics


1.6.1 Constant Forces
Suppose we have an object in motion in space. For the moment, we will consider
only a particle with position x, or linear motion. If we track this position over
time, we end up with a function x(t). In addition, we can consider at a particular
time how fast the object is moving and in what direction. This is the velocity
v(t). As the velocity describes how x changes in time, it is also the derivative of
its position, or ẋ. (See Figure 1.5.)
Assuming that the velocity v is constant, we can create a formula for com-
puting the future position of an object from its current position x0 and the time
traveled t:
x(t) = x0 + vt.
However, most of the time, velocity is not constant, and we need to consider its
derivative, or acceleration a. Assuming a is constant, we can create a similar
formula for v(t):
v(t) = v0 + at.
Since velocity is changing at a linear rate, we can substitute the average of the
velocities across our time steps for $\mathbf{v}$ in our original equation:
$$\mathbf{x}(t) = \mathbf{x}_0 + t\left(\frac{1}{2}(\mathbf{v}_0 + \mathbf{v}(t))\right)$$
$$= \mathbf{x}_0 + t\left(\frac{1}{2}(\mathbf{v}_0 + \mathbf{v}_0 + \mathbf{a}t)\right)$$
$$= \mathbf{x}_0 + \mathbf{v}_0 t + \frac{1}{2}\mathbf{a}t^2. \tag{1.4}$$
Acceleration in turn is derived from a vector quantity known as a force F.
Forces act to push and pull an object around in space. We determine the acceler-
ation from force by using Newton’s second law of motion,

F = ma,

where m is the mass of the object and is constant.


The standard example of a force is gravity, Fgrav = mg, which draws us to


the Earth. There is also the normal force that counteracts gravity and keeps us
from sinking through the ground. The thrust of a rocket, an engine moving a car
along—these are all forces.
There can be multiple forces acting on an object. To manage these, we take
the sum of all forces on an object and treat the result as a single force in our
equations:
$$\mathbf{F} = \sum_j \mathbf{F}_j.$$

1.6.2 Nonconstant Forces


Equation (1.4) is suitable when our forces are constant across the time interval we
are considering. However, in many cases, our forces are dependent on position or
velocity. For example, we can represent a spring force based on position,

Fspring = −kx,

or a drag force based on velocity,

Fdrag = −mρv.

And as position and velocity will be changing across our time interval, our forces
will as well.
One solution is to try and find a closed analytical solution, but (a) such a
solution may not be possible to find and (b) the solution may be so complex
that it is impractical to compute every frame. In addition, this constrains us to a
single set of forces for that solution, and we would like the flexibility to apply and
remove forces at will.
Instead, we will use a numerical solution. The problem we are trying to solve
is this: we have a physical simulation with a total force dependent generally on
time, position, and velocity, which we will represent as F(t, x, v). We have a
position x(t) = x0 and a starting velocity v(t) = v0 . The question is, what is
x(t + h)?
One solution to this problem is to look at the definition of a derivative. Recall
that
$$\mathbf{x}'(t) = \lim_{h \to 0} \frac{\mathbf{x}(t+h) - \mathbf{x}(t)}{h}.$$
For the moment, we will assume that h is sufficiently small and obtain an approx-
imation by treating h as our time step.
Rearranging terms, we get
$$\mathbf{x}(t + h) \approx \mathbf{x}(t) + h\mathbf{x}'(t),$$


or
$$\mathbf{x}(t + h) \approx \mathbf{x}(t) + h\mathbf{v}(t).$$
This is known as the explicit Euler’s method. Another way of thinking of this
is that the derivative is tangent to the curve of x(t) at time t. By taking a small
enough step in the tangent direction, we should end up close to the actual solution.
Note that since we are taking a new time step each frame, the frame positions
are often represented in terms of a sequence of approximations x0 , x1 , x2 , . . . So
an alternative form for Euler’s method is

$$\mathbf{x}_{i+1} = \mathbf{x}_i + h\mathbf{x}'_i.$$

Including the update for velocity, our full set of simulation equations is

$$\mathbf{v}_{i+1} = \mathbf{v}_i + h\mathbf{F}(t_i, \mathbf{x}_i, \mathbf{v}_i)/m,$$
$$\mathbf{x}_{i+1} = \mathbf{x}_i + h\mathbf{v}_{i+1}.$$

Note that we use the result of the velocity step in our position equation. This is
a variant of the standard Euler known as symplectic Euler, which provides more
stability for position-based forces. We will discuss symplectic Euler and other
integration methods below in more detail.
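
As an illustration, a minimal C++ sketch of one symplectic Euler step for a single particle follows; the Particle type and the force callback are hypothetical stand-ins for whatever the simulator provides.

// One symplectic Euler step for a particle: update velocity with the current
// force, then update position with the *new* velocity. Names are illustrative.
struct Vector3 { float x, y, z; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }

struct Particle { Vector3 x; Vector3 v; float mass; };

// F(t, x, v): total force on the particle, supplied by the simulation.
typedef Vector3 (*ForceFunc)(float t, Vector3 x, Vector3 v);

inline void SymplecticEulerStep(Particle& p, float t, float h, ForceFunc F)
{
    Vector3 a = Scale(1.0f / p.mass, F(t, p.x, p.v));
    p.v = Add(p.v, Scale(h, a));   // v_{i+1} = v_i + h F(t_i, x_i, v_i)/m
    p.x = Add(p.x, Scale(h, p.v)); // x_{i+1} = x_i + h v_{i+1}
}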

1.6.3 Updating Orientation


Updating orientation for a rigid-body simulation is similar to, yet different from,
updating position. In addition to the linear quantities, we now have an object with
the last frame’s orientation Ri or qi , the last frame’s angular velocity vector ωi ,
an inertial tensor I, and a sum of torques τ . From that, we wish to calculate the
current frame’s orientation Ri+1 or qi+1 and the current frame’s angular velocity
ωi+1 .
The orientation itself we represent with either a rotation matrix R or a quater-
nion q, both encapsulating rotation from a reference orientation (much as we can
use a vector from the origin to represent a point). Which form we use depends
on our needs. For example, rotation matrices can be convenient because they are
easily converted into a form efficient for rendering. However, quaternions take up
less space and need fewer operations to update and, thus, can be more efficient in
the simulation engine itself.
Angular velocity is the rotational correspondence to linear velocity. As lin-
ear velocity represents a change in position, angular velocity represents a change
in orientation. Its form is a three-element vector pointing along the axis of ro-
tation and scaled so that its magnitude is the angle of rotation, in radians. We


Figure 1.6. Converting between angular and linear velocities.

can determine the linear velocity at a displacement r from the center of rotation
(Figure 1.6) using the following equation:
v = ω × r. (1.5)
If the object is also moving with a linear velocity vl , this becomes
v = vl + ω × r.

The inertial tensor I is the rotational equivalent to mass. Rather than the single
scalar value of mass, the inertial tensor is a 3 × 3 matrix. This is because the
shape and density of an object affects how it rotates. For example, consider a
skater doing a spin. If she draws her arms in, her angular velocity increases. So
by changing her shape, she is changing her rotational dynamics.
Computing the inertial tensor for an object is not always easy. Often, we can
approximate it by using the inertial tensor for a simpler shape. For example, we
could use a box to approximate a car or a cylinder to approximate a statue. If
we want a more accurate representation, we can assume a constant density object
and compute it based on the tessellated geometry. One way to think of this is as
the sum of tetrahedra, where each tetrahedron shares a common vertex with the
others, and the other vertices are one face of the original geometry. As the inertial
tensor for a tetrahedron is a known quantity, this is a relatively straightforward
calculation [Kallay 06]. A quantity that has no linear complement is the center
of mass. This is a point, relative to the object, where applying a force invokes
no rotation. We can think of this as the perfect balance point. The placement of
the center of mass varies with the density or shape of an object. So a uniformly
dense and symmetric steel bar will have its center of mass at its geometric cen-
ter, whereas a hammer, for example, has its center of mass closer to its head.
Placement of the center of mass can be done in a data-driven way by artists or
designers, but more often, it comes out of the same calculation that computes the
inertial tensor.


The final quantity is torque, which is the rotational equivalent to force. Ap-
plying force to an object at any place other than its center of mass will generate
torque. To compute the torque, we take a vector r from the center of mass to the
point where the force is applied and perform a cross product as follows:

τ = r × F.

This will apply the torque counterclockwise around the vector direction, as per
the right-hand rule. We can sum all torques to determine the total torque on an
object:
$$\boldsymbol{\tau}_{tot} = \sum_j \mathbf{r}_j \times \mathbf{F}_j.$$

As with force, we can use Newton’s second law to find the relationship be-
tween torque and angular acceleration α:

τ = Iα.

1.6.4 Numerical Integration for Orientation Using Matrices


To update our orientation, we ideally would want to do something like this:

Ri+1 = Ri + hωi .

However, as Ri is a matrix and ωi is a vector, this is not possible. Instead, we do


the following:
$$R_{i+1} = R_i + h[\boldsymbol{\omega}_i]_\times R_i,$$
where
$$[\boldsymbol{\omega}]_\times = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}.$$
To understand why, let us consider the basis vectors of the rotation matrix R and
how they change when an infinitesimal angular velocity is applied. For simplic-
ity’s sake, let us assume that the angular velocity is applied along one of the basis
vectors; Figure 1.7 shows the other two. Recall that the derivative is a linear
quantity, whereas angular velocity is a rotational quantity. What we need to do is
change the rotational change of each axis to a linear change. We can do this by
computing the infinitesimal linear velocity at the tip of a given basis vector and
then adding this to get the new basis vector.
Recall that Equation (1.5) gives the linear velocity at a displacement r for
angular velocity ω. So for each basis vector rj , we could compute ω × rj and,

i i

i i
i i

i i

20 1. Mathematical Background

Figure 1.7. Change in basis vectors due to angular velocity.

from that, create a differential rotation matrix. However, there is another way to
do a cross product and that is to use a skew symmetric matrix of the appropriate
form, which is just what [ω]× is. Multiplying rj by the skew symmetric matrix
[ω]× will perform the cross product ω × rj , and multiplying R by [ω]× will
perform the cross product on all the basis vectors as a single operation, giving us
our desired result of dR/dt.
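
The following C++ sketch performs this update; the types and names are illustrative. Note that repeated Euler steps let numerical drift creep into the matrix, so in practice the result should be re-orthonormalized periodically.

// Matrix orientation update R_{i+1} = R_i + h [w]x R_i, as described above.
// Types and names are illustrative; re-orthonormalization is left to the caller.
struct Vector3 { float x, y, z; };
struct Matrix3 { float m[3][3]; };

// Skew-symmetric matrix [w]x such that [w]x r = w x r.
inline Matrix3 SkewSymmetric(Vector3 w)
{
    return {{{ 0.0f, -w.z,  w.y },
             {  w.z, 0.0f, -w.x },
             { -w.y,  w.x, 0.0f }}};
}

inline Matrix3 IntegrateOrientation(const Matrix3& R, Vector3 omega, float h)
{
    Matrix3 W = SkewSymmetric(omega);
    Matrix3 out;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
        {
            float dR = 0.0f;                  // ( [w]x R )_{ij}
            for (int k = 0; k < 3; ++k)
                dR += W.m[i][k] * R.m[k][j];
            out.m[i][j] = R.m[i][j] + h * dR; // R + h [w]x R
        }
    return out;
}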

1.6.5 Numerical Integration for Orientation Using


Quaternions
Performing the Euler step for quaternions is similar to matrices. Again, we use an
equation that can turn our angular velocity vector into a form suitable for adding
to a quaternion:
$$\mathbf{q}_{i+1} = \mathbf{q}_i + \frac{h}{2}\mathbf{w}\mathbf{q}_i,$$
where w is a quaternion of the form

w = (0, ω).

There are a number of proofs for this, though none are as intuitive as the one
for rotation matrices. The most straightforward is from [Hanson 06]. If we take a
quaternion q to the t power, we find that

$$\mathbf{q}^t = \exp(t \log \mathbf{q}).$$

For a rotation quaternion,
$$\log \mathbf{q} = \left(0, \frac{\theta}{2}\hat{\mathbf{r}}\right),$$
and hence,
$$\exp(t \log \mathbf{q}) = \exp\left(0, \frac{t\theta}{2}\hat{\mathbf{r}}\right) = \left(\cos\frac{t\theta}{2}, \sin\frac{t\theta}{2}\hat{\mathbf{r}}\right).$$


Taking the derivative of $\mathbf{q}^t$ with respect to $t$ gives us
$$\frac{d\mathbf{q}^t}{dt} = \frac{d\exp(t \log \mathbf{q})}{dt} = \log \mathbf{q}\,\exp(t \log \mathbf{q}) = (\log \mathbf{q})\mathbf{q}^t.$$
At $t = 0$, this is just
$$\frac{d\mathbf{q}}{dt} = \log \mathbf{q} = \left(0, \frac{\theta}{2}\hat{\mathbf{r}}\right).$$
Pulling out the $\frac{1}{2}$ term, we get
$$\frac{1}{2}(0, \theta\hat{\mathbf{r}}) = \frac{1}{2}\mathbf{w}.$$
Multiplying this quantity by the quaternion q gives the change relative to q, just
as it did for matrices.
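
A minimal C++ sketch of this quaternion update follows; the types and names are ours. As with the matrix form, the result drifts away from unit length under repeated steps, so it is renormalized here.

// Quaternion orientation update q_{i+1} = q_i + (h/2) w q_i, with w = (0, omega).
// Types and names are illustrative; the result is renormalized.
#include <cmath>

struct Vector3 { float x, y, z; };
struct Quat    { float w; Vector3 v; };

inline float   Dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

inline Quat IntegrateOrientation(Quat q, Vector3 omega, float h)
{
    // wq with w = (0, omega): ( -omega . v, q.w * omega + omega x v )
    Vector3 c = Cross(omega, q.v);
    Quat wq;
    wq.w = -Dot(omega, q.v);
    wq.v = { q.w * omega.x + c.x, q.w * omega.y + c.y, q.w * omega.z + c.z };

    // q_{i+1} = q_i + (h/2) wq, then renormalize.
    Quat out = { q.w + 0.5f * h * wq.w,
                 { q.v.x + 0.5f * h * wq.v.x,
                   q.v.y + 0.5f * h * wq.v.y,
                   q.v.z + 0.5f * h * wq.v.z } };
    float len = std::sqrt(out.w * out.w + Dot(out.v, out.v));
    out.w   /= len;
    out.v.x /= len;  out.v.y /= len;  out.v.z /= len;
    return out;
}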

1.6.6 Numerical Integration for Angular Velocity


As angular velocity and torque/angular acceleration are both vectors, we might
think we could perform the following:
$$\boldsymbol{\omega}_{i+1} = \boldsymbol{\omega}_i + hI^{-1}\boldsymbol{\tau}.$$

However, as
τ = Iω̇ + ω × Iω,
we cannot simply multiply τ by the inverse of I and do the Euler step.
One solution is to ignore the ω ×Iω term and perform the Euler step as written
anyway. This term represents the precession of the system—for example, a tipped,
spinning top will spin about its local axis but will also slowly precess around its
vertical axis as well. Removing this term will not be strictly accurate but can add
some stability.
The alternative is to do the integration in a different way. Consider the angular
momentum L instead, which is Iω. The derivative L̇ = Iω̇ = Iα = τ . Hence we
can do the following:

$$\mathbf{L}_{i+1} = \mathbf{L}_i + h\boldsymbol{\tau},$$
$$\boldsymbol{\omega}_{i+1} = I_i^{-1}\mathbf{L}_{i+1}.$$


The final piece is the calculation of $I_i^{-1}$. The problem is that $I$ is calculated
relative to the object, but the remaining quantities are computed relative to the
world. The solution is to update $I$ each time step based on its current orientation,
thusly:
$$I_i^{-1}\mathbf{L}_{i+1} = R_i I_0^{-1} R_i^{-1}\mathbf{L}_{i+1}.$$

We can think of this as rotating the angular momentum vector into the object’s
local orientation, applying the inverse inertial tensor, and then rotating back into
world coordinates.
This gives us our final formulas:

$$\boldsymbol{\tau} = \sum_k \mathbf{r}_k \times \mathbf{F}_k,$$
$$\mathbf{L}_{i+1} = \mathbf{L}_i + h\boldsymbol{\tau},$$
$$I_i^{-1} = R_i I_0^{-1} R_i^{-1},$$
$$\boldsymbol{\omega}_{i+1} = I_i^{-1}\mathbf{L}_{i+1},$$
$$R_{i+1} = R_i + h[\boldsymbol{\omega}_{i+1}]_\times R_i.$$
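
Putting the formulas together, the following C++ sketch advances the rotational state one step, carrying the angular momentum L as state; the matrix helpers and names are illustrative, and the inverse of the rotation matrix is taken as its transpose.

// One rotational update using the formulas above. Types and names are
// illustrative; the orientation matrix should be re-orthonormalized regularly.
struct Vector3 { float x, y, z; };
struct Matrix3 { float m[3][3]; };

inline Matrix3 Mul(const Matrix3& A, const Matrix3& B)
{
    Matrix3 C;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
        {
            C.m[i][j] = 0.0f;
            for (int k = 0; k < 3; ++k) C.m[i][j] += A.m[i][k] * B.m[k][j];
        }
    return C;
}

inline Matrix3 Transpose(const Matrix3& A)
{
    Matrix3 T;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) T.m[i][j] = A.m[j][i];
    return T;
}

inline Vector3 Mul(const Matrix3& A, Vector3 v)
{
    return { A.m[0][0]*v.x + A.m[0][1]*v.y + A.m[0][2]*v.z,
             A.m[1][0]*v.x + A.m[1][1]*v.y + A.m[1][2]*v.z,
             A.m[2][0]*v.x + A.m[2][1]*v.y + A.m[2][2]*v.z };
}

inline Matrix3 Skew(Vector3 w)
{
    return {{{ 0.0f, -w.z,  w.y }, {  w.z, 0.0f, -w.x }, { -w.y,  w.x, 0.0f }}};
}

// State: orientation R, angular momentum L. inertiaInvBody is I_0^{-1} in body
// space; torque is the summed torque for this step.
inline void IntegrateRotation(Matrix3& R, Vector3& L,
                              const Matrix3& inertiaInvBody,
                              Vector3 torque, float h)
{
    // L_{i+1} = L_i + h tau
    L = { L.x + h * torque.x, L.y + h * torque.y, L.z + h * torque.z };

    // I_i^{-1} = R_i I_0^{-1} R_i^T, then omega_{i+1} = I_i^{-1} L_{i+1}
    Matrix3 inertiaInvWorld = Mul(Mul(R, inertiaInvBody), Transpose(R));
    Vector3 omega = Mul(inertiaInvWorld, L);

    // R_{i+1} = R_i + h [omega]x R_i
    Matrix3 dR = Mul(Skew(omega), R);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) R.m[i][j] += h * dR.m[i][j];
}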

1.7 Numerical Integration


1.7.1 Issues with Euler’s Method
Euler’s method has the advantage of simplicity; however, it has its problems. First
of all, it assumes that the derivative at the current point is a good estimate of
the derivative across the entire interval. Secondly, the approximation that Euler’s
method produces adds energy to the system. And this approximation error is
propagated with each Euler step. This leads to problems with stability if our
system oscillates, such as with springs, orbits, and pendulums, or if our time step
is large. In either case, the end result is that our approximation becomes less and
less accurate.
We can see an example of this by looking at Euler’s method used to simulate
an orbiting object (Figure 1.8). The first time step clearly takes us off the desired
path, and each successive step only makes things worse. We see similar problems
with so-called “stiff” equations, e.g., those used to simulate stiff springs (hence
the name).
Recall that the definition of the derivative assumes that h is infinitesimally
small. So one solution might be to decrease our time step: e.g., divide our time in
half and take two steps. While this can help in some situations (and some physics


Figure 1.8. Using Euler’s method to approximate an orbit.

engines do just that for that reason), because of the nature of Euler’s method the
error will still accumulate.

1.7.2 Higher-Order Explicit Methods


One solution to this problem is to realize that we are trying to approximate a non-
linear function with a linear function. If we take a weighted average of samples
of the derivative across our interval, perhaps we can construct a better approxi-
mation. The higher-order Runge-Kutta methods do just this. The most notable
example is Runge-Kutta Order 4, or just RK4, which takes four samples of the
derivative.
In general, RK4 will provide a better approximation of the function. However,
it does come with the cost of more invocations of the derivative function, which
may be expensive. In addition, it still does not solve our problem with stiff equa-
tions. For particularly stiff equations, RK4 will still add enough energy into the
system to cause it to spiral out of control. Fortunately, there are other possibilities.
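
For reference, the following C++ sketch shows one RK4 step for the system x' = v, v' = F(t, x, v)/m, taking four samples of the derivative as described above; the types and the force callback are illustrative stand-ins.

// One RK4 step for a particle. The four derivative samples are taken at the
// start, twice at the midpoint, and at the end of the interval.
struct Vector3 { float x, y, z; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }

typedef Vector3 (*ForceFunc)(float t, Vector3 x, Vector3 v);

inline void RK4Step(Vector3& x, Vector3& v, float m, float t, float h, ForceFunc F)
{
    Vector3 k1x = v;
    Vector3 k1v = Scale(1.0f / m, F(t, x, v));

    Vector3 k2x = Add(v, Scale(0.5f * h, k1v));
    Vector3 k2v = Scale(1.0f / m, F(t + 0.5f * h,
                                    Add(x, Scale(0.5f * h, k1x)),
                                    Add(v, Scale(0.5f * h, k1v))));

    Vector3 k3x = Add(v, Scale(0.5f * h, k2v));
    Vector3 k3v = Scale(1.0f / m, F(t + 0.5f * h,
                                    Add(x, Scale(0.5f * h, k2x)),
                                    Add(v, Scale(0.5f * h, k2v))));

    Vector3 k4x = Add(v, Scale(h, k3v));
    Vector3 k4v = Scale(1.0f / m, F(t + h,
                                    Add(x, Scale(h, k3x)),
                                    Add(v, Scale(h, k3v))));

    // Weighted average: (k1 + 2 k2 + 2 k3 + k4) / 6, scaled by h.
    x = Add(x, Scale(h / 6.0f, Add(Add(k1x, Scale(2.0f, Add(k2x, k3x))), k4x)));
    v = Add(v, Scale(h / 6.0f, Add(Add(k1v, Scale(2.0f, Add(k2v, k3v))), k4v)));
}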

1.7.3 Implicit Methods


One method uses an alternative definition of the derivative:
$$\mathbf{x}'(t) = \lim_{h \to 0} \frac{\mathbf{x}(t) - \mathbf{x}(t - h)}{h}.$$
If we assume small $h$ and again rearrange terms, we get
$$\mathbf{x}(t) \approx \mathbf{x}(t - h) + h\mathbf{x}'(t).$$

Substituting t + h for t, we end up with


$$\mathbf{x}(t + h) \approx \mathbf{x}(t) + h\mathbf{x}'(t + h).$$


This is known as the implicit Euler method. The distinction between the implicit
and explicit methods is that with the implicit methods, the right side includes
terms that are not yet known. Implicit Euler is a first-order implicit method—it is
possible to create higher-order methods just as we did for explicit methods.
Whereas explicit methods add energy to the system as they drift away from
the actual function, implicit methods remove energy from the system. So while
implicit methods still do not handle oscillating or stiff equations perfectly, they
do not end up oscillating out of control. Instead, the system will damp down
much faster than expected. The solution converges, which is not ideal, but does
maintain stability.
We do have the problem that x (t + h) is unknown. There are three possible
ways to solve this. One is to try to solve for an analytic solution. However,
as before, this is not always possible, and often we do not have an equation for
x (t)—it is a function we call in our simulator that returns a numeric result. That
result could be computed from any number of combinations of other equations.
So, for both reasons, it is usually not practical to compute an explicit solution. In
this case, we have two choices.
The first is to compute x(t + h) using an explicit method and then use the
result to compute our implicit function. This is known as a predictor-corrector
method, as we predict a solution using the explicit equation and then correct for
errors using the implicit solution. An example of this is using the result of an
explicit Euler step in a modified implicit Euler solution:

$$\tilde{\mathbf{x}}_{i+1} = \mathbf{x}_i + h\mathbf{v}_i,$$
$$\tilde{\mathbf{v}}_{i+1} = \mathbf{v}_i + h\mathbf{F}(t_i, \mathbf{x}_i, \mathbf{v}_i)/m,$$
$$\mathbf{x}_{i+1} = \mathbf{x}_i + \frac{h}{2}(\tilde{\mathbf{v}}_{i+1} + \mathbf{v}_i),$$
$$\mathbf{v}_{i+1} = \mathbf{v}_i + \frac{h}{2}\left(\mathbf{F}(\tilde{t}_{i+1}, \tilde{\mathbf{x}}_{i+1}, \tilde{\mathbf{v}}_{i+1}) + \mathbf{F}(t_i, \mathbf{x}_i, \mathbf{v}_i)\right)/m.$$
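
A direct C++ sketch of this predictor-corrector step follows; the types and names are illustrative.

// Predictor-corrector: predict with explicit Euler, then correct using the
// average of the old and predicted derivatives. Names are illustrative.
struct Vector3 { float x, y, z; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }

typedef Vector3 (*ForceFunc)(float t, Vector3 x, Vector3 v);

inline void PredictorCorrectorStep(Vector3& x, Vector3& v, float m,
                                   float t, float h, ForceFunc F)
{
    // Predictor: plain explicit Euler.
    Vector3 a0     = Scale(1.0f / m, F(t, x, v));
    Vector3 xTilde = Add(x, Scale(h, v));
    Vector3 vTilde = Add(v, Scale(h, a0));

    // Corrector: average the old and predicted derivatives.
    Vector3 a1   = Scale(1.0f / m, F(t + h, xTilde, vTilde));
    Vector3 xNew = Add(x, Scale(0.5f * h, Add(vTilde, v)));
    Vector3 vNew = Add(v, Scale(0.5f * h, Add(a1, a0)));
    x = xNew;
    v = vNew;
}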
An alternative method for implicit Euler is to treat it as a linear equation and
solve for it. We can do this for a force dependent on position as follows:

$$\mathbf{x}_{i+1} = \mathbf{x}_i + h_i\mathbf{x}'_{i+1},$$
$$\mathbf{x}_i + \Delta\mathbf{x}_i = \mathbf{x}_i + h_i\mathbf{F}(\mathbf{x}_i + \Delta\mathbf{x}_i),$$
$$\Delta\mathbf{x}_i = h_i\mathbf{F}(\mathbf{x}_i + \Delta\mathbf{x}_i),$$
$$\Delta\mathbf{x}_i \approx h_i\left(\mathbf{F}(\mathbf{x}_i) + J(\mathbf{x}_i)\Delta\mathbf{x}_i\right),$$
$$\Delta\mathbf{x}_i \approx \left(\frac{1}{h_i}E - J(\mathbf{x}_i)\right)^{-1}\mathbf{F}(\mathbf{x}_i),$$


where J is a matrix of partial derivatives known as the Jacobian. The resulting


matrix is sparse and easy to invert, which makes it useful for large systems, such
as collections of particles.

1.7.4 Verlet Methods


A popular game physics method, mainly due to [Jakobsen 01], is Verlet integra-
tion. In its most basic form, it is a velocity-less scheme, instead using the position
from the previous frame. As we often don’t care about the velocity of particles,
this makes it very useful for particle systems.
The general formula for the Verlet method is as follows:

x_{i+1} = 2x_i − x_{i−1} + h² a_i.

While standard Verlet is quite stable, it has the disadvantage that it doesn’t
incorporate velocity. This makes it difficult to use with velocity-dependent forces.
One possible solution is to use Leapfrog Verlet:

v_{i+1/2} = v_{i−1/2} + h a_i,
x_{i+1} = x_i + h v_{i+1/2}.

However, this does not compute the velocity at the current time step, but in-
stead at the half-time step (this is initialized by using a half-interval Euler step).
While we can take an average of these over two time steps for our force calcula-
tion, we still have problems with impulse-based collision systems, which instan-
taneously modify velocity to simulate contact forces. One solution to this is use
the full velocity Verlet:

v_{i+1/2} = v_i + (h/2) a_i,
x_{i+1} = x_i + h v_{i+1/2},
v_{i+1} = v_{i+1/2} + (h/2) a_{i+1}.

However, unlike Euler’s method, this does require two force calculations, and we
can get similar stability with the last method we’ll consider.
More information on Verlet methods can be found in Chapter 11.
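A minimal position-Verlet step for a single particle might look like the sketch below; the VerletParticle type is illustrative.

// Basic (position) Verlet: velocity is implicit in the difference between the
// current and previous positions.
struct VerletParticle { float x, xPrev, a; };   // position, previous position, acceleration

void verletStep(VerletParticle& p, float h)
{
    const float xNext = 2.0f * p.x - p.xPrev + h * h * p.a;
    p.xPrev = p.x;
    p.x = xNext;
}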

1.7.5 Symplectic Euler Method


We’ve already seen the symplectic Euler method previously—in fact, it’s the
method we were using for the simulation equations in Section 1.6. It is a semi-
implicit method, in that it uses the explicit Euler method to update velocity but


Figure 1.9. Using the symplectic Euler method to approximate an orbit.

uses an implicit value of velocity to update position:

v_{i+1} = v_i + h F(t_i, x_i, v_i)/m,
x_{i+1} = x_i + h v_{i+1}.

This takes advantage of the fact that velocity is the derivative of position, and
the end result is that we get a very stable method that only requires one force
calculation. It does have the disadvantage that it is not as accurate with constant
forces, but in those cases, we should consider using Equation (1.4) anyway.
In Figure 1.9, we see the result of using symplectic Euler with one step of our
orbit example. Admittedly this is a bit contrived, but we see that, in principle,
it is extremely stable—neither spiraling outward as explicit Euler would do nor
spiraling inward as implicit Euler would do.
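In code, symplectic Euler is simply the explicit Euler step with the two updates swapped, so that the freshly updated velocity feeds the position update. The Body1D type and force function below are placeholders.

struct Body1D { float x, v, m; };   // position, velocity, mass

float force(float t, float x, float v);   // assumed to be supplied by the simulation

void symplecticEulerStep(Body1D& b, float t, float h)
{
    b.v += h * force(t, b.x, b.v) / b.m;   // explicit in the force
    b.x += h * b.v;                        // implicit in the velocity
}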

1.8 Further Reading


This chapter is mainly intended as an overview, and the interested reader can find
more details in a wide variety of sources. Good references for linear algebra with
widely varying but useful approaches are [Anton and Rorres 94] and [Axler 97].
Kenneth Joy also has a good series on vectors, points, and affine transformations,
found in [Joy 00c], [Joy 00b], and [Joy 00a].
The standard quaternion reference for graphics is [Shoemake 85], which has
been expanded to excellent detail in [Hanson 06]. An early series of articles
about game physics is [Hecker 97], and [Witkin and Baraff 01] provides thorough
coverage of the early Pixar physics engine. It is also worth mentioning [Catto 06],
which first introduced me to the symplectic Euler method, for which I am eternally
grateful.
Finally, without modesty, a good general source for all of these topics is my
own work, cowritten with Lars Bishop [Van Verth and Bishop 08].


Bibliography
[Anton and Rorres 94] Howard Anton and Chris Rorres. Elementary Linear Al-
gebra: Applications Version, Seventh edition. New York: John Wiley and
Sons, 1994.

[Axler 97] Sheldon Axler. Linear Algebra Done Right, Second edition. New
York: Springer, 1997.

[Catto 06] Erin Catto. “Fast and Simple Physics using Sequential Impulses.”
Paper presented at GDC 2006 Tutorial “Physics for Game Programmers,”
San Jose, CA, March, 2006.

[Golub and Van Loan 93] Gene H. Golub and Charles F. Van Loan. Matrix Com-
putations. Baltimore, MD: Johns Hopkins University Press, 1993.

[Hanson 06] Andrew Hanson. Visualizing Quaternions. San Francisco: Morgan
Kaufmann, 2006.

[Hecker 97] Chris Hecker. “Behind the Screen: Physics.” Series published in
Game Developer Magazine, 1996–1997.

[Jakobsen 01] Thomas Jakobsen. “Advanced Character Physics.” Paper pre-
sented at Game Developers Conference 2001, San Jose, CA, March, 2001.

[Joy 00a] Kenneth Joy. “On-Line Geometric Modeling Notes: Affine Combina-
tions, Barycentric Coordinates and Convex Combinations.” Technical re-
port, University of California, Davis, 2000.

[Joy 00b] Kenneth Joy. “On-Line Geometric Modeling Notes: Points and Vec-
tors.” Technical report, University of California, Davis, 2000.

[Joy 00c] Kenneth Joy. “On-Line Geometric Modeling Notes: Vector Spaces.”
Technical report, University of California, Davis, 2000.

[Kallay 06] Michael Kallay. “Computing the Moment of Inertia of a Solid De-
fined by a Triangle Mesh.” journal of graphics tools 11:2 (2006), 51–57.

[Press et al. 93] William H. Press, Brian P. Flannery, Saul A. Teukolsky, and
William T. Vetterling. Numerical Recipes in C: The Art of Scientific Com-
puting, Second edition. New York: Cambridge University Press, 1993.

[Shoemake 85] Ken Shoemake. “Animating Rotation with Quaternion Curves.”
Computer Graphics (SIGGRAPH ’85 Proceedings) 19 (1985), 245–254.


[Van Verth and Bishop 08] James M. Van Verth and Lars M. Bishop. Essential
Mathematics for Games and Interactive Applications, Second edition. San
Francisco: Morgan Kaufmann, 2008.

[Witkin and Baraff 01] Andrew Witkin and David Baraff. “Physically Based
Modelling: Principles and Practice.” ACM SIGGRAPH 2001 Course Notes.
Available at http://www.pixar.com/companyinfo/research/pbm2001/, 2001.


-2-
Understanding Game Physics Artifacts
Dennis Gustafsson

2.1 Introduction
Physics engines are known for being notoriously hard to debug. For most people,
physics artifacts are just a seemingly random stream of weird behavior that makes
no sense. Few components of a game engine cause as much frustration and hair
loss. We have all seen ragdolls doing the funky monkey dance and stacks of
“rigid” bodies acting more like a tower of greasy mushrooms, eventually falling
over or taking off into the stratosphere. This chapter will help you understand the
underlying causes of this behavior and common mistakes that lead to it. Some of
them can be fixed, some of them can be worked around, and some of them we will
just have to live with for now. This is mostly written for people writing a physics
engine of their own, but understanding the underlying mechanisms is helpful even
if you are using an off-the-shelf product.

2.2 Discretization and Linearization


Physics engines advance time in discrete steps, typically about 17 ms for a 60 Hz
update frequency. It is not uncommon to split up the time step into smaller steps,
say two or three updates per frame (often called substepping) or even more, but no
matter how small of a time step you use, it will still be a discretization of a con-
tinuous problem. Real-world physics do not move in steps, not even small steps,
but in a continuous motion. This is by far the number one source for physics ar-
tifacts, and any properly implemented physics engine should behave better with
more substeps. If a physics artifact does not go away with more substeps, there
is most likely something wrong with your code. The bullet-through-paper prob-
lem illustrated in Figure 2.1 is a typical example of a problem that is caused by
discretization.



Figure 2.1. Discretization can cause fast-moving objects to travel through walls.

Another big source of artifacts is the linearization that most physics engines
employ—the assumption that during the time step everything travels in a linear
motion. For particle physics, this is a pretty good approximation, but as soon as
you introduce rigid bodies and rotation, it falls flat to the ground. Consider the ball
joint illustrated in Figure 2.2. The two bodies are rotating in opposite directions.
At this particular point in time, the two bodies are lined up as shown. Even if
the solver manages to entirely solve the relative velocity at the joint-attachment
point to zero, as soon as time is advanced, no matter how small the amount, the
two attachment points will drift apart. This is the fundamental problem of linearization,
which makes it impossible to create an accurate physics engine by solving just for
relative linear velocities at discrete points in time.
Even though linearization and discretization are two different approximations,
they are somewhat interconnected. Lowering the step size (increasing the number
of substeps) will always make linearization less problematic, since any nonlinear
motion will appear more and more linear the shorter the time span. The ambitious
reader can make a parallel here to the Heisenberg principle of uncertainty!
Figure 2.2. Even if relative linear velocity at the joint attachment is zero, objects can
separate during integration due to rotation.

The major takeaway here is that as long as a physics engine employs dis-
cretization and linearization, which all modern physics engines and all algorithms
and examples in this book do, there will always be artifacts. These artifacts are
not results of a problem with the physics engine itself, but the assumptions and
approximations the engine is built upon. This is important to realize, because
once you accept the artifacts and understand their underlying causes, it makes
them easier to deal with and work around.

2.3 Time Stepping and the Well of Despair


Since the physics engine is advanced in discrete steps, what happens if the game
drops a frame? This is a common source of confusion when integrating a physics
engine, since you probably want the motion in your game to be independent of
frame rate. On a slow machine, or in the occasion of your modern operating sys-
tem going off to index the quicksearch database in the middle of a mission, the
graphical update might not keep up with the desired frequency. There are several
different strategies for how to handle such a scenario from a physics perspective.
You can ignore the fact that a frame was dropped and keep stepping the normal
step length, which will create a slow-motion effect that is usually highly unde-
sirable. Another option is to take a larger time step, which will create a more
realistic path of motion but may introduce jerkiness due to the variation in dis-
cretization. The third option is to take several, equally sized physics steps. This
option is more desirable, as it avoids the slowdown while still doing fixed-size
time steps.

2.3.1 The Well of Despair


Making several physics updates per frame usually works fine, unless the physics
is what is causing the slowdown to begin with. If physics is the bottleneck, the
update frequency will go into the well of despair, meaning every subsequent frame
needs more physics updates, causing a slower update frequency, resulting in even
more physics updates the next frame, and so on. There is unfortunately no way
to solve this problem other than to optimize the physics engine or simplify the
problem, so what most people do is put a cap on the number of physics updates
per frame, above which the simulation will simply run in slow motion. Actually,
it will not only run in slow motion but it will run in slow motion at a lower-than-
necessary frame rate, since most of what the physics engine computes is never
even shown! A more sophisticated solution is to measure the time of the physics
update, compare it to the overall frame time, and only make subsequent steps
if we can avoid the well of despair. This problem is not trivial, and there is no
ultimate solution that works for all scenarios, but it is well worth experimenting
with since it can have a very significant impact on the overall update frequency.
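One common way to implement the fixed-step option, including a cap on the number of substeps, is a time accumulator along the lines of the sketch below. The names, the 60 Hz step, and the cap of four steps are illustrative choices, not recommendations from this chapter.

// Fixed-size physics steps driven by a time accumulator, with a cap to keep
// the physics update itself from triggering ever more substeps.
struct FixedStepper
{
    float accumulator = 0.0f;
    float fixedDt     = 1.0f / 60.0f;
    int   maxSteps    = 4;

    template <typename StepFn>
    void advance(float frameDt, StepFn stepPhysics)
    {
        accumulator += frameDt;
        int steps = 0;
        while (accumulator >= fixedDt && steps < maxSteps)
        {
            stepPhysics(fixedDt);
            accumulator -= fixedDt;
            ++steps;
        }
        // If the cap was hit, drop the leftover time: the game runs in slow
        // motion for a while instead of spiraling into the well of despair.
        if (steps == maxSteps)
            accumulator = 0.0f;
    }
};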


2.4 The Curse of Rotations


Since rotation is the mother of most linearization problems, it deserves some spe-
cial attention. One fun experiment we can try is to make the inertia tensor for
all objects infinite and see how that affects our simulation. The inertia tensor can
roughly be described as an object’s willingness to rotate and is often specified
as its inverse, so setting all values to zero typically means rotations will be com-
pletely disabled. You will be surprised how stable those stacks become and how
nicely most scenarios just work. Unfortunately, asking the producer if it is okay
to skip rotations will most likely not be a good idea, but what we can learn is that
the more inertia we add, the less rotation will occur, problems with linearization
will decrease, and the simulation will get more stable.
The problem is especially relevant on long, thin rods. So if you experience
instability with such objects, try increasing the inertia, especially on the axis along
the rod (compute inertia as if the rod was thicker). Increasing inertia will make
objects look heavy and add a perceived slow-motion effect, so you might want to
take it easy, but it can be a lifesaver and is surprisingly hard to spot.

2.5 Solver
Just to freshen up our memory without getting into technical detail, the solver
is responsible for computing the next valid state of a dynamic system, taking
into account various constraints. Now, since games need to be responsive, this
computation has to be fast, and the most popular way of doing that is using an
iterative method called sequential impulse. The concept is really simple: given a
handful of constraints, satisfy each one of them, one at a time, and when the last
one is done, start over again from the beginning and do another round until it is
“good enough,” where good enough often means, “Sorry man, we are out of time,
let’s just leave it here.”
What is really interesting, from a debugging perspective, is how this early
termination of a sequential impulse solver can affect the energy of the system.
Stopping before we are done will not add energy to the system, it will drain en-
ergy. This means it is really hard to blame the solver itself for energy being added
to the system.
When you implement a sequential impulse solver with early termination,
stacked, resting objects tend to sink into each other. Let’s investigate why this is
happening: at each frame, gravity causes an acceleration that increases an object’s
downward velocity. Contact generation creates a set of points and at each contact,
the solver tries to maintain a zero-relative velocity. However, since greedy game
programmers want CPU cycles for other things, the solver is terminated before it
is completely done, leaving the objects with a slight downward velocity instead of
zero, which is desired for resting contact. This slight downward velocity causes
objects to sink in, and the process is repeated.
To compensate for this behavior, most physics engines use a geometric mea-
sure for each contact point: either penetration depth or separation distance. As
the penetration depth increases, the desired resulting velocity is biased, so that it
is not zero but is actually negative, causing the objects to separate. This translates
to objects being soft instead of rigid, where the softness is defined by how well
the solver managed to solve the problem. This is why most solvers act springy
or squishy when using fewer iterations. Hence, the best way to get rid of the
mushroom is to increase the number of iterations in the solver!
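The sketch below shows the idea for contacts between one dynamic body and the static ground, reduced to a single velocity along the contact normal. The bias factor, the types, and the sign convention (positive velocity means separating) are illustrative; a real solver works with contact points, normals, and pairs of bodies.

#include <algorithm>
#include <vector>

struct Contact1D
{
    float invMass;               // inverse mass of the dynamic body
    float penetration;           // overlap along the contact normal, >= 0
    float accumImpulse = 0.0f;   // total impulse applied so far this step
};

void solveContacts(std::vector<Contact1D>& contacts, float& velocity,
                   float h, int iterations, float beta = 0.2f)
{
    for (int it = 0; it < iterations; ++it)   // early termination after a fixed count
    {
        for (Contact1D& c : contacts)
        {
            // Target velocity is not zero but slightly separating,
            // proportional to the penetration depth.
            const float bias = beta * c.penetration / h;
            float impulse = (bias - velocity) / c.invMass;

            // Accumulate and clamp so the total impulse never pulls bodies together.
            const float old = c.accumImpulse;
            c.accumImpulse = std::max(old + impulse, 0.0f);
            impulse = c.accumImpulse - old;

            velocity += impulse * c.invMass;
        }
    }
}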

2.5.1 Keeping the Configuration Unchanged


A solver that uses this kind of geometric compensation running at the same step
size and same number of iterations every frame will eventually find an equilib-
rium after a certain number of frames. Understanding that this equilibrium is not
a relaxed state but a very complex ongoing struggle between gravity, penetrat-
ing contacts, and penalty forces is key to stability. Removing or adding even a
single constraint, or changing the number of iterations, will cause the solver to
redistribute the weight and find a new equilibrium, which is a process that usu-
ally takes several frames and causes objects to wiggle. The set of constraints for
a specific scenario is sometimes called its configuration; hence keeping the con-
figuration unchanged from one frame to the next is very important, and we will
revisit this goal throughout the chapter.

2.5.2 Warm Starting


Assuming that the configuration does not change and objects are at rest, the im-
pulses at each contact point will be essentially the same every frame. It seems
kind of unnecessary to recompute the same problem over and over again. This is
where warm starting comes into the picture. Instead of recomputing the impulses
from scratch every time, we can start off with the impulses from the previous
frame and use our solver iterations to refine them instead. Using warm starting
is almost always a good idea. The downside is that we have to remember the
impulses from the last frame, which requires some extra bookkeeping. However,
since most physics engines keep track of pairs anyway, this can usually be added
relatively easily.
Figure 2.3. A sequential impulse solver can cause an aligned box falling flat to the ground
to bounce off with rotation.

I mentioned before that a sequential impulse solver does not add energy but
rather drains energy from a system. This unfortunately no longer holds true if
warm starting is being used. Full warm starting can give a springy, oscillating
behavior and prevents stacks from ever falling asleep. Because of this, the cur-
rent frame’s impulses are usually initialized with only a fraction of the previous
frame’s impulses. As we increase this fraction, the solver becomes more springy,
but it can also handle stacking better. It could be worth experimenting with this
to find the sweet spot.
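In code, warm starting amounts to seeding each contact's accumulated impulse with a scaled copy of last frame's value and applying it before iterating, roughly as in the sketch below; the Contact fields and the fraction are illustrative.

#include <vector>

struct Contact
{
    float invMass;                 // inverse mass along the contact normal
    float cachedImpulse = 0.0f;    // accumulated impulse from the previous frame
    float accumImpulse  = 0.0f;    // accumulated impulse for this frame
};

void warmStart(std::vector<Contact>& contacts, float& velocity, float fraction)
{
    for (Contact& c : contacts)
    {
        // Seed with a fraction of last frame's impulse and apply it up front;
        // the regular solver iterations then only refine the result.
        c.accumImpulse = fraction * c.cachedImpulse;
        velocity += c.accumImpulse * c.invMass;
    }
    // After solving, copy accumImpulse back into cachedImpulse for the next frame.
}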

2.5.3 Who Is Tilting My Box


A sequential impulse solver, as described above, is called in mathematical terms
Gauss-Seidel iteration. Another method is Jacobi iteration, in which all contact
points are solved independently, and then the resulting impulses are applied all at
once, hence removing the sequential in sequential impulse. Jacobi solvers have
some nice properties, especially when it comes to parallelization, but they gener-
ally take way more iterations to converge. One effect of sequential contact solving
is that symmetric problems often have seemingly unpredictable solutions. Con-
sider a perfectly aligned box dropped on a horizontal plane. All four corners hit
the plane at the same time, even forming four identical contact points. A sequen-
tial impulse solver will start solving one contact point without considering the
other three, apply the resulting impulse and then consider the next one. While
solving the second contact, the problem is no longer symmetric, since the box is
rotating after applying the first impulse. The resulting motion will behave as if
one corner of the box hit the ground slightly before the others (see Figure 2.3).
Hence, whenever we see this type of behavior, it is most likely not an error, just
brother Gauss-Seidel pulling a prank.

2.5.4 Friction
Friction is usually a little trickier than nonpenetration constraints since the max-
imum applied force depends on the normal force. The more pressure there is on
an object, the better it sticks. This interdependence results in a nonlinear problem
that is very tricky to solve accurately.


Coupled or decoupled. There are two main approaches to solving friction—
coupled and decoupled. In the coupled approach, the maximum friction force
changes while iterating, basically trying to solve a nonlinear problem with a tool-
box that is designed for linear problems (Gauss-Seidel), which may sound inap-
propriate but actually works fairly well in practice. The decoupled approach involves using
a fixed maximum friction force that is determined before iterating. In the case of
decoupled friction, there are two popular methods: either using the normal force
from the last time step, which requires some bookkeeping, or using a fixed value,
regardless of normal force. Such a fixed value is often based on the normal force
to keep the body at rest when affected by gravity. This may sound like a very
crude approximation, but it works surprisingly well, requires no bookkeeping,
and is perfectly linear. The main drawback is, of course, that friction is unaf-
fected by how much pressure is on the object. An object at the bottom of a stack
slides out just as easily as the ones on top!
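A sketch of the fixed-budget variant for a single tangent direction follows; mu, mass, g, and h are illustrative parameters used only to size the friction budget from the gravity the contact has to hold.

#include <algorithm>

// Decoupled friction with a fixed maximum impulse, determined before iterating.
float applyFriction(float tangentVelocity, float invMass, float& accumImpulse,
                    float mu, float mass, float g, float h)
{
    const float maxImpulse = mu * mass * g * h;   // fixed budget for this step

    // Impulse that would bring the tangential (sliding) velocity to zero.
    float impulse = -tangentVelocity / invMass;

    // Accumulate and clamp to the budget; friction can only resist motion.
    const float old = accumImpulse;
    accumImpulse = std::clamp(old + impulse, -maxImpulse, maxImpulse);
    impulse = accumImpulse - old;

    return tangentVelocity + impulse * invMass;   // updated tangential velocity
}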

Friction in stacks. It is worth mentioning the importance of proper friction for


handling stable stacking. Even in a scenario that seems largely unaffected by
friction, like a pyramid of boxes, friction plays a very important role. Remember
that the solver causes objects to rotate as an artifact of Gauss-Seidel iteration.
This rotation introduces a tangential motion that causes a stack to tip over if no
friction is used.

Friction drift. Remember the description above, about early solver termination
causing stacked objects to sink into each other? The exact same thing happens to
friction constraints, so if not compensated for, stacked objects might slide around
slowly on top of each other, eventually causing the whole thing to fall over. Track-
ing friction drift is cumbersome because it involves tracking pairs of objects over
several frames. For penetration depth it is rather straightforward since the desired
configuration is determined by the shape of the objects. For static friction, it is
not quite that easy. Static friction can be seen as a temporary joint holding two
objects together in the contact plane. If the maximum joint force is exceeded,
the objects should actually slide, but as long as the force is within the maximum
friction force, the relative net motion should ideally be zero. Hence, any motion
that actually occurs is due to early solver termination, linearization, or any other
of our artifact friends. Measuring this drift and compensating for it over time can
therefore help maintain stable stacking and natural friction behavior.

2.5.5 Shock Propagation


As a way to counteract the squishiness of iterative solvers, a shock-propagation
scheme can be used. The idea is to analyze the configuration and set up the
problem in such a way that the solver can find a solution more quickly. Some
engines maintain an explicit graph of how the objects connect, whereas other en-
gines temporarily tweak mass ratios, inertia, or gravity. There is a lot of creativity
in shock propagation, but the artifacts are usually similar.
Large stacks require many iterations because the impulses at the bottom of the
stack are many times bigger than they would be for any pair of objects solved in
isolation. It takes many iterations to build up these large impulses. With shock
propagation, objects at the bottom of a stack will not feel the entire weight of the
objects on top. This can show up as the unnatural behavior of stacks tipping over
and can also be very obvious when observing friction—an object at the bottom of
a stack can be as easily pulled out as one on top.

2.6 Collision Detection


The collision-detection problem is often broken down into two or three phases.
First a broad phase, detecting objects in close proximity, and then sometimes a
mid phase, breaking down structures into smaller parts, before the near phase,
computing the actual contact points.

2.6.1 Phases
Broad phase. Let us start with the broad phase, which has a relatively well-
defined task: report overlaps of bounding volumes, most often axis-aligned bound-
ing boxes. If the bounding box is too small, we might experience weird shootouts
as the broad phase reports nonoverlap until the objects are already in penetration.
Having the bounding boxes too big, on the other hand, has a performance impli-
cation, so we have to be sure to make them just right. Remember that if we use
continuous collision detection or intentional separation distance, these must be
included in the bounding-box computation, so that the bounding box is no longer
tight-fitting around the object. These errors can be hard to spot since it looks right
most of the time.
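A sketch of a broad-phase box computation that accounts for both a contact margin and the motion over the step is shown below; the Vec3 and AABB types are placeholders, and the sweep is the crude axis-aligned kind, not a proper swept volume.

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

AABB computeBroadPhaseBox(const AABB& tight, const Vec3& velocity,
                          float h, float margin)
{
    AABB box = tight;

    // Grow by the intentional separation/contact margin on all sides.
    box.min = { box.min.x - margin, box.min.y - margin, box.min.z - margin };
    box.max = { box.max.x + margin, box.max.y + margin, box.max.z + margin };

    // Extend in the direction of motion so fast objects are still reported.
    const Vec3 d = { velocity.x * h, velocity.y * h, velocity.z * h };
    if (d.x < 0.0f) box.min.x += d.x; else box.max.x += d.x;
    if (d.y < 0.0f) box.min.y += d.y; else box.max.y += d.y;
    if (d.z < 0.0f) box.min.z += d.z; else box.max.z += d.z;

    return box;
}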

Mid phase. The mid phase often consists of a bounding-volume hierarchy to


find convex objects in close proximity. Again, incorrect bounding-box compu-
tation can lead to shootouts. Another common problem is that objects can get
stuck in between two convex parts of a compound geometry. Consider the object
consisting of two spheres in Figure 2.4. Convex geometries are usually treated in
isolation, causing two conflicting contact points with opposite normals and pene-
tration depths. Feeding this problem to the solver is a dead end—there is no valid
solution! The objects will start shaking violently and act very unstable. There
is no good solution to this, but avoid using many small objects to make up com-
pound bodies. In the case above, a capsule or cylinder would have avoided the
problem.

Figure 2.4. Compound geometries can cause artifacts when objects get stuck in between
parts.


Figure 2.5. An object sliding over a compound geometry can catch on invisible hooks
due to penetration.

Sliding. A similar problem can occur when an object is sliding over a flat surface
that is made up of multiple parts. Imagine the scene in Figure 2.5. The box
should ideally slide over the seam without any glitches, but the way the object
is constructed, the seam can create invisible “hooks” causing the sliding object to
stop. This is a typical frustrating artifact in certain car racing games where the
car can get trapped on invisible hooks while sliding along the fence. A simple
workaround is to construct the geometry as suggested in Figure 2.6.

Figure 2.6. Making a ramp on each side and letting them overlap is a simple work-around
to avoid objects getting stuck in compound objects.

Near phase. The near phase is by far the most complex part, where the actual
contact generation occurs. The poor solver is often blamed for unstable and jit-
tering simulations, but surprisingly often, shaking objects, general instability, and
jerkiness can be attributed to inadequate contact generation. A sequential-impulse
solver can be blamed for squishy stacks, improper friction, and many other things,
but it is actually quite hard to make a solver that causes objects to rattle and shake.
Near-phase contact generation often has many special cases and can be prone to
numerical floating-point precision issues. Some engines use contact pruning to
remove excess contact points. Special care should then be taken to make sure the
same contacts are pruned every frame. Remember that keeping the configuration
unchanged is key to stability.

2.6.2 Continuous Collision Detection


Ah, continuous collision detection, a technique that prevents objects from slipping
through walls—how about that! Just enable it, sit back, and enjoy how everything
magically works? Not quite, unfortunately.
Let us start by splitting the problem domain into two categories. First, there
are artifacts caused by discretization, typically, a small object passing through a
wall, called the bullet-through-paper problem already mentioned in the beginning
of this chapter. The other category is when contact is detected and generated,
but the solver fails to find a proper solution, usually because of early termination.
This artifact can be very significant when a light object is getting squished in
between two heavy objects and is sometimes referred to as the sandwich case (see
Figure 2.7).


Figure 2.7. Fast-moving objects are not the only ones taking shortcuts through walls.
Early solver termination can cause objects to get squished even if contacts are detected and
generated.



Figure 2.8. Fast-moving objects could potentially get rotated through the floor even if a
contact is generated.

There is also a fairly common case that is a combination of the two. Imagine
a thin rod, slightly inclined, falling onto a flat surface, as illustrated in Figure 2.8.
The initial impact on the left side can cause a rotation so severe that by the next
time step, more than half of the rod has already passed through the floor, and con-
tact generation pushes it out on the other side. This example is a good illustration
of the sometimes complex interaction between linearization and discretization
that can bring a seemingly simple case like this to epic failure, even with con-
tinuous collision detection switched on. Note that some physics engines actually
do have a really sophisticated nonlinear continuous collision detection that does
consider rotation as well, in which case the example mentioned above would have
actually worked.

Sandwich case. The sandwich case can be somewhat worked around by priori-
tizing contacts. It is always the last constraints in a sequential impulse solver that
will be the most powerful and less prone to be violated upon early termination.
Therefore, it is best to rearrange the stream of contacts so that the ones that touch
important game-play mechanisms, such as walls, are solved at the very end. A
good common practice to avoid having objects get pushed through walls or the
floor is to solve all contacts involving a static object after any other contact or do
an extra iteration or two after termination to satisfy only static contacts.

Bullet-through-paper. An engine that aims to solve only the bullet-through-


paper case typically uses a raycast or linear sweep operation to find a time of
impact, then either splits up the time step—simulates the first half until the ob-
ject is touching and then does the rest—or employs an early-engage method that
inserts a contact point before the object has actually reached the surface. The
early-engage method can sometimes be noticed as an invisible wall in front of
obstacles, especially when using zero restitution, in which case a falling object
could come to a full stop some distance above the floor before finally falling the
last bit.
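The control flow for the sweep-based workaround might look roughly like the sketch below. The Vec3 type is a placeholder, raycast() stands in for whatever swept query your engine provides, and the small back-off before the time of impact is an arbitrary illustrative constant.

#include <algorithm>

struct Vec3 { float x, y, z; };
struct SweepHit { bool hit; float t; };   // t in [0,1] along this step's motion

SweepHit raycast(const Vec3& from, const Vec3& to);   // assumed engine query

void integrateWithSweep(Vec3& position, const Vec3& velocity, float h)
{
    const Vec3 target = { position.x + velocity.x * h,
                          position.y + velocity.y * h,
                          position.z + velocity.z * h };

    const SweepHit s = raycast(position, target);
    if (!s.hit)
    {
        position = target;   // nothing in the way, take the full step
        return;
    }

    // Split the step: advance to just before the time of impact and let the
    // regular contact generation and solver handle the rest of the frame.
    const float tStop = std::max(s.t - 0.01f, 0.0f);
    position = { position.x + velocity.x * h * tStop,
                 position.y + velocity.y * h * tStop,
                 position.z + velocity.z * h * tStop };
}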


2.7 Joints
At the most fundamental level, a joint is simpler than a contact: it is an equality
constraint, keeping the relative velocity between two bodies at zero. No inequal-
ities, no interdependent friction, etc. However, the way we combine constraints,
and add limits, breakable constraints, joint friction, and damping typically make
them fairly complex.

2.7.1 Drift
The most common artifact with joints is drifting, i.e., an unintended separation
between the two jointed objects. It is the joint counterpart to stacked objects
sinking into each other. The solver simply fails to find a valid solution within the
limited number of iterations. However, as described in the introduction to this
chapter, even with an unlimited number of iterations, joints can still drift due to
the linearization of velocities. Most engines cope with drifting in the same way
they cope with penetration or friction drift: simply add a geometric term, acting
as a spring to compensate for the drift.

2.7.2 Solving Direct


A good way to reduce joint drift is to solve as many constraints as possible at
the same time. Since joints are made up of equality constraints, they can be
solved as a system of linear equations, sometimes referred to as a direct solver.
Solving a system of linear equations is more complicated than applying sequential
impulses, but it does pay off in stability. On the upside, these two methods can
be easily combined. Some engines solve systems of three orthogonal constraints
(this particular assembly is found in many joint types) as a special case with a
three-by-three matrix inversion and then interweave the rest of the constraints
using sequential impulses.
The way the constraints are placed also matters when it comes to stability.
Consider a ball joint. It might be tempting to use a single constraint in the direc-
tion of maximum separation or in the direction of relative velocity. But remember
that whatever constraints go into the solver are the only constraints avoiding mo-
tion, so a single constraint will naturally transfer motion from the constraint axis
to the other two. A proper ball joint needs three constraints to be stable, and even
the way the three constraints are aligned matters. Keeping the constraints aligned
roughly the same way every frame helps stability. World axes are a good start-
ing point, but using the axes of one of the objects can be even better, since they
will then be stationary to at least one of the objects, keeping the configuration as
similar as possible.



Figure 2.9. Hard joint limits might start oscillating due to discretization.

2.7.3 Joint Limits


Some joints support limits that block either linear or angular motion. This is very
similar to a contact constraint. A common artifact with jointed structures with
limits is that they tend to shake and never come to rest. Even if a joint limit is
supposed to be a hard limit, it is usually a good idea to soften it up a tiny bit.
A hard limit that fully engages when the limit is exceeded and fully disengages oth-
erwise is very hard to get stable. Consider the limited hinge joint in Figure 2.9.
Before it hits the limit, the joint can move freely. Now, since the simulation is
carried out in discrete steps, this means that the joint limit will not kick in until
the limit is already exceeded. Once it is exceeded, the geometric term that is sup-
posed to correct the joint will kick the joint back, causing the limit to disengage
and fall back down again. This is a good example of how rapidly changing the
configuration causes instability.
Using soft limits, so that the hinge is allowed to rest on a spring for a certain
distance, will give the solver a chance to find equilibrium without changing the
configuration every frame.
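A soft limit can be as simple as a spring-damper that fades in over a small zone just before the hard limit, as in the sketch below; all parameters are illustrative and would be tuned per joint.

// Soft upper limit for a hinge angle: free motion until the soft zone is
// entered, then a progressively stronger spring-damper instead of a hard stop.
float softLimitTorque(float angle, float limit, float softZone,
                      float stiffness, float damping, float angularVelocity)
{
    const float intoZone = angle - (limit - softZone);   // how far into the soft zone
    if (intoZone <= 0.0f)
        return 0.0f;                                      // well inside the limit

    return -stiffness * intoZone - damping * angularVelocity;
}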

2.7.4 Dealing with the Dead Guy


Ragdolls might qualify as the number one physics frustration worldwide, and
numerous games are still shipped with ragdolls doing the monkey dance while
“dead.” In my experience, ragdoll instability is due to two main factors—hard
joint limits and excess inter-bone collisions. Applying soft limits as described
above will get you halfway there. A ragdoll is a pretty complex structure, espe-
cially since it can end up on the ground in any pose, including one that engages
multiple joint limits.
Shaking usually appears either when the configuration changes or when there
are conflicting constraints. The more constraints there are to solve, the higher
chance there is for conflicting ones. Therefore, it is usually a good idea to disable
as many collisions as possible. Start with a ragdoll with all bone–bone collisions
turned off. You will be surprised how good it still looks. You might want to
enable certain collisions, such as hips–lower arms, and calf-calf collisions, but in
general, it is fine to leave most of the other ones, assuming you have a decent
setup of joint limits.
Finally, add a certain amount of damping or friction to all joints. The flesh in
the human body naturally dampens any motion, so some amount of friction will
look more natural, at the same time helping our ragdoll get some sleep.

2.7.5 Geometric Joint Recovery


Since joint drifting cannot be completely avoided, it is tempting to do a final
geometric translation to pull joints back together. This can work well in some
situations, but for the most part, it will add instability and energy to the overall
system. Consider the scene illustrated in Figure 2.10. Translating the joint back
into position introduces a penetration that will at the next frame push the body up
and add energy to the system, possibly causing a new joint displacement. If we
really want to get our hands dirty and implement geometric recovery, we should
consider the whole system, also doing it for collisions to resolve penetrations, and
modify both position and rotation.
A better way to do this correction is to do joint translation as a pure visual
effect. In the ragdoll case, many games use only rotation from the physics rep-
resentation, while keeping a fixed displacement, efficiently hiding joint drifting.
However, if the joint displacement is large, it can cause visual penetration, espe-
cially at the outermost limbs of the ragdoll.


Figure 2.10. Compensating for joint drift by moving the objects is usually a really bad
idea.

2.8 Direct Animation


Sometimes we might want to simply animate physical objects, having them affect
other objects but not be affected themselves. There are several ways to do this,
including using joint motors, to physically drive the object. However, sometimes
we simply want to move an object along an animated path, totally unaffected by
collisions. Animating an object by simply setting its position is never a good
idea. It might still affect objects in its environment, but collisions will be soft and
squishy. This is partly because the velocity of the object is not updated correctly,
so for all the solver knows, there is a collision with penetration, but it is not aware
that any of the objects are moving. To avoid this, make sure to update the velocity
to match the actual motion. Some engines have convenience functions for this.
Even when the velocity is correct, if the animated object is not considerably
heavier than the objects it is colliding with, the collisions will be soft and squishy.
For an animated object to fully affect the environment, its mass and inertia tensor
should be infinite. Only then will other objects fully obey and move out of the
way. Hence, if we animate objects by setting their position, make sure to give
them correct velocity, both linear and angular, and make the mass and inertia
tensor temporarily infinite.
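A sketch of driving such a kinematic body from an animation follows. The types and fields are placeholders, and the angular velocity is assumed to come straight from the animation system rather than being derived from quaternions here.

struct Vec3 { float x, y, z; };

struct KinematicBody
{
    Vec3 position, linearVelocity, angularVelocity;
    float invMass;   // 0 means infinite mass; the inverse inertia tensor
                     // (omitted here) should likewise be zeroed
};

void driveFromAnimation(KinematicBody& body, const Vec3& targetPos,
                        const Vec3& targetAngularVelocity, float h)
{
    // Velocity that carries the body from its current pose to the animated one,
    // so the solver sees the motion instead of a teleport.
    body.linearVelocity = { (targetPos.x - body.position.x) / h,
                            (targetPos.y - body.position.y) / h,
                            (targetPos.z - body.position.z) / h };
    body.angularVelocity = targetAngularVelocity;

    body.invMass = 0.0f;   // infinite mass: other objects must move out of the way
}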

2.9 Artifact Reference


Following is a list of artifacts and their causes.

• Frame rate gradually slows down to grinding halt. You might have hit
the well of despair, where the physics engine tries to compensate for its
own slow down. Put a cap on the number of physics steps per frame or
implement a more sophisticated time-stepping algorithm.

• Simulation runs in slow motion. Check that the physics step size corre-
sponds to actual time. Keep an eye on simulation scale. A larger scale will
result in slow-motion effects.

• Stacked objects are shaking or rattling. Check the contact-generation code


and make sure the configuration is not rapidly changing.

• An aligned object dropped on a flat surface bounces off in a weird way.


This is natural behavior of Gauss-Seidel iteration.

• Objects at the bottom of a stack do not feel the weight of the ones on top.
This is caused by a shock-propagation scheme or decoupled friction with
fixed maximum force.

• Highly asymmetric objects act unstable. The low inertia around one of the
axes causes a lot of rotation. Increase inertia tensors, as if the objects were
more symmetric.


• Stacked objects act springy and objects get squashed. The solver iteration
count might be too low. We can also try adding warm starting or a shock-
propagation scheme.

• Stacks are oscillating and tend to never come to rest. Too much warm
starting is being used.

• Stacked objects slide around on each other, eventually falling over. There
is a lack of friction-drift compensation.

• An object penetrates freely and then suddenly shoots out. This can be an
incorrect bounding box or a contact-generation problem.

• Objects are getting pushed through walls by other objects. The contact
stream might not favor static contacts. Rearrange the contact stream so that
static contacts are at the end of the stream.

• Small, fast objects pass through walls. Enable continuous collision detec-
tion or early engage. If the problem still does not go away, it can be due to
rotation. Make the object thicker or increase inertia tensor.

• Falling objects stop before hitting the floor and then fall down the last bit.
This is caused by early-engage contact generation. You can add some resti-
tution to hide the problem or implement more sophisticated continuous col-
lision detection.

• Jointed structures drift apart, causing visual separation. This cannot en-
tirely be avoided due to the nature of iterative solvers and linearization.
Use a direct solver to minimize the problem. You can also try a visual joint
displacement, if applicable.

• Ragdolls are shaking and never come to rest. There can be conflicting joint
limits, too many inter-bone collisions, or joint limits that are too hard.

• An animated object does not affect the environment properly. The animated
object might have incorrect velocity, or the mass or inertia is not infinite.

i i

i i
References

1 - I - Game Physics 101

[Van Verth and Bishop 08] James M. Van Verth and Lars M.
Bishop. Essential Mathematics for Games and Interactive
Applications, Second edition. San Francisco: Morgan
Kaufmann, 2008.

[Witkin and Baraff 01] Andrew Witkin and David Baraff.


“Physically Based Modelling: Principles and Practice.” ACM
SIGGRAPH 2001 Course Notes. Available at
http://www.pixar.com/companyinfo/research/pbm2001/, 2001. 2
Understanding Game Physics Artifacts Dennis Gustafsson 2.1
Introduction Physics engines are known for being
notoriously hard to debug. For most people, physics
artifacts are just a seemingly random stream of weird
behavior that makes no sense. Few components of a game
engine cause much frustration and hair loss. We have all
seen ragdolls doing the funky monkey dance and stacks of
“rigid” bodies acting more like a tower of greasy
mushrooms, eventually falling over or taking off into the
stratosphere. This chapter will help you understand the
underlying causes of this behavior and common mistakes that
lead to it. Some of them can be fixed, some of them can be
worked around, and some of them we will just have to live
with for now. This is mostly written for people writing a
physics engine of their own, but understanding the
underlying mechanisms is helpful even if you are using an
off-the-shelf product. 2.2 Discretization and Linearization
Physics engines advance time in discrete steps, typically
about 17 ms for a 60 Hz update frequency. It is not
uncommon to split up the time step into smaller steps, say
two or three updates per frame (often called substepping)
or even more, but no matter how small of a time step you
use, it will still be a discretization of a continuous
problem. Real-world physics do not move in steps, not even
small steps, but in a continuous motion. This is by far the
number one source for physics artifacts, and any properly
implemented physics engine should behave better with more
substeps. If a physics artifact does not go away with more
substeps, there is most likely something wrong with your
code. The bullet-through-paper problem illustrated in
Figure 2.1 is a typical example of a problem that is caused
by discretization. 29

1 32

Figure 2.1. Discretization can cause fast-moving objects to


travel through walls.
Another big source of artifacts is the linearization that
most physics engines

employ—the assumption that during the time step everything


travels in a linear

motion. For particle physics, this is a pretty good


approximation, but as soon as

you introduce rigid bodies and rotation, it falls flat to


the ground. Consider the ball

joint illustrated in Figure 2.2. The two bodies are


rotating in opposite directions.

At this particular point in time, the two bodies are lined


up as shown. Even if

the solver manages to entirely solve the relative velocity


at the joint-attachment

point to zero, as soon as time is advanced, no matter how


small the amount, the

two attachment points will drift apart. This is the


fundamental of linearization,

which makes it impossible to create an accurate physics


engine by solving just for

relative linear velocities at discrete points in time.

Even though linearization and discretization are two


different approximations,

they are somewhat interconnected. Lowering the step size


(increasing the number

of substeps) will always make linearization less


problematic, since any nonlinear

motion will appear more and more linear the shorter the
time span. The ambitious

reader can make a parallel here to the Heisenberg principle


of uncertainty!

The major takeaway here is that as long as a physics engine


employs dis
cretization and linearization, which all modern physics
engines and all algorithms 1 2

Figure 2.2. Even if relative linear velocity at the joint


attachment is zero, objects can

separate during integration due to rotation. and examples


in this book do, there will always be artifacts. These
artifacts are not results of a problem with the physics
engine itself, but the assumptions and approximations the
engine is built upon. This is important to realize, because
once you accept the artifacts and understand their
underlying causes, it makes them easier to deal with and
work around. 2.3 Time Stepping and the Well of Despair
Since the physics engine is advanced in discrete steps,
what happens if the game drops a frame? This is a common
source of confusion when integrating a physics engine,
since you probably want the motion in your game to be
independent of frame rate. On a slow machine, or in the
occasion of your modern operating system going off to index
the quicksearch database in the middle of a mission, the
graphical update might not keep up with the desired
frequency. There are several different strategies for how
to handle such a scenario from a physics perspective. You
can ignore the fact that a frame was dropped and keep
stepping the normal step length, which will create a
slow-motion effect that is usually highly undesirable.
Another option is to take a larger time step, which will
create a more realistic path of motion but may introduce
jerkiness due to the variation in discretization. The third
option is to take several, equally sized physics steps.
This option is more desirable, as it avoids the slowdown
while still doing fixed-size time steps. 2.3.1 The Well of
Despair Making several physics updates per frame usually
works fine, unless the physics is what is causing the
slowdown to begin with. If physics is the bottleneck, the
update frequency will go into the well of despair, meaning
every subsequent frame needs more physics updates, causing
a slower update frequency, resulting in even more physics
updates the next frame, and so on. There is unfortunately
no way to solve this problem other than to optimize the
physics engine or simplify the problem, so what most people
do is put a cap on the number of physics updates per frame,
above which the simulation will simply run in slow motion.
Actually, it will not only run in slow motion but it will
run in slow motion at a lower-thannecessary frame rate,
since most of what the physics engine computes is never
even shown! A more sophisticated solution is to measure the
time of the physics update, compare it to the overall frame
time, and only make subsequent steps if we can avoid the
well of despair. This problem is not trivial, and there is
no ultimate solution that works for all scenarios, but it
is well worth experimenting with since it can have a very
significant impact on the overall update frequency.

2.4 The Curse of Rotations

Since rotation is the mother of most linearization


problems, it deserves some spe

cial attention. One fun experiment we can try is to make


the inertia tensor for

all objects infinite and see how that affects our


simulation. The inertia tensor can

roughly be described as an object’s willingness to rotate


and is often specified

as its inverse, so setting all values to zero typically


means rotations will be com

pletely disabled. You will be surprised how stable those


stacks become and how

nicely most scenarios just work. Unfortunately, asking the


producer if it is okay

to skip rotations will most likely not be a good idea, but


what we can learn is that

the more inertia we add, the less rotation will occur,


problems with linearization

will decrease, and the simulation will get more stable.

The problem is especially relevant on long, thin rods. So


if you experience

instability with such objects, try increasing the inertia,


especially on the axis along

the rod (compute inertia as if the rod was thicker).


Increasing inertia will make

objects look heavy and add a perceived slow-motion effect,


so you might want to

take it easy, but it can be a lifesaver and is surprisingly


hard to spot.

2.5 Solver

Just to freshen up our memory without getting into


technical detail, the solver

is responsible for computing the next valid state of a


dynamic system, taking

into account various constraints. Now, since games need to


be responsive, this

computation has to be fast, and the most popular way of


doing that is using an

iterative method called sequential impulse. The concept is


really simple: given a

handful of constraints, satisfy each one of them, one at a


time, and when the last

one is done, start over again from the beginning and do


another round until it is

“good enough,” where good enough often means, “Sorry man,


we are out of time,

let’s just leave it here.”

What is really interesting, from a debugging perspective,


is how this early

termination of a sequential impulse solver can affect the


energy of the system.

Stopping before we are done will not add energy to the


system, it will drain en

ergy. This means it is really hard to blame the solver


itself for energy being added

to the system.

When you implement a sequential impulse solver with early


termination,

stacked, resting objects tend to sink into each other.


Let’s investigate why this is
happening: at each frame, gravity causes an acceleration
that increases an object’s

downward velocity. Contact generation creates a set of


points and at each contact,

the solver tries to maintain a zero-relative velocity.


However, since greedy game programmers want CPU cycles for
other things, the solver is terminated before it is
completely done, leaving the objects with a slight downward
velocity instead of zero, which is desired for resting
contact. This slight downward velocity causes objects to
sink in, and the process is repeated. To compensate for
this behavior, most physics engines use a geometric measure
for each contact point: either penetration depth or
separation distance. As the penetration depth increases,
the desired resulting velocity is biased, so that it is not
zero but is actually negative, causing the objects to
separate. This translates to objects being soft instead of
rigid, where the softness is defined by how well the solver
managed to solve the problem. This is why most solvers act
springy or squishy when using fewer iterations. Hence, the
best way to get rid of the mushroom is to increase the
number of iterations in the solver! 2.5.1 Keeping the
Configuration Unchanged A solver that uses this kind of
geometric compensation running at the same step size and
same number of iterations every frame will eventually find
an equilibrium after a certain number of frames.
Understanding that this equilibrium is not a relaxed state
but a very complex ongoing struggle between gravity,
penetrating contacts, and penalty forces is key to
stability. Removing or adding even a single constraint, or
changing the number of iterations, will cause the solver to
redistribute the weight and find a new equilibrium, which
is a process that usually takes several frames and causes
objects to wiggle. The set of constraints for a specific
scenario is sometimes called its configuration; hence
keeping the configuration unchanged from one frame to the
next is very important, and we will revisit this goal
throughout the chapter. 2.5.2 Warm Starting Assuming that
the configuration does not change and objects are at rest,
the impulses at each contact point will be essentially the
same every frame. It seems kind of unnecessary to recompute
the same problem over and over again. This is where warm
starting comes into the picture. Instead of recomputing the
impulses from scratch every time, we can start off with the
impulses from the previous frame and use our solver
iterations to refine them instead. Using warm starting is
almost always a good idea. The downside is that we have to
remember the impulses from the last frame, which requires
some extra bookkeeping. However, since most physics engines
keep track of pairs anyway, this can usually be added
relatively easily. I mentioned before that a sequential
impulse solver does not add energy but rather drains energy
from a system. This unfortunately no longer holds true if

1 32 4

Figure 2.3. A sequential impulse solver can cause an


aligned box falling flat to the ground

to bounce off with rotation.

warm starting is being used. Full warm starting can give a


springy, oscillating

behavior and prevents stacks from ever falling asleep.


Because of this, the cur

rent frame’s impulses are usually initialized with only a


fraction of the previous

frame’s impulses. As we increase this fraction, the solver


becomes more springy,

but it can also handle stacking better. It could be worth


experimenting with this

to find the sweet spot.

2.5.3 Who Is Tilting My Box

A sequential impulse solver, as described above, is called


in mathematical terms

Gauss-Seidel iteration. Another method is Jacobi iteration,


in which all contact

points are solved independently, and then the resulting


impulses are applied all at

once, hence removing the sequential in sequential impulse.


Jacobi solvers have

some nice properties, especially when it comes to


parallelization, but they gener

ally take way more iterations to converge. One effect of


sequential contact solving
is that symmetric problems often have seemingly
unpredictable solutions. Con

sider a perfectly aligned box dropped on a horizontal


plane. All four corners hit

the plane at the same time, even forming four identical


contact points. A sequen

tial impulse solver will start solving one contact point


without considering the

other three, apply the resulting impulse and then consider


the next one. While

solving the second contact, the problem is no longer


symmetric, since the box is

rotating after applying the first impulse. The resulting


motion will behave as if

one corner of the box hit the ground slightly before the
others (see Figure 2.3).

Hence, whenever we see this type of behavior, it is most


likely not an error, just

brother Gauss-Seidel pulling a prank.

2.5.4 Friction

Friction is usually a little trickier than nonpenetration constraints, since the maximum applied force depends on the normal force. The more pressure there is on an object, the better it sticks. This interdependence results in a nonlinear problem that is very tricky to solve accurately.

Coupled or decoupled. There are two main approaches to solving friction—coupled and decoupled. In the coupled approach, the maximum friction force changes while iterating, basically trying to solve a nonlinear problem with a toolbox that is designed for linear problems (Gauss-Seidel), which may sound inappropriate but actually works fairly well in practice. The decoupled approach involves using a fixed maximum friction force that is determined before iterating. In the case of decoupled friction, there are two popular methods: either using the normal force from the last time step, which requires some bookkeeping, or using a fixed value, regardless of normal force. Such a fixed value is often based on the normal force needed to keep the body at rest when affected by gravity. This may sound like a very crude approximation, but it works surprisingly well, requires no bookkeeping, and is perfectly linear. The main drawback is, of course, that friction is unaffected by how much pressure is on the object. An object at the bottom of a stack slides out just as easily as the ones on top!

Friction in stacks. It is worth mentioning the importance of proper friction for handling stable stacking. Even in a scenario that seems largely unaffected by friction, like a pyramid of boxes, friction plays a very important role. Remember that the solver causes objects to rotate as an artifact of Gauss-Seidel iteration. This rotation introduces a tangential motion that causes a stack to tip over if no friction is used.

Friction drift. Remember the description above, about early solver termination causing stacked objects to sink into each other? The exact same thing happens to friction constraints, so if not compensated for, stacked objects might slide around slowly on top of each other, eventually causing the whole thing to fall over. Tracking friction drift is cumbersome because it involves tracking pairs of objects over several frames. For penetration depth it is rather straightforward, since the desired configuration is determined by the shape of the objects. For static friction, it is not quite that easy. Static friction can be seen as a temporary joint holding two objects together in the contact plane. If the maximum joint force is exceeded, the objects should actually slide, but as long as the force is within the maximum friction force, the relative net motion should ideally be zero. Hence, any motion that actually occurs is due to early solver termination, linearization, or any other of our artifact friends. Measuring this drift and compensating for it over time can therefore help maintain stable stacking and natural friction behavior.
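A minimal sketch of the decoupled, fixed-limit variant described above, assuming a friction coefficient mu and using the impulse that gravity would build up over one step as the reference. All names here are illustrative; with coupled friction the limit would instead be recomputed from the current normal impulse each iteration.

#include <algorithm>

// Fixed friction limit for the decoupled approach: impulse (not force) that
// gravity would accumulate over one time step, scaled by the friction coefficient.
// Note that this deliberately ignores the actual normal force.
float fixedFrictionLimit(float mass, float mu, float gravity, float dt)
{
    return mu * mass * gravity * dt;
}

// One friction constraint along a tangent direction. The accumulated tangent
// impulse is clamped to [-limit, +limit].
struct FrictionConstraint {
    float accumulated = 0.0f;
    float limit       = 0.0f;   // set once, before iterating

    // rawImpulse is whatever the solver computed to cancel tangential velocity.
    // Returns the impulse actually applied this iteration.
    float clamp(float rawImpulse)
    {
        float old = accumulated;
        accumulated = std::max(-limit, std::min(limit, accumulated + rawImpulse));
        return accumulated - old;
    }
};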
2.5.5 Shock Propagation

As a way to counteract the squishiness of iterative solvers, a shock-propagation scheme can be used. The idea is to analyze the configuration and set up the problem in such a way that the solver can find a solution more quickly. Some engines maintain an explicit graph of how the objects connect, whereas other engines temporarily tweak mass ratios, inertia, or gravity. There is a lot of creativity in shock propagation, but the artifacts are usually similar.

Large stacks require many iterations because the impulses at the bottom of the stack are many times bigger than they would be for any pair of objects solved in isolation. It takes many iterations to build up these large impulses. With shock propagation, objects at the bottom of a stack will not feel the entire weight of the objects on top. This can show up as the unnatural behavior of stacks tipping over and can also be very obvious when observing friction—an object at the bottom of a stack can be as easily pulled out as one on top.

2.6 Collision Detection

The collision-detection problem is often broken down into two or three phases: first a broad phase, detecting objects in close proximity; then sometimes a mid phase, breaking down structures into smaller parts; and finally the near phase, computing the actual contact points.

2.6.1 Phases

Broad phase. Let us start with the broad phase, which has a relatively well-defined task: report overlaps of bounding volumes, most often axis-aligned bounding boxes. If a bounding box is too small, we might experience weird shootouts, as the broad phase reports nonoverlap until the objects are already in penetration. Having the bounding boxes too big, on the other hand, has a performance implication, so we have to be sure to make them just right. Remember that if we use continuous collision detection or an intentional separation distance, these must be included in the bounding-box computation, so that the bounding box is no longer tight-fitting around the object. These errors can be hard to spot, since everything looks right most of the time.
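As an illustration, here is a small C++ sketch of the kind of bounding-box inflation this implies. The contactOffset margin and the velocity sweep are assumptions about how an engine might account for separation distance and continuous collision detection, not a specific engine's rules.

#include <algorithm>

struct Vec3 { float x, y, z; };

struct AABB {
    Vec3 min;
    Vec3 max;
};

// Inflate a tight object AABB so the broad phase never reports "no overlap"
// while the narrow phase still wants to generate contacts:
//  - contactOffset: intentional separation distance at which contacts are kept
//  - velocity * dt: sweep extent for continuous collision detection this step
AABB broadPhaseBounds(const AABB& tight, float contactOffset,
                      const Vec3& velocity, float dt)
{
    AABB b = tight;

    // Grow uniformly by the contact offset.
    b.min = { b.min.x - contactOffset, b.min.y - contactOffset, b.min.z - contactOffset };
    b.max = { b.max.x + contactOffset, b.max.y + contactOffset, b.max.z + contactOffset };

    // Extend in the direction of motion so a swept test stays inside the box.
    Vec3 d = { velocity.x * dt, velocity.y * dt, velocity.z * dt };
    b.min = { b.min.x + std::min(d.x, 0.0f), b.min.y + std::min(d.y, 0.0f), b.min.z + std::min(d.z, 0.0f) };
    b.max = { b.max.x + std::max(d.x, 0.0f), b.max.y + std::max(d.y, 0.0f), b.max.z + std::max(d.z, 0.0f) };
    return b;
}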

Mid phase. The mid phase often consists of a bounding-volume hierarchy to find convex objects in close proximity. Again, incorrect bounding-box computation can lead to shootouts. Another common problem is that objects can get stuck in between two convex parts of a compound geometry. Consider the object consisting of two spheres in Figure 2.4. Convex geometries are usually treated in isolation, causing two conflicting contact points with opposite normals and penetration depths. Feeding this problem to the solver is a dead end—there is no valid solution! The objects will start shaking violently and act very unstable. There is no good solution to this, but avoid using many small objects to make up compound bodies. In the case above, a capsule or cylinder would have avoided the problem.

Figure 2.4. Compound geometries can cause artifacts when objects get stuck in between parts.

Sliding. A similar problem can occur when an object is sliding over a flat surface that is made up of multiple parts. Imagine the scene in Figure 2.5. The box should ideally slide over the seam without any glitches, but the way the object is constructed, the seam can create invisible "hooks" causing the sliding object to stop. This is a typical frustrating artifact in certain car racing games where the car can get trapped on invisible hooks while sliding along the fence. A simple workaround is to construct the geometry as suggested in Figure 2.6.

Figure 2.5. An object sliding over a compound geometry can catch on invisible hooks due to penetration.

Figure 2.6. Making a ramp on each side and letting them overlap is a simple workaround to avoid objects getting stuck in compound objects.

Near phase. The near phase is by far the most complex part, where the actual contact generation occurs. The poor solver is often blamed for unstable and jittering simulations, but surprisingly often, shaking objects, general instability, and jerkiness can be attributed to inadequate contact generation. A sequential-impulse solver can be blamed for squishy stacks, improper friction, and many other things, but it is actually quite hard to make a solver that causes objects to rattle and shake. Near-phase contact generation often has many special cases and can be prone to numerical floating-point precision issues. Some engines use contact pruning to remove excess contact points. Special care should then be taken to make sure the same contacts are pruned every frame. Remember that keeping the configuration unchanged is key to stability.

2.6.2 Continuous Collision Detection

Ah, continuous collision detection, a technique that prevents objects from slipping through walls—how about that! Just enable it, sit back, and enjoy how everything magically works? Not quite, unfortunately.

Let us start by splitting the problem domain into two categories. First, there are artifacts caused by discretization: typically, a small object passing through a wall, called the bullet-through-paper problem, already mentioned in the beginning of this chapter. The other category is when contact is detected and generated, but the solver fails to find a proper solution, usually because of early termination. This artifact can be very significant when a light object is getting squished in between two heavy objects and is sometimes referred to as the sandwich case (see Figure 2.7).

Figure 2.7. Fast-moving objects are not the only ones taking shortcuts through walls. Early solver termination can cause objects to get squished even if contacts are detected and generated.

Figure 2.8. Fast-moving objects could potentially get rotated through the floor even if a contact is generated.

There is also a fairly common case that is a combination of the two. Imagine a thin rod, slightly inclined, falling onto a flat surface, as illustrated in Figure 2.8. The initial impact on the left side can cause a rotation so severe that by the next time step, more than half of the rod has already passed through the floor, and contact generation pushes it out on the other side. This example is a good illustration of the sometimes complex interaction between linearization and discretization that can bring a seemingly simple case like this to epic failure, even with continuous collision detection switched on. Note that some physics engines actually do have a really sophisticated nonlinear continuous collision detection that considers rotation as well, in which case the example mentioned above would have actually worked.

Sandwich case. The sandwich case can be somewhat worked around by prioritizing contacts. It is always the last constraints in a sequential impulse solver that will be the most powerful and the least prone to being violated upon early termination. Therefore, it is best to rearrange the stream of contacts so that the ones that touch important game-play mechanisms, such as walls, are solved last. A good common practice to avoid having objects pushed through walls or the floor is to solve all contacts involving a static object after any other contact, or to do an extra iteration or two after termination to satisfy only the static contacts.
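One way to do that reordering, assuming the contact stream is a simple array and that each contact knows whether it involves a static body; std::stable_partition keeps the relative order within each group, which also helps keep the configuration similar from frame to frame.

#include <algorithm>
#include <vector>

struct Contact {
    bool involvesStatic = false;   // e.g., wall or floor
    // ... bodies, normal, accumulated impulses, etc.
};

// Move contacts that touch static geometry to the end of the stream, so a
// sequential impulse solver resolves them last and they are the least likely
// to be left violated when iterations terminate early.
void prioritizeStaticContacts(std::vector<Contact>& contacts)
{
    std::stable_partition(contacts.begin(), contacts.end(),
                          [](const Contact& c) { return !c.involvesStatic; });
}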
Bullet-through-paper. An engine that aims to solve only the bullet-through-paper case typically uses a raycast or linear sweep operation to find a time of impact and then either splits up the time step—simulating the first half until the object is touching and then doing the rest—or employs an early-engage method that inserts a contact point before the object has actually reached the surface. The early-engage method can sometimes be noticed as an invisible wall in front of obstacles, especially when using zero restitution, in which case a falling object could come to a full stop some distance above the floor before finally falling the last bit.

2.7 Joints

At the most fundamental level, joints are simpler than contacts. A joint is an equality constraint, keeping the relative velocity between two bodies at zero: no inequalities, no interdependent friction, etc. However, the way we combine constraints and add limits, breakable constraints, joint friction, and damping typically makes them fairly complex.

2.7.1 Drift

The most common artifact with joints is drifting, i.e., an unintended separation between the two jointed objects. It is the joint counterpart to stacked objects sinking into each other. The solver simply fails to find a valid solution within the limited number of iterations. However, as described in the introduction to this chapter, even with an unlimited number of iterations, joints can still drift due to the linearization of velocities. Most engines cope with drifting in the same way they cope with penetration or friction drift: simply add a geometric term, acting as a spring, to compensate for the drift.

2.7.2 Solving Direct

A good way to reduce joint drift is to solve as many constraints as possible at the same time. Since joints are made up of equality constraints, they can be solved as a system of linear equations, sometimes referred to as a direct solver. Solving a system of linear equations is more complicated than applying sequential impulses, but it does pay off in stability. On the upside, these two methods can easily be combined. Some engines solve systems of three orthogonal constraints (this particular assembly is found in many joint types) as a special case, with a three-by-three matrix inversion, and then interweave the rest of the constraints using sequential impulses.
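To make the three-constraint special case a bit more concrete, here is a sketch of solving such a 3×3 block in one go. It assumes the effective-mass matrix K (J M⁻¹ Jᵀ for the block) and the right-hand side have been assembled elsewhere, and it inverts K with plain Cramer's rule; this is an illustration, not the book's implementation.

#include <array>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<std::array<float, 3>, 3>;   // K = J M^-1 J^T for the block

// Solve K * lambda = rhs for the three coupled constraint impulses at once.
// rhs is typically -(J v + bias). No pivoting: K is assumed well conditioned,
// which it is for a valid joint (symmetric positive definite).
Vec3 solve3x3(const Mat3& K, const Vec3& rhs)
{
    float det = K[0][0] * (K[1][1] * K[2][2] - K[1][2] * K[2][1])
              - K[0][1] * (K[1][0] * K[2][2] - K[1][2] * K[2][0])
              + K[0][2] * (K[1][0] * K[2][1] - K[1][1] * K[2][0]);

    Mat3 inv;
    inv[0][0] =  (K[1][1] * K[2][2] - K[1][2] * K[2][1]) / det;
    inv[0][1] = -(K[0][1] * K[2][2] - K[0][2] * K[2][1]) / det;
    inv[0][2] =  (K[0][1] * K[1][2] - K[0][2] * K[1][1]) / det;
    inv[1][0] = -(K[1][0] * K[2][2] - K[1][2] * K[2][0]) / det;
    inv[1][1] =  (K[0][0] * K[2][2] - K[0][2] * K[2][0]) / det;
    inv[1][2] = -(K[0][0] * K[1][2] - K[0][2] * K[1][0]) / det;
    inv[2][0] =  (K[1][0] * K[2][1] - K[1][1] * K[2][0]) / det;
    inv[2][1] = -(K[0][0] * K[2][1] - K[0][1] * K[2][0]) / det;
    inv[2][2] =  (K[0][0] * K[1][1] - K[0][1] * K[1][0]) / det;

    return { inv[0][0] * rhs[0] + inv[0][1] * rhs[1] + inv[0][2] * rhs[2],
             inv[1][0] * rhs[0] + inv[1][1] * rhs[1] + inv[1][2] * rhs[2],
             inv[2][0] * rhs[0] + inv[2][1] * rhs[1] + inv[2][2] * rhs[2] };
}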

The way the constraints are placed also matters when it comes to stability. Consider a ball joint. It might be tempting to use a single constraint in the direction of maximum separation or in the direction of relative velocity. But remember that whatever constraints go into the solver are the only constraints preventing motion, so a single constraint will naturally transfer motion from the constraint axis to the other two. A proper ball joint needs three constraints to be stable, and even the way the three constraints are aligned matters. Keeping the constraints aligned roughly the same way every frame helps stability. World axes are a good starting point, but using the axes of one of the objects can be even better, since the constraints will then be stationary with respect to at least one of the objects, keeping the configuration as similar as possible.

Figure 2.9. Hard joint limits might start oscillating due to discretization.

2.7.3 Joint Limits

Some joints support limits that block either linear or angular motion. This is very similar to a contact constraint. A common artifact with jointed structures with limits is that they tend to shake and never come to rest. Even if a joint limit is supposed to be a hard limit, it is usually a good idea to soften it up a tiny bit. A hard limit that fully engages when the limit is exceeded and is fully disengaged otherwise is very hard to get stable. Consider the limited hinge joint in Figure 2.9. Before it hits the limit, the joint can move freely. Now, since the simulation is carried out in discrete steps, this means that the joint limit will not kick in until the limit is already exceeded. Once it is exceeded, the geometric term that is supposed to correct the joint will kick the joint back, causing the limit to disengage and fall back down again. This is a good example of how rapidly changing the configuration causes instability. Using soft limits, so that the hinge is allowed to rest on a spring for a certain distance, will give the solver a chance to find equilibrium without changing the configuration every frame.
2.7.4 Dealing with the Dead Guy

Ragdolls might qualify as the number one physics frustration worldwide, and numerous games are still shipped with ragdolls doing the monkey dance while "dead." In my experience, ragdoll instability is due to two main factors—hard joint limits and excess inter-bone collisions. Applying soft limits as described above will get you halfway there. A ragdoll is a pretty complex structure, especially since it can end up on the ground in any pose, including one that engages multiple joint limits. Shaking usually appears either when the configuration changes or when there are conflicting constraints. The more constraints there are to solve, the higher the chance of conflicting ones. Therefore, it is usually a good idea to disable as many collisions as possible. Start with a ragdoll with all bone–bone collisions

turned off. You will be surprised how good it still looks. You might want to enable certain collisions, such as hips–lower arms and calf–calf collisions, but in general it is fine to leave most of the other ones off, assuming you have a decent setup of joint limits.

Finally, add a certain amount of damping or friction to all joints. The flesh in the human body naturally dampens any motion, so some amount of friction will look more natural, at the same time helping our ragdoll get some sleep.

2.7.5 Geometric Joint Recovery

Since joint drifting cannot be completely avoided, it is tempting to do a final geometric translation to pull joints back together. This can work well in some situations, but for the most part, it will add instability and energy to the overall system. Consider the scene illustrated in Figure 2.10. Translating the joint back into position introduces a penetration that will, at the next frame, push the body up and add energy to the system, possibly causing a new joint displacement. If we really want to get our hands dirty and implement geometric recovery, we should consider the whole system, also doing it for collisions to resolve penetrations, and modify both position and rotation.

A better way to do this correction is to apply the joint translation as a purely visual effect. In the ragdoll case, many games use only the rotations from the physics representation, while keeping a fixed displacement, efficiently hiding joint drifting. However, if the joint displacement is large, it can cause visual penetration, especially at the outermost limbs of the ragdoll.

Figure 2.10. Compensating for joint drift by moving the objects is usually a really bad idea.

2.8 Direct Animation

Sometimes we might want to simply animate physical objects, having them affect other objects but not be affected themselves. There are several ways to do this, including using joint motors to physically drive the object. However, sometimes we simply want to move an object along an animated path, totally unaffected by collisions.

Animating an object by simply setting its position is never a good idea. It might still affect objects in its environment, but collisions will be soft and squishy. This is partly because the velocity of the object is not updated correctly, so for all the solver knows, there is a collision with penetration, but it is not aware that any of the objects are moving. To avoid this, make sure to update the velocity to match the actual motion. Some engines have convenience functions for this.

Even when the velocity is correct, if the animated object is not considerably heavier than the objects it is colliding with, the collisions will be soft and squishy. For an animated object to fully affect the environment, its mass and inertia tensor should be infinite. Only then will other objects fully obey and move out of the way. Hence, if we animate objects by setting their position, make sure to give them the correct velocity, both linear and angular, and make the mass and inertia tensor temporarily infinite.
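Here is a hedged sketch of the bookkeeping this implies when driving a body from an animation, assuming a quaternion orientation and a toy Body struct; the helper names are invented for the example, and the angular velocity uses a small-angle approximation of the orientation delta.

#include <cmath>

struct Vec3 { float x = 0, y = 0, z = 0; };
struct Quat { float w = 1, x = 0, y = 0, z = 0; };   // unit quaternion

// q * conjugate(p): the rotation taking orientation p to orientation q.
Quat deltaRotation(const Quat& q, const Quat& p)
{
    return { q.w * p.w + q.x * p.x + q.y * p.y + q.z * p.z,
             -q.w * p.x + q.x * p.w - q.y * p.z + q.z * p.y,
             -q.w * p.y + q.y * p.w - q.z * p.x + q.x * p.z,
             -q.w * p.z + q.z * p.w - q.x * p.y + q.y * p.x };
}

struct Body {
    Vec3  position;        Quat orientation;
    Vec3  linearVelocity;  Vec3 angularVelocity;
    float invMass = 1;     // 0 makes the body immovable (infinite mass)
};

// Drive a body along an animated path: set the new pose, but also give it the
// matching velocities and make it immovable for the duration of the animation.
// A real engine would also zero the inverse inertia tensor.
void animateKinematically(Body& body, const Vec3& newPos, const Quat& newRot, float dt)
{
    body.linearVelocity = { (newPos.x - body.position.x) / dt,
                            (newPos.y - body.position.y) / dt,
                            (newPos.z - body.position.z) / dt };

    // Small rotations: dq ~ (1, 0.5 * omega * dt), so omega ~ 2 * vec(dq) / dt.
    Quat dq = deltaRotation(newRot, body.orientation);
    float s = (dq.w >= 0.0f) ? 2.0f / dt : -2.0f / dt;   // pick the short way around
    body.angularVelocity = { s * dq.x, s * dq.y, s * dq.z };

    body.position    = newPos;
    body.orientation = newRot;
    body.invMass     = 0.0f;
}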
2.9 Artifact Reference

Following is a list of artifacts and their causes.

• Frame rate gradually slows down to a grinding halt. You might have hit the well of despair, where the physics engine tries to compensate for its own slowdown. Put a cap on the number of physics steps per frame or implement a more sophisticated time-stepping algorithm.

• Simulation runs in slow motion. Check that the physics step size corresponds to actual time. Keep an eye on simulation scale. A larger scale will result in slow-motion effects.

• Stacked objects are shaking or rattling. Check the contact-generation code and make sure the configuration is not rapidly changing.

• An aligned object dropped on a flat surface bounces off in a weird way. This is natural behavior of Gauss-Seidel iteration.

• Objects at the bottom of a stack do not feel the weight of the ones on top. This is caused by a shock-propagation scheme or decoupled friction with a fixed maximum force.

• Highly asymmetric objects act unstable. The low inertia around one of the axes causes a lot of rotation. Increase inertia tensors, as if the objects were more symmetric.

• Stacked objects act springy and objects get squashed. The solver iteration count might be too low. We can also try adding warm starting or a shock-propagation scheme.

• Stacks are oscillating and tend to never come to rest. Too much warm starting is being used.

• Stacked objects slide around on each other, eventually falling over. There is a lack of friction-drift compensation.

• An object penetrates freely and then suddenly shoots out. This can be an incorrect bounding box or a contact-generation problem.

• Objects are getting pushed through walls by other objects. The contact stream might not favor static contacts. Rearrange the contact stream so that static contacts are at the end of the stream.

• Small, fast objects pass through walls. Enable continuous collision detection or early engage. If the problem still does not go away, it can be due to rotation. Make the object thicker or increase the inertia tensor.

• Falling objects stop before hitting the floor and then fall down the last bit. This is caused by early-engage contact generation. You can add some restitution to hide the problem or implement more sophisticated continuous collision detection.

• Jointed structures drift apart, causing visual separation. This cannot entirely be avoided due to the nature of iterative solvers and linearization. Use a direct solver to minimize the problem. You can also try a visual joint displacement, if applicable.

• Ragdolls are shaking and never come to rest. There can be conflicting joint limits, too many inter-bone collisions, or joint limits that are too hard.

• An animated object does not affect the environment properly. The animated object might have incorrect velocity, or the mass or inertia is not infinite.
3 Broad Phase and Constraint Optimization for PlayStation® 3

III Particles

[Harada et al. 07] T. Harada, S. Koshizuka, and Y. Kawaguchi. "Smoothed Particle Hydrodynamics on GPUs." Paper presented at Computer Graphics International Conference, Petropolis, Brazil, May 30–June 2, 2007.

[Harlow and Welch 65] Francis H. Harlow and Eddie J. Welch. "Numerical Calculation of Time-Dependent Viscous Incompressible Flow of Fluid with Free Surface." Physics of Fluids 8:12 (1965), 2182–2189.

[Harris et al. 07] Mark Harris, Shubhabrata Sengupta, and John D. Owens. "Parallel Prefix Sum (Scan) with CUDA." In GPU Gems 3, edited by Hubert Nguyen, pp. 851–876. Reading, MA: Addison-Wesley, 2007.

[Hjelte 06] N. Hjelte. "Smoothed Particle Hydrodynamics on the Cell Broadband Engine." Preprint, 2006. Available at http://www.2ld.de/gdc2004/.

[Kanamori et al. 08] Yoshihiro Kanamori, Zoltan Szego, and Tomoyuki Nishita. "GPU-Based Fast Ray Casting for a Large Number of Metaballs." Comput. Graph. Forum 27:2 (2008), 351–360.

[Koshizuka and Oka 96] S. Koshizuka and Y. Oka. "Moving-Particle Semi-implicit Method for Fragmentation of Incompressible Flow." Nucl. Sci. Eng. 123 (1996), 421–434.

[Lorensen and Cline 87] William E. Lorensen and Harvey E. Cline. "Marching Cubes: A High Resolution 3D Surface Construction Algorithm." In SIGGRAPH '87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 163–169. New York: ACM Press, 1987.

[Monaghan 88] J. J. Monaghan. "An Introduction to SPH." Computer Physics Communications 48 (1988), 89–96. Available at http://dx.doi.org/10.1016/0010-4655(88)90026-4.

[Müller et al. 03] Matthias Müller, David Charypar, and Markus Gross. "Particle-Based Fluid Simulation for Interactive Applications." In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 154–159. Aire-la-Ville, Switzerland: Eurographics Association, 2003.

[Müller et al. 07] Matthias Müller, Simon Schirm, and Stephan Duthaler. "Screen Space Meshes." In SCA '07: Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 9–15. Aire-la-Ville, Switzerland: Eurographics Association, 2007.

[Ruth 83] Ronald D. Ruth. "A Canonical Integration Technique." IEEE Transactions on Nuclear Science 30 (1983), 2669–2671.

[Stam 99] Jos Stam. "Stable Fluids." In SIGGRAPH '99: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 121–128. New York: ACM Press/Addison-Wesley, 1999.

[Teschner et al. 03] M. Teschner, B. Heidelberger, M. Mueller, D. Pomeranets, and M. Gross. "Optimized Spatial Hashing for Collision Detection of Deformable Objects." In Proceedings of Vision, Modeling, Visualization VMV'03, pp. 47–54. Heidelberg: Aka GmbH, 2003. Available at http://graphics.ethz.ch/~brunoh/download/CollisionDetectionHashing_VMV03.pdf.

[van der Laan et al. 09] Wladimir J. van der Laan, Simon Green, and Miguel Sainz. "Screen Space Fluid Rendering with Curvature Flow." In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, pp. 91–98. New York: ACM Press, 2009.

[van Kooten et al. 07] Kees van Kooten, Gino van den Bergen, and Alex Telea. "Point-Based Visualization of Metaballs on a GPU." In GPU Gems 3, edited by Hubert Nguyen, pp. 123–156. Reading, MA: Addison-Wesley, 2007.

[Vesely 01] Franz J. Vesely. Computational Physics: An Introduction, Second edition. New York: Springer, 2001.

[Zhang et al. 08] Yanci Zhang, Barbara Solenthaler, and Renato Pajarola. "Adaptive Sampling and Rendering of Fluids on the GPU." In Proceedings of the IEEE/EG International Symposium on Volume and Point-Based Graphics, pp. 137–146. Aire-la-Ville, Switzerland: Eurographics Association, 2008.
7 Parallelizing Particle-Based Simulation on Multiple Processors

Takahiro Harada

7.1 Introduction

Particle-based simulation is a method that can simulate liquid without having to use any numerical techniques to track the fluid surfaces. Simulating particle motion gives us not only the information about the fluid surface but also about splashes. Moreover, a particle-based method can be used for a simplified rigid-body simulation as well [Harada 07], and since they can be solved in the same framework, the rigid-body simulation can be coupled with the fluid simulation easily.

Figure 7.1. Rendered image from a simulation using multiple GPUs (see Color Plate IV).

However, the drawback of particle-based simulation is its computational cost. If the resolution of the simulation is the same as for a grid-based simulation, i.e., the number of particles is the same as the number of grid points in a grid-based simulation, particle-based simulations of fluids are much more expensive, because the neighboring particles have to be searched for in every time step. In order to get good visual quality, a large number of particles have to be simulated. It depends on the situation, but a simulation with only thousands of particles does not usually give us a satisfactory result.

In this chapter, a method to parallelize particle-based simulation on multiple processors with distributed memory is presented. The method simulates the motion of particles by splitting a simulation into smaller simulations. Using this method, a high-resolution simulation, as shown in Figure 7.1, can be simulated in a few milliseconds per step. GPUs are generally used for parallelizing simulations, but the present method is not limited to GPUs, as it is also applicable to multiple CPUs.

7.2 Dividing Computation

To utilize multiple processors for a simulation, the computation has to be divided into several computations. For a grid-based fluid simulation, in which connectivity among fixed simulation entities is parallelized on multiple processors, the approach we should take is obvious. The simulation domain is divided into subdomains, and a subdomain is assigned to a processor. Because of the fixed connectivity, the decomposition of the simulation domain has to be done once before the simulation starts. To calculate each subdomain, the simulation requires some data from an adjacent subdomain. The elements whose data have to be transferred to an adjacent processor are fixed. Therefore, it is relatively easy to use multiple processors for a grid-based fluid simulation. The overhead of the parallelization is not so large because of the fixed connectivity.

For particle-based simulation, the analogy of the domain decomposition used for grid-based simulation would be to divide the particles into as many sets as there are processors. We quickly realize that this is not a good choice, because particles mix up soon after a simulation starts, so the communication among processors would almost halt the simulation. Thus, it is not obvious how to divide a particle-based simulation, in which the simulation entities, the particles, move freely in the computation domain, over multiple processors. The overhead of parallelization can easily kill the benefits of using multiple processors without a carefully designed method, because the simulation data have to be managed at each simulation step.

We chose to use domain decomposition, which is often used in grid-based simulation, for particle-based simulation instead of splitting the particles by their indices. A processor assigned to a subdomain simulates the particles in that subdomain. At first, particle motions are ignored for simplicity. Their motion will be taken into account in the next section.

We first have to consider how to store the particle data. The simplest way would be to employ server–client-type management, in which a server processor containing all the data distributes jobs with data to client processors and retrieves the results in each step. Although this is easy to implement, it requires a large data transfer. This is not efficient when the data transfer between processors is expensive, as with GPUs. Moreover, the clients have to wait while the server is preparing the data to be sent. Therefore, we used another strategy to manage the data that is better suited for parallelizing on multiple processors, and in which each processor manages its own data: peer-to-peer-type management.

To calculate the physical values of a particle, the values of neighboring particles are used: positions of neighbors are used to calculate forces in a distinct element method (DEM) simulation [Mishra 03]; physical values of neighbors are integrated in a smoothed particle hydrodynamics (SPH) simulation (see Chapter 6). Neighbors can be in an adjacent subdomain computed by another processor. In this case, the processor has to ask the adjacent processor for the data. Accessing the memory of another processor whenever it is necessary is inefficient, because it lowers the granularity of the memory transfer, making transfers smaller and more frequent. Therefore, we introduce ghost regions to the simulation. Say the entire computation domain is C = { x | s < x ≤ e }, and two processors p_0 and p_1 are used for the simulation. The domain is decomposed at x = m by a plane perpendicular to the x-axis, so the subdomains for p_0 and p_1 are

C_0 = { x | s < x ≤ m },   C_1 = { x | m < x ≤ e },

where m = (s + e)/2 is the midpoint of the computation domain in the x-direction. Then the ghost region for p_0 is the area in C_1 adjacent to C_0,

G_{1→0} = { x | m < x ≤ m + g },

and the ghost region of p_1 is the area in C_0 adjacent to C_1,

G_{0→1} = { x | m − g < x ≤ m },

where g is the size of the ghost region, as illustrated in Figure 7.2.

Figure 7.2. Division of a simulation using two processors.
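As a rough illustration of these definitions (one-dimensional split, two processors, names invented for the example), a particle stored by p_0 can be classified like this:

// Subdomain split at m with ghost size g, as in the text: p0 owns (s, m],
// p1 owns (m, e]. From p0's point of view, a particle is either its own,
// one of its ghosts (a read-only copy of p1's data), or not stored at all.
enum class Ownership { OwnedByP0, GhostFromP1, NotStored };

Ownership classifyForP0(float x, float m, float g)
{
    if (x <= m)     return Ownership::OwnedByP0;    // inside C0
    if (x <= m + g) return Ownership::GhostFromP1;  // inside G(1->0)
    return Ownership::NotStored;
}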

When n processors are used, the simulation domain is divided into n subdomains, and each processor (except for the ones at either end) has two ghost regions, one on each side. Let the effective radius (the particle diameter in the case of DEM) be r_e = g; then the particles that can be neighbors of the particles in C_0 are all found in the area C_0 ∪ G_{1→0}. Thus, a processor does not have to query adjacent processors for particle values during the computation if the particle data in the ghost region are transferred before the time step (to be precise, this is true for explicit computation but not for implicit computation, like the moving particle semi-implicit (MPS) method, which solves Poisson's equation for the pressure on particles [Koshizuka and Oka 96]). We refer to these particles in a ghost region as ghost particles. Processor p_0 updates the particles in C_0 but only reads the values of ghost particles. All the particles are updated, because all particles exist in C_0 ∪ C_1 without any duplications (G_{1→0} ⊂ C_1 and G_{0→1} ⊂ C_0). If particles were static, this would be sufficient—but particles move. In the next section, data management for moving particles is discussed.

7.3 Data Management without Duplication

The motion of particles causes a flow of particles between subdomains; some particles go to, and some particles come from, an adjacent subdomain. The set of ghost particles can change dynamically from one time step to the next, so efficient management of particles is necessary. As discussed above, we have employed peer-to-peer-type management of particle data. Although we chose it, there are still several other choices for how to manage the data. The easiest way is as follows: each processor has the data of all the particles (using the same index for each particle) and updates the data of the particles belonging to its particular subdomain. However, this is not memory efficient, because all the processors have to hold all the particle data. In the following subsections, we describe a method in which a processor only keeps the data of the particles in its own subdomain. Therefore, there is no processor that holds the data of all the particles.

7.3.1 Sending Data

As discussed above, data from a neighboring processor are necessary for the computation of particles at the boundary of a subdomain. Also, particles that move out of a subdomain have to be passed to an adjacent processor. Therefore, the particles that have to be sent to an adjacent processor are the particles that move from their subdomain to the adjacent subdomain and also the ghost particles in the subdomain. Let x_i^t be the x-coordinate of particle i at time t calculated by processor p_0, which calculates subdomain C_0. Particle i is in the subdomain of p_0 if x_i^t ≤ m. The particles that move out from C_0 to C_1 are

EP_{0→1}^{t+Δt} = { i | m < x_i^{t+Δt}, x_i^t ≤ m }.   (7.1)

The ghost particles of p_1 that lie in the subdomain C_0 and come from p_0 are

GP_{0→1}^{t+Δt} = { i | m − g < x_i^{t+Δt} ≤ m, x_i^t ≤ m }.   (7.2)

Figure 7.3. Particles sent from p_0 to p_1.

Note that this does not include the ghost particles of p_1 that come from p_1 itself. From Equations (7.1) and (7.2), the particles that have to be sent to p_1 are

SP_{0→1}^{t+Δt} = EP_{0→1}^{t+Δt} ∪ GP_{0→1}^{t+Δt} = { i | x_i^{t+Δt} > m − g, x_i^t ≤ m },

as shown in Figure 7.3.
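A minimal CPU-side sketch of that selection, using the old and new x-coordinates exactly as in the set definition above; the particle storage here is just an illustrative array of structs.

#include <vector>

struct Particle {
    float xOld;   // x-coordinate at time t
    float xNew;   // x-coordinate at time t + dt
    // ... remaining state (y, z, velocity, ...) omitted
};

// SP(0->1): everything p0 has to hand to p1, i.e., particles that left C0
// plus p1's ghost particles, which is exactly { xNew > m - g, xOld <= m }.
std::vector<int> selectSendSet(const std::vector<Particle>& particles, float m, float g)
{
    std::vector<int> indices;
    for (int i = 0; i < static_cast<int>(particles.size()); ++i)
        if (particles[i].xNew > m - g && particles[i].xOld <= m)
            indices.push_back(i);
    return indices;
}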

To send the data, SP_{0→1}^{t+Δt} has to be selected from all the particles in the memory of a processor. Flagging the particles in the region and using a prefix sum, as is often done in GPU algorithms [Harris et al. 07], to compact them into dense memory adds some computation cost, which may seem negligible, but not for high-frequency applications like our problem. Most of the processors have to select two sets of particles, one on each side, for their two neighbors if more than two processors are used. This means we would have to run these kernels twice.

Instead, in our implementation the grid constructed for the efficient neighbor search is reused to select the particles. The data can be used directly to select the particles, so that we can avoid increasing the cost. The particles that have to be sent from C_0 are the particles in voxels with x_v > m − g. However, the grid constructed in this simulation step cannot be used directly, because particles have changed their positions during the time step. To avoid a full rebuild of the grid, we used a simulation condition to restrict the particles we want to find. We used the distinct element method (DEM) to calculate the force on a particle by placing springs and dampers. DEM is an explicit method but is not unconditionally stable. It has to restrict the size of the time step according to the velocity to maintain stability. Thus, we need vΔt/l_0 < c, where v, Δt, l_0, and c are the particle velocity, the time-step size, the particle diameter, and the Courant number, respectively, and the Courant number has to be less than one. This condition guarantees that the motion of any particle during a step is below its diameter. Since we set the side length of a voxel equal to the particle diameter, particles do not move more than l_0, which is the side length of a voxel. From these conditions, we find that the particles of SP_{0→1}^{t+Δt} at simulation time t are the particles (letting x be the x-coordinate of such a particle) in

S′_{0→1} = { x | m − d − l_0 < x ≤ m },

and especially when d = r_e,

S′_{0→1} = { x | m − 2d < x ≤ m }.

A buffer has to be prepared to store these selected particles. When a uniform grid is used and g = r_e, two voxel widths in the direction of the space split have to be sent. The buffer size can be calculated from the configuration of the grid.

Figure 7.4. Particles to be sent at t + Δt (left) and their configuration at t (right).
Actually, we are not using a uniform grid but rather a sliced grid, which has a much tighter fit to the particle distribution, as will be described in Section 7.4.

7.3.2 Receiving Data

If all the processors were using the same indices for particles, all we would have to do is update the values of these particles after receiving the data from other processors. However, in our approach, each processor manages its own data and does not have a unique index for a particle among all of the particles of the simulation. Thus, the index of a particle at a boundary of a subdomain does not necessarily agree between the two processors sharing the boundary. When one processor receives particles from another, it adds them to its own particle list. We have to be careful about the duplication of particles. If we cannot guarantee that a particle sent from a neighbor does not already exist in the list, all the particles have to be scanned to find the entry—something we do not want to do. However, what we do have to do is delete the particles in the ghost region that were received in the previous time step. For example, say p_0 received a set of particles from p_1 at time t. That set consists of particles with x^{t+Δt} > m and particles with x^{t+Δt} ≤ m. So after deleting the particles with x^{t+Δt} > m, only the particles with x^{t+Δt} ≤ m remain. Note that this is not the same as the set of particles in the ghost region after updating the particle positions. This can be proved by the following two propositions:

1. The set of particles GP_{1→0}^t that is in G_{1→0} at time t is included in the set of particles SP_{1→0}^{t+Δt}, which will be sent from the adjacent subdomain at time t + Δt (see Figure 7.4).

2. The set of particles EP_{0→1}^{t+Δt} that is in C_0 at time t and will be in G_{1→0} at time t + Δt is not included in SP_{1→0}^{t+Δt}.

For the first proposition, because SP_{1→0}^{t+Δt} is created by reading the grid at time t,

SP_{1→0}^{t+Δt} = { i | m < x_i^t ≤ m + d + l_0 },
GP_{1→0}^t = { i | m < x_i^t ≤ m + d, x_i^{t−Δt} > m }.

These equations lead to GP_{1→0}^t ⊂ SP_{1→0}^{t+Δt}, which proves that the ghost particles at time t will be sent from the neighbor again at time t + Δt. Therefore, the ghost particles from time t have to be deleted before the processor receives the particles coming from the adjacent subdomain.

For the second proposition,

EP_{0→1}^{t+Δt} = { i | m − d < x_i^t ≤ m, m < x_i^{t+Δt} },

and

SP_{1→0}^{t+Δt} = { i | m < x_i^t ≤ m + d + l_0 }.

These equations lead to EP_{0→1}^{t+Δt} ∩ SP_{1→0}^{t+Δt} = ∅. Thus, particles that are in a ghost region at time t + Δt should not be deleted. We can also see that the particles coming from a neighbor have no duplicates among the particles already in the subdomain, so the received data can simply be appended at the end of the processor's particle list.

If a grid is used to select the particles to be sent, there will be several voxels that are not filled to the maximum capacity of a voxel. If the sent data just kept being appended, invalid entries would accumulate. To prevent this, the array is compacted by using a prefix sum after receiving the data from the neighbors.
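Putting the receive side together as a CPU-style sketch (delete last step's ghosts, then append what the neighbor sent); in the GPU version described in the text, the compaction would be done with a parallel prefix sum rather than this erase/append.

#include <algorithm>
#include <vector>

struct Particle { float x; /* other state omitted */ };

// Receive step for p0 at the m-boundary: ghost particles from the previous
// step live at x > m and belong to p1, so drop them first, then append the
// freshly received particles. No duplicate check is needed (see the two
// propositions above).
void receiveFromRightNeighbor(std::vector<Particle>& owned,
                              const std::vector<Particle>& received, float m)
{
    owned.erase(std::remove_if(owned.begin(), owned.end(),
                               [m](const Particle& p) { return p.x > m; }),
                owned.end());
    owned.insert(owned.end(), received.begin(), received.end());
}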

7.4 Choosing an Acceleration Structure

So far, we have discussed how to manage the data on multiple processors. As neighboring-particle search is expensive, acceleration data structures have to be introduced. In this section, we first discuss the requirements for a particle-based simulation and then present the sliced grid, which we used for our simulation. It not only has several advantages as an acceleration structure, but it is also well suited for parallelized particle-based simulation using domain decomposition.

7.4.1 Requirements for Particle-Based Simulation

The data structures introduced to make neighboring-particle search efficient can be classified into three categories: uniform grids, hash grids, and hierarchical grids, all illustrated in Figure 7.5. There are two major requirements for a grid used in particle-based simulations. The first is that the construction cost be low enough for the grid to be rebuilt at every time step. The other requirement is that it should be easy to access the memory of the voxel to which a particle belongs, because the data stored in that memory are frequently referred to during a simulation. There is actually another condition that, although not strictly required, is preferable: a small memory footprint. In the following, we discuss these points for the three grids: uniform grid, hash grid, and hierarchical grid.

The uniform grid allocates memory for all the voxels in the computation domain, whether they are occupied by particles or not. This simple nature keeps the construction and access costs low. Although the uniform grid satisfies the two requirements, it needs a large amount of memory to hold the data for all the voxels in the computation domain. There can be a large number of empty voxels, storing no particles, which is nothing but a waste of memory.

The hash grid improves on the uniform grid by not allocating all the voxels. Instead, it maps the voxels to a fixed-size array by using a hash function. It looks like a good candidate, but it suffers from hash collisions, in which several voxels are mapped to the same location, because the hash grid cannot guarantee a perfect hash. When the grid is accessed, the stored values have to be checked to see whether they belong to the same voxel or not, so the access cost is higher than for the uniform grid.

The hierarchical grid improves the memory efficiency a lot. Figure 7.5 (right) shows a quadtree (the three-dimensional counterpart is an octree), which subdivides cells with a valid entry. The top level of the tree is the bounding box of the input data. A node with an entry is divided into four nodes; this is done recursively as long as the subdivision criteria are met. It avoids allocating memory for empty space by using a hierarchical representation. The drawback of the hierarchy is the access cost of a leaf node. Unlike the uniform grid, it cannot calculate the memory address directly from the position of the query. Instead, it has to traverse the tree structure from the root of the tree.

We now want to parallelize particle-based simulations on multiple processors, which adds some requirements. One requirement is that all the computations are parallelized. Especially when using a GPU, the entire algorithm should be performed on the GPU; otherwise, data have to be transferred between the GPU and the CPU. Another consideration is that a uniform computation burden is preferred, to keep the load balance even. To summarize this discussion, a uniform grid is memory inefficient, and a hash grid is not suited for implementation on the GPU because of hash collisions. Construction of the grid and accessing a voxel are computationally expensive in a hierarchical grid.

Figure 7.5. Uniform grid (left), hash grid (middle), and hierarchical grid (right). The uniform grid allocates memory for the entire domain. The hash grid maps a voxel to a memory array using a hash function. The hierarchical grid only subdivides voxels containing particles.

The sliced grid, developed by [Harada et al. 07], is another option. This is a grid whose construction cost is low, whose voxels are easy to access, and which requires a small memory footprint. So we chose the sliced grid as the acceleration structure for our neighboring-particle search. In the following, a short introduction to the sliced grid is presented, followed by an implementation on the GPU using CUDA.

7.4.2 Sliced Grid


When a uniform grid is used, a bounding box is defined to enclose the computational domain, and memory for the voxels inside the bounding box is allocated whether a voxel is occupied by a particle or not, as shown in Figure 7.6 (left). We can see that a large amount of memory is wasted, because memory is allocated for unused voxels. The sliced grid, however, allocates memory as shown in Figure 7.6 (right). The procedure to build the grid starts by scanning the space for the grid cells filled with particles. Of course, it would be possible to identify voxels containing particles by scanning the whole space, but there is a cost to that. The sliced grid increases the memory efficiency by adding only a little computation.

First of all, orthogonal basis vectors (e_x, e_y, e_z) and a uniform grid along these bases in the computational domain are prepared. Note that the grid is not allocated in memory at this time. The first step is counting the number of voxels required to store the data.

An axis is chosen from the bases, and the grid in the domain is divided into slices perpendicular to that axis. Each slice has a one-voxel thickness in the direction of the axis. Thus, the slices have one dimension less than the spatial dimension of the computation domain. A sliced grid allocates memory for a two-dimensional bounding box per slice. By not excluding the empty voxels completely, it keeps the computation cost low. When e_x is chosen as the axis, each slice is spread over the space of the bases e_y and e_z. The following explanation assumes that the coordinate in grid space of a point x = (x, y, z) is b = (b_x, b_y, b_z) = (x · e_x, x · e_y, x · e_z). After dividing the computational space into slices, the (here two-dimensional) bounding box of each slice is calculated by scanning the grid coordinates of all the particles. The maximum and minimum of y and z for slice i are

B^y_{i,max} = max_{j∈P_i} { b^y_j },   B^y_{i,min} = min_{j∈P_i} { b^y_j },
B^z_{i,max} = max_{j∈P_i} { b^z_j },   B^z_{i,min} = min_{j∈P_i} { b^z_j },

where P_i = { j | b^x_j = i }. With these values, the numbers of voxels in the y- and z-directions are computed as

n^y_i = (B^y_{i,max} − B^y_{i,min})/d + 1,   n^z_i = (B^z_{i,max} − B^z_{i,min})/d + 1,

where d is the side length of the voxels. This bounding box, with n_i = n^y_i n^z_i voxels in the slice, is allocated in memory. This is much more efficient than using the uniform grid, although it still contains some empty voxels. The index of a voxel at (x, y, z) in slice i can be calculated by

v_i(x, y, z) = ⌊(y − B^y_{i,min})/d⌋ + ⌊(z − B^z_{i,min})/d⌋ n^y_i.   (7.3)

Figure 7.6. Uniform grid (left) and sliced grid, sliced in the x-direction (right).

Placing the memory for the slices in one contiguous block requires the offsets, or indices of the first voxels, of the slices. Let the index of the first voxel of slice i be p_i. It is calculated as the sum of the numbers of voxels from the first slice S_0 up to slice S_{i−1}; thus, p_i = Σ_{j<i} n_j. Taking the prefix sum of the numbers of voxels in the slices gives us the indices of the first voxels.

We are now ready to store the data in the grid. The index of the voxel to which a point (x, y, z) belongs is calculated in two steps. The first step is the computation of the slice the point is on, which is i = ⌊b_x − B^x_{min}⌋, where B^x_{min} is the minimum coordinate of the slices in the x-direction. Using the index of the slice, the index of the first voxel of the slice, stored in the table calculated in the preprocessing step, is read. From that index and Equation (7.3), the index of the voxel is calculated as

v(x, y, z) = p_i + ⌊(y − B^y_{i,min})/d⌋ + ⌊(z − B^z_{i,min})/d⌋ n^y_i.
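A compact C++ sketch of this lookup path (Equation (7.3) plus the per-slice offset table). The struct layout is an assumption made for the example, and the division by d in the slice index assumes b_x is a world-space projection rather than an integer voxel coordinate; the point is assumed to lie inside the grid.

#include <cmath>
#include <vector>

struct SlicedGrid {
    float d;                        // voxel side length
    float bxMin;                    // minimum grid x-coordinate over all slices
    std::vector<float> byMin;       // per-slice minimum of b_y
    std::vector<float> bzMin;       // per-slice minimum of b_z
    std::vector<int>   ny;          // per-slice voxel count in y
    std::vector<int>   firstVoxel;  // p_i: prefix sum of per-slice voxel counts
};

// Voxel index of a point given in grid coordinates (b_x, b_y, b_z):
// find the slice first, then apply Equation (7.3) inside the slice,
// offset by the slice's first-voxel index p_i.
int voxelIndex(const SlicedGrid& grid, float bx, float by, float bz)
{
    int i  = static_cast<int>(std::floor((bx - grid.bxMin) / grid.d));
    int vy = static_cast<int>(std::floor((by - grid.byMin[i]) / grid.d));
    int vz = static_cast<int>(std::floor((bz - grid.bzMin[i]) / grid.d));
    return grid.firstVoxel[i] + vy + vz * grid.ny[i];
}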

Of course, we could push this slicing concept to another dimension to remove even more empty voxels, i.e., by slicing in the x-direction before slicing in the y-direction. However, this is a tradeoff between memory savings and computation; it adds much more overhead for real-time applications.

Implementation on the GPU. Before storing particle indices in memory, the bounding box and the first voxel index of each slice have to be calculated. Although these computations are trivial on a sequential processor, it requires some effort to perform them on multiple processors. The GPU implementation is explained in the following paragraphs.

Calculating the bounding box and the first voxel index of every slice is performed in several steps. The first step is the computation of the bounding boxes in which memory will be allocated. The grid coordinate calculated from a particle's position is inserted into the bounding box of the slice on which the particle is located. Although the flexibility of current GPUs makes a serial version of this computation possible on the GPU, it cannot exploit the power of the GPU. For efficiency reasons, the computation is divided into two stages. The particles are divided into several sets, and bounding boxes for these sets are computed in parallel. If there are m slices and the particles are divided into n sets, n sets of m bounding boxes are calculated in parallel. (Of course, this is also effective on other multiple processors.) Then the results are merged into a single set of bounding boxes. This reduction step can also be parallelized on the GPU.

Here we assume that the x-axis is taken as the slicing axis. Then, what we have to compute are B^y_{i,max}, B^y_{i,min}, B^z_{i,max}, and B^z_{i,min}, which are the maximum and minimum values in the y- and z-directions on the ith slice along the x-direction. Let n and m be the total number of particles and the number of small computations (we will call them jobs from now on). The ith job is responsible for the particles whose indices satisfy in/m ≤ a < (i + 1)n/m. Then the bounding box of the jth slice in the ith job is

B^y_{ij,max} = max_{a∈P_ij} { b^y_a },   B^y_{ij,min} = min_{a∈P_ij} { b^y_a },
B^z_{ij,max} = max_{a∈P_ij} { b^z_a },   B^z_{ij,min} = min_{a∈P_ij} { b^z_a },

where P_ij = { a | b^x_a = j, in/m ≤ a < (i + 1)n/m }.

One job is processed by a block of threads on CUDA. Since the bounding values are frequently read and updated, they can be stored in fast on-chip memory if it is available; on CUDA, shared memory is used for their storage. However, the updating of the bounding values has to be serialized to avoid write conflicts among threads. Therefore, whenever a thread updates a bounding volume, it has to be locked. This kills the performance when a large number of threads are running at the same time. To increase the efficiency of the computation, one job is split into smaller jobs, and the threads in a block are likewise divided into smaller thread groups. The computation can be much more efficient because these smaller thread groups calculate their own bounding-volume data, synchronizing only a small number of threads. Figure 7.7 illustrates the three-step computation of the bounding volumes.

Figure 7.7. Computation of bounding volumes.

Let us look more closely at the implementation of this computation on a current GPU. Reducing the size of the data is a good idea in most cases, because the latency of memory access is much higher than that of arithmetic instructions. Also, the chip resources that can be used in the computation are limited. To maximize the efficiency of the GPU, the register and shared-memory usage should be kept to a minimum. The size of the shared memory on an NVIDIA G80 is 16 KB per

multiprocessor. If eight bits are used for the bounding values, and assuming there are at most 256 cells in each direction, a set of bounding boxes requires 1 KB of storage. (Of course, we could use 32 bits per bounding value, but that strains the local resources and results in lower usage of the hardware threads.) Therefore, we can calculate at most 16 sets of bounding boxes, by the same number of small thread groups, in a block at the same time. The computation of the bounding volumes is done by reading particle values from main memory with aligned memory accesses and updating the values using synchronization within the thread group. This corresponds to the first step in Figure 7.7. The next step is the reduction of these sets of values. These outputs are still placed in shared memory, and one set of bounding boxes is calculated from them in the same kernel by assigning one thread per bounding box; that thread reads all the bounding values from the smaller groups. In short, if we have 256 slices, 256 threads run at the merge step, and thread i, assigned to the ith slice, compares all the values for that slice from all the small groups. Then the threads write the values to global memory for the last merge step.

The last merge step runs in another kernel. This step is almost the same as the previous merge, except that the values are read from off-chip memory this time. Instead of using one thread to reduce a slice, a tree-shaped reduction is used, in which n/2 threads are assigned to a slice (n is the number of bounding boxes) and reduce two values to one in each step; this has a performance advantage. This is the third step in Figure 7.7. In this way, a set of bounding boxes is calculated from all the particles.

When using CUDA for this computation, the number of threads physically running is not equal to the width of a kernel; in most cases, it is smaller than the kernel width. Although we could make the kernel block size the same as the number of physical threads, increasing the size of the blocks makes the computation much more efficient, because it lets the hardware switch between work groups (like multithreading on the CPU) when a work group is stalled.

Now that we have the bounding boxes for all the slices, the number of voxels in each slice is calculated. This computation is tricky when using shaders, but with the ability to synchronize among the threads in a block, it has become easier. The prefix sum of the array is calculated in parallel to get the indices of the first voxels. For this parallel reduction, the method presented in [Harris et al. 07] is used.

Figure 7.8 shows how much the sliced grid improves the memory efficiency in a test simulation. It compares the memory consumption of the uniform grid, octree, and sliced grid in a DEM simulation. We can see that the sliced grid reduces the memory consumption greatly compared to the uniform grid, and the efficiency is close to that of the octree. Moreover, the cost of accessing the voxel data is at least as cheap as for the uniform grid, and can be much better, as will be shown later.

Figure 7.8. Memory usage in bytes per time step when using the uniform grid, sliced grid, and octree (see Color Plate V).

7.4.3 Introducing Sort
memory efficiency but also improves the performance thanks
to the dense voxel data,which will be shown later. This
section discusses how to improve the performance from the
perspective of cache efficiency. A simple implementation of
a particle-based simulation is accompanied by random access
of the data. The random-access pattern of the memory
reduces the performance. If the particle data are also
arranged in the order of the spatial distribution of
particles, the cache hit-rate of accessing the particle
data increases as well. However, because particles not
having any fixed connectivity move freely, the memory
location of spatially close particles becomes random as the
simulation proceeds. This reduces the memory locality and
results in the slowdown of a simulation. An idea to improve
the simulation performance is to sort the particle data by
the spatial order of particles. We have to be careful in
the selection of the sort algorithm used, especially for
real-time applications, because the speedup from the
ordering of the particle data has to be greater than the
cost of the sorting. Otherwise, it just slows the
simulation down. Researchers have been studying sorting on
the GPU. However, the best algorithm for sequential
processors is not always the best for parallel processors.
For example, quick sort, which is one of the most efficient
sorts on the CPU, does

not perform well on the GPU. Instead, sorting networks,


such as bitonic merge

sort, are preferred because of their parallel nature


[Kipfer and Westermann 05].

However, the drawback of sorting networks is that they


require lots of passes to

complete the sorting. Recently, the functionality of the


GPU has made it possible

to implement radix sort, which requires fewer passes [Grand


07].

Although radix sort runs quickly on the GPU, its cost is prohibitively expensive for the sole purpose of improving cache efficiency. In fact, in our experiment it took more time than the computation of one step of the DEM simulation, so it does not meet our goal. There are several sorting algorithms suited to nearly sorted lists, such as insertion sort. They are good for situations with temporal coherency between frames, like our simulation. But the problem here is that insertion sort is a completely sequential algorithm, which is not good for multiple processors such as GPUs. However, what we want is not the completion of a sort within a frame, because the sort is used only to increase the spatial coherency of the data. Even if a sort in a time step merely improves the order of the list somewhat, it will improve the cache efficiency.

7.4.4 Block Transition Sort

Among the sorting networks, we have chosen an odd-even transition sort, a sorting network that completes a sort by repeating two simple operations: comparing adjacent odd-even index pairs and flipping them if they are in the wrong order, then comparing adjacent even-odd index pairs. If the blocks with an arrow in Figure 7.9 are thought of as two adjacent elements, the figure shows how the sorting works. Odd-even transition sort is good for a nearly sorted list but is pretty poor when applied to a random list. If only two adjacent elements are flipped, it can complete the sort in one or two steps, but if the elements are arranged in the reverse order, it takes n steps to move them to the correct positions.

Figure 7.9. Block transition sort. An array is divided into blocks, and two adjacent blocks are sorted in a pass.
Figure 7.10. Bitonic merge sort. Thread A compares A0 and A1, and so on.

We generalize the idea of the odd-even transition sort to develop our block transition sort, which is suited for architectures like the GPU that have a fast local memory shared by a set of threads. Instead of comparing two adjacent index pairs, it compares two adjacent blocks consisting of several elements; precisely, it sorts two adjacent blocks in a step. Figure 7.9 illustrates how the block transition sort works. Block transition sort is good for a GPU, which has fast local memory for each processor, because partitioning the computation into small problems lets threads sort two adjacent blocks entirely in the fast local memory without writing back to the slower global memory. The memory-access pattern is also preferable, because all the random access can be done in the on-chip memory, so all reads and writes to global memory can be aligned memory accesses. In our implementation, we used bitonic merge sort for sorting two adjacent blocks. As shown in Figure 7.10, bitonic merge sort always compares n/2 sets of entries in a pass, where n is the total number of elements. So n/2 threads are executed, and each of them reads two elements into shared memory; it then repeats comparison and synchronization until the sort is done. It is important to set the size of a block such that two blocks fit in shared memory. If we have more budget for the sorting, the two adjacent sorted chunks of data could be merged by using merge sort to make it much more efficient.
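A minimal serial sketch of the block transition sort is given below; it only illustrates the pass structure described above. On the GPU each pair of adjacent blocks would be sorted in shared memory with a bitonic merge sort, whereas here std::sort stands in for that per-pair sort, and the function and parameter names are illustrative rather than taken from the engine.

// One pass of a block transition sort over an array of keys.
// Even passes sort block pairs (0,1),(2,3),...; odd passes sort (1,2),(3,4),...
#include <algorithm>
#include <vector>

void BlockTransitionSortPass(std::vector<float>& keys, int blockSize, bool oddPass)
{
    const int n = static_cast<int>(keys.size());
    for (int first = (oddPass ? blockSize : 0); first + blockSize < n; first += 2 * blockSize)
    {
        int last = std::min(first + 2 * blockSize, n);
        // Sort the two adjacent blocks [first, last) locally.
        std::sort(keys.begin() + first, keys.begin() + last);
    }
}

Repeating even and odd passes a few times per frame gradually restores spatial ordering; as discussed above, a complete sort per frame is not required.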
7.4.5 Performance

Figure 7.11 shows a simulation that sorts particle values. A box half-filled with particles is rotated. To make the effect of sorting illustrative, particles are colored by their indices. We can see that these colors do not mix up, although the particles themselves are mixed up; this is because of the renumbering of the particles.

Figure 7.11. Simulation result with sorting (see Color Plate VI).

Figure 7.12. Comparison of computation times of simulations using the uniform grid (UG), sliced grid (S), and sliced grid with sorting (SS) (see Color Plate VII).

The simulation times on a GPU are shown in Figure 7.12. The figure shows the total computation time of a time step and the update of the particle values. When sorting is used for a sliced grid, the particles are sorted at the update. Concretely, the particle indices are sorted using the grid coordinates as keys, and the updated velocities and positions are then written to the new memory locations. This timing also includes the time for sorting. We can see that the total time does not spike when sorting is introduced, although the update with sorting does take some time.
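A serial sketch of this reordering step might look as follows. The key array and the idea of writing the updated state to new, contiguous locations follow the description above, while the concrete names and the use of std::sort are illustrative only.

// Sketch: renumber particles by their grid (cell) key so that spatially
// close particles become close in memory.
#include <algorithm>
#include <numeric>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

void ReorderParticlesByCell(std::vector<Particle>& particles,
                            const std::vector<int>& cellKey) // one key per particle
{
    const int n = static_cast<int>(particles.size());
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    // Sort particle indices by their cell key (the grid coordinate).
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return cellKey[a] < cellKey[b]; });
    // Write each particle's state to its new, contiguous location.
    std::vector<Particle> sorted(n);
    for (int i = 0; i < n; ++i)
        sorted[i] = particles[order[i]];
    particles.swap(sorted);
}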
7.5 Data Transfer Using Grids

The sliced grid in which the computation domain
is sliced by the x-axis is used by the acceleration
structure to search for neighboring particles. The data
that have to be sent to an adjacent processor are two
contiguous slices when the side length of voxels equals the
particle diameter. Generally, the data to be sent is
smaller than using a uniform grid although efficiency
depends on the distribution of particles. Sending the data
between GPUs cannot be done directly at the moment.
Therefore, the data have to be sent via main memory in two
steps: first, send the data from a GPU to main memory, then
the second GPU reads the data from main memory. Because the
neighbors of GPUs do not change, the destination of the
memory to which a GPU writes the data and the memory a GPU
reads from is defined at spatial decomposition. Figure 7.13
illustrates how this works when using four GPUs for a
simulation. Each GPU computes a subdomain, and they each
have one or two ghost regions. After the computation of a
time step, all the GPUs send the data to the predefined
location in main memory. GPUs at both ends write particle
data to one buffer and other GPUs write to two buffers. To
make sure that all the data are ready, all of the threads
are synchronized after the send. Then the reading from the
defined memory location finishes the transfer. As you can
see, these threads run completely in parallel except for
one synchronization per time step.

Figure 7.13. Overview of a simulation using four GPUs.
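A rough sketch of the per-GPU exchange described above is shown below (one CPU thread drives one GPU). The buffer layout, sizes, and barrier object are only assumptions for the sketch; in the real engine they are fixed at spatial decomposition. Only the basic CUDA runtime copies are used here.

// Sketch: exchange ghost slices via main memory, with one synchronization
// per time step. All names and the barrier callback are illustrative.
#include <cuda_runtime.h>

struct GhostBuffers
{
    float* stagingToLeft;    // host buffer this GPU writes for its left neighbor (may be null)
    float* stagingToRight;   // host buffer this GPU writes for its right neighbor (may be null)
    float* stagingFromLeft;  // host buffer written by the left neighbor (may be null)
    float* stagingFromRight; // host buffer written by the right neighbor (may be null)
    size_t bytesPerSlice;
};

void ExchangeGhostSlices(float* d_leftGhost, float* d_rightGhost,
                         float* d_leftBoundary, float* d_rightBoundary,
                         GhostBuffers& buf, void (*barrier)())
{
    // 1. Send: copy this GPU's boundary slices to the predefined host buffers.
    if (buf.stagingToLeft)
        cudaMemcpy(buf.stagingToLeft, d_leftBoundary, buf.bytesPerSlice, cudaMemcpyDeviceToHost);
    if (buf.stagingToRight)
        cudaMemcpy(buf.stagingToRight, d_rightBoundary, buf.bytesPerSlice, cudaMemcpyDeviceToHost);

    // 2. The one synchronization per time step: wait until every GPU has written.
    barrier();

    // 3. Receive: read the neighbors' slices into this GPU's ghost regions.
    if (buf.stagingFromLeft)
        cudaMemcpy(d_leftGhost, buf.stagingFromLeft, buf.bytesPerSlice, cudaMemcpyHostToDevice);
    if (buf.stagingFromRight)
        cudaMemcpy(d_rightGhost, buf.stagingFromRight, buf.bytesPerSlice, cudaMemcpyHostToDevice);
}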

7.6 Results

Our method was implemented using C++ and NVIDIA CUDA 1.1 on a PC equipped with an Intel Core2 Q6600 CPU, a GeForce 8800 GT GPU, and Tesla S870 GPUs. The program executes five CPU threads: it uses one GPU for rendering and the other four GPUs for the computation of the particle-based simulation. A CPU thread managing a GPU executes the kernels for that GPU.

Figure 7.14 compares the computation times of the simulation shown in Figure 7.15 as the number of GPUs is changed. A simulation using one million particles takes about 95 ms per simulation step on one GPU, while the same simulation takes about 40 ms and 25 ms on two GPUs and four GPUs, respectively. Although these timings include the management of particles and the data transfer time between GPUs, they scale nearly linearly with the number of processors. The efficiency of the parallelization decreases for the simulation on four GPUs compared to the simulation on two GPUs. This is because only one other GPU has to be communicated with when using two GPUs, whereas communication with two adjacent GPUs is necessary when using four GPUs. The timings excluding the time for data transfer are also shown in Figure 7.14. These times exclude only the actual data transfer between processors but include the time to manage the data. From this figure, we can see that the overhead of data management is small enough that the performance scales well with the number of GPUs.

Figure 7.14. Computation times (ms) versus number of particles on one, two, and four GPUs, with and without the transfer time.

[Kipfer and Westermann 05] Peter Kipfer and Rüdiger Westermann. “Improved GPU Sorting.” In GPU Gems 2, edited by Matt Pharr, pp. 733–746. Reading, MA: Addison-Wesley, 2005.

[Koshizuka and Oka 96] S. Koshizuka and Y. Oka. “Moving-Particle Semi-Implicit Method for Fragmentation of Incompressible Flow.” Nucl. Sci. Eng. 123 (1996), 421–434.

[Mishra 03] B. K. Mishra. “A Review of Computer Simulation of Tumbling Mills by the Discrete Element Method: Part I—Contact Mechanics.” International Journal of Mineral Processing 71:1 (2003), 73–93.
Part IV: Constraint Solving

transposed. The matrix vector multiplication is Ax.


Quaternions q ∈ H behave

as vectors with respect to addition and scalar


multiplication. What makes them a

useful algebra is the product operation, which we write as


qp. We will use the

right-hand convention for this, as is defined in Chapter 1.


A three-dimensional

vector x can be promoted to a quaternion p = q(x) by


writing p s = 0 and

p v = x. That is a purely imaginary quaternion since p † =


−p. If I forget to

tell you in a particular set of equations, I always use u,


v, and n to denote right

the exact correspondence.

Credit where credit is due. My initial inspiration came


after reading [Ser

ban and Haug 98] and [Haug 89]. I then found results
similar to what is below

in [Tasora and Righettini 99]. The matrix formulation of


quaternion algebra is

already in the graphics literature [Shoemake 91, Shoemake


10] but is not widely

used. There is a whole chapter about details of this matrix


representation in my

PhD thesis [Lacoursie`re 07a] for those who may be


interested.

And now, let’s begin.

9.3 The Problem

Rotational constraints between rigid bodies are problematic when they are defined using dot product indicators. This makes them bistable since obviously x · y = 0 implies that −x · y = 0 as well. Take, for instance, the rotational part of a hinge joint between bodies 1 and 2 that have right-handed orthonormal frames defined with u(1), v(1), n(1) and u(2), v(2), n(2), respectively. Taking n(1) as the normal axis of the hinge attached on body 1, the hinge indicator is defined as the set of the two conditions

\[ n^{(1)} \cdot u^{(2)} = 0 \quad\text{and}\quad n^{(1)} \cdot v^{(2)} = 0. \tag{9.1} \]

Figure 9.1. The hinge definition.

When these are satisfied, the vector n(1) has no projection in the u(2)–v(2) plane, as shown in Figure 9.1. The content of the constraint is that n(1) and n(2) are both normal to the u(2)–v(2) plane, which means they are parallel, and thus, by transitivity, n(2) is a normal to the u(1)–v(1) plane as well. But the indicator function in Equation (9.1) is satisfied simultaneously for both n(1) = n(2) and n(1) = −n(2), i.e., the antiparallel case. But we usually want the first of the two options. This is shown below in Figure 9.2. It is possible to flip between one and the other by wrenching the two bodies hard enough, irrespective of our numerical method of choice. In addition, the constraint weakens as it gets further and further away from the desired configuration. It is, in fact, metastable when vector n(1) lies in the u(2)–v(2) plane since the Jacobian vanishes there, and so it might stabilize either the right way or the wrong way. That makes these constraints easy to flip since the constraint force starts to weaken at π/4, and it starts to point the wrong way after π/2. We could avoid such headaches using reduced coordinate formulations, as is common in robotics, but that will cause other types of pain.

As an aside, we might think that the single-equation indicator n(1) · n(2) − 1 = cos θ − 1 = 0 is equally good as the two equations in Equation (9.1). The problem is that this single equation is, in fact, quadratic, i.e., it behaves as θ² near θ = 0, which means that the Jacobian vanishes. The remedy to that is to construct indicators with a unique zero, and this can be done using quaternions. These indicators have extreme values ±1 precisely when one of the normal vectors used in the dot product definition is flipped by 180°. One problem remains though: the Jacobians still vanish at the maximum constraint violation, and that means they weaken on the way there. It is possible to add nonlinear terms to the indicator functions to fix this problem, but that's beyond our scope here, and I think we can manage better with good logic code to catch the problem cases.

Figure 9.2. Axis flip.

The theory below is an overkill, but the results are easy


to implement and not

much more expensive computationally than the standard dot


product versions.

Three constraints are analyzed in detail, namely, the lock


joint, the hinge joint,

and the homokinetic joint. This last one is also known as


the constant velocity

joint, CV for short. It is much like the Hooke or universal


joint but without the

problems. The Hooke joint is easy to define as a bistable


constraint in dot prod

uct form. It seems that it is not possible to define a


monostable version without

introducing a third body that is hinged to the other two.


If we look at a good

diagram and animations of the Hooke joint [Wikipedia 10b],


we will see clearly

why a third body is needed. But more to the point, the CV


joint is the one we see

in our front traction cars, since otherwise, the wheels


would not move at constant

rotational velocity. Curiously, though it is an engineering


puzzle to construct a

CV joint [Wikipedia 10a] that is not fragile, it is dead


easy to define the geome

try using quaternions. A homokinetic joint can be


constructed using two hinges,

and this makes the analysis much more complicated [Masarati


and Morandini 08]

than the quaternion definition given below.

These three rotational joints are used in combination with


positional con

straints to produce all other joints, namely, the “real”


hinge, the prismatic or

sliding joint that requires the full lock constraint, the


cylindrical joint that requires

the hinge constraint, etc. A robust Hooke joint can also be


built out of three bodies

using two hinges.

In what follows, I will first explain the indicators


themselves by looking at

special quaternions and the geometry of the resulting


kinematics. Then, I will

explain how to construct the Jacobians for these.

9.4 Constraint Definitions

It is enough to consider just one quaternion q describing the orientation of one rigid body with respect to the inertial frame to start with. This is because, in the end, the quaternion used in the constraint will be the relative rotation going from body 1 to body 2. That will simplify things and save our time. Also note that in this first stage, I assume that both our hinge and CV axes are aligned along z in each body. Generalizations are provided below.

The quaternion that corresponds to no rotation at all is just the unit quaternion, i.e., q_s = 1, q_v = [0, 0, 0]^T. The indicator is easy to define here:

\[ c_{\mathrm{lock}} = q_v = Pq = P_{\mathrm{lock}}\,q = 0, \tag{9.2} \]

where P = P_lock is the projection operator

\[ P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \]

so that Pq = q_v. There is still an ambiguity since the constraint is satisfied by both ±q. But that is of no consequence since both cases correspond to a unit rotation. Remember that quaternions cover the rotation group twice. The lock constraint is thus a simple linear projection of the relative quaternion. That will hold for all the other constraints.

The hinge constraint requires that the original and transformed frame share a common axis. This is set to the axis z arbitrarily, and thus the allowed rotations have the form

\[ q_s = \cos(\phi/2) \quad\text{and}\quad q_v = [0, 0, \sin(\phi/2)]^T, \tag{9.3} \]

which gives the two equations we want:

\[ c_{\mathrm{hinge}} = \begin{bmatrix} x \cdot q_v \\ y \cdot q_v \end{bmatrix} = \begin{bmatrix} x^T \\ y^T \end{bmatrix} Pq = P_{\mathrm{hinge}}\,q = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \tag{9.4} \]

where

\[ P_{\mathrm{hinge}} = \begin{bmatrix} x^T \\ y^T \end{bmatrix} P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \]

is the hinge projection operator. We'll see in Section 9.8 how to define this for axes other than z.

And now comes the CV joint. The kinematic constraint we want to create here is such that the rotational motion along the axis n(1) of an object produces an identical rotation about the axis n(2) of another. That is precisely the relationship between the plate of a turntable and the disc sitting on it, although these two objects share the same longitudinal plane. But the idea is the same: we want a driver that produces a constant rotational velocity in a secondary body about some axis fixed in that body.

Figure 9.3. An illustration of the CV coupling.

Let's now visualize a perfect CV joint using two pens with longitudinal axes n(1) and n(2), respectively, each with a longitudinal reference line drawn on the circumference. Hold the pens 1 and 2 in your left and right


hands, respectively,

and align the axes and the reference lines so that they
face up. Now, rotate pen 2

by some angle θ about the vertical axis z away from you.


Choosing θ ≈ 45 ◦ will

make things obvious. The two pens lie in the horizontal


plane, with an angle θ

between n (1) and n (2) . Now, realign the two pens and
rotate them about their

common longitudinal axes by 90 ◦ . Keep the reference lines


aligned but make

them face you. Then rotate pen 2 by the same angle θ as


before about the axis

z. Clearly, the axis of rotation r is still perpendicular


to n (1) but is not the same

as before. If you had done this in small increments, you


would have seen the CV

joint at work. You would probably scratch your head


wondering how you would

actually construct something that worked like that. You can


even change the angle

θ as you move along, keeping perfect alignment between the


reference lines. One

thing is constant though: relative rotation between pens 1


and 2, as seen from

pen 1, is about an axis r that is perpendicular to n (1) .


This axis r is not fixed,

however. This is what I’ve sketched in Figure 9.3.

Let's get rid of all the indices now. The conclusion from the experiment above is that a rotation by any angle θ about any axis r such that r · z = 0 always will not rotate the transformed x′–y′ plane about the transformed axis z′. Mathematically, this implies that the relative quaternion q satisfies

\[ q_s = \cos(\theta/2) \quad\text{and}\quad q_v = \sin(\theta/2)\,r, \quad\text{where } r \cdot z = 0. \tag{9.5} \]

Therefore,

\[ c_{\mathrm{CV}} = z \cdot q_v = z^T P q = P_{\mathrm{CV}}\,q = 0, \tag{9.6} \]

where P_CV = z^T P = [0, 0, 0, 1]. Now that we have constraint definitions, we need Jacobians. But to get that right, I need to tell you a bit more about how I manipulate quaternion expressions.
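As a concrete illustration of Equations (9.2), (9.4), and (9.6), here is a small sketch in code. It assumes a quaternion stored as (s, x, y, z) and simply picks out the components selected by the projection operators; the struct and function names are mine, not part of any engine.

// Quaternion stored as q = (s, x, y, z); q_v = (x, y, z). Names illustrative.
struct Quat { double s, x, y, z; };

// Lock joint: c_lock = P_lock q = q_v, three equations (Equation (9.2)).
void LockIndicator(const Quat& q, double c[3])
{
    c[0] = q.x;  c[1] = q.y;  c[2] = q.z;
}

// Hinge joint about the z axis: c_hinge = P_hinge q, two equations (Equation (9.4)).
void HingeIndicator(const Quat& q, double c[2])
{
    c[0] = q.x;  c[1] = q.y;
}

// CV joint about the z axis: c_CV = P_CV q, one equation (Equation (9.6)).
double CVIndicator(const Quat& q)
{
    return q.z;
}

Here q is the relative quaternion from body 1 to body 2; for axes other than z, see the generalization in Section 9.8.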
9.5 Matrix-Based Quaternion Algebra

The quaternion algebra is covered in Chapter 1, so this section is just a simple translation into language I find useful. The format I use here should help you implement what is described in this chapter.

First, note that any quaternion q, p ∈ H can be represented as a simple four-dimensional vector. That works for addition and subtraction, obviously. The only thing needed to make the correspondence complete is to define the quaternion product in terms of matrix-vector operations, as I do now. Since the quaternion product of q, p ∈ H is linear in both q, p ∈ R^4, we can write it as the matrix-vector product qp = Q(q)p = P(p)q, corresponding to the right and left products, respectively, with the definitions

\[
Q(q) = \begin{bmatrix} q_s & -q_1 & -q_2 & -q_3 \\ q_1 & q_s & -q_3 & q_2 \\ q_2 & q_3 & q_s & -q_1 \\ q_3 & -q_2 & q_1 & q_s \end{bmatrix}
= \begin{bmatrix} q_s & -q_v^T \\ q_v & q_s I_3 + [q_v]_\times \end{bmatrix}
= \begin{bmatrix} q & G^T(q) \end{bmatrix},
\]
\[
P(q) = \begin{bmatrix} q_s & -q_1 & -q_2 & -q_3 \\ q_1 & q_s & q_3 & -q_2 \\ q_2 & -q_3 & q_s & q_1 \\ q_3 & q_2 & -q_1 & q_s \end{bmatrix}
= \begin{bmatrix} q_s & -q_v^T \\ q_v & q_s I_3 - [q_v]_\times \end{bmatrix}
= \begin{bmatrix} q & E^T(q) \end{bmatrix},
\]
where
\[
G(q) = \begin{bmatrix} -q_v & q_s I_3 - [q_v]_\times \end{bmatrix}, \quad
E(q) = \begin{bmatrix} -q_v & q_s I_3 + [q_v]_\times \end{bmatrix}, \quad
[x]_\times = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}.
\]
For the last definition, this means that [x]_× y = x × y. For completeness, the complex conjugation matrix is
\[
C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & -I \end{bmatrix},
\]
so q† = Cq.
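For readers who want this in code, the following sketch (my own naming, quaternion stored as (s, x, y, z)) builds the 4 × 4 matrices Q(q) and P(q) as defined above; multiplying Q(q) by p then reproduces the quaternion product qp, and P(p)q gives the same result.

// Build Q(q) and P(q) as row-major 4x4 matrices. Names illustrative.
struct Quat { double s, x, y, z; };

void BuildQ(const Quat& q, double m[4][4])
{
    double Q[4][4] = {
        { q.s, -q.x, -q.y, -q.z },
        { q.x,  q.s, -q.z,  q.y },
        { q.y,  q.z,  q.s, -q.x },
        { q.z, -q.y,  q.x,  q.s } };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            m[i][j] = Q[i][j];
}

void BuildP(const Quat& q, double m[4][4])
{
    double P[4][4] = {
        { q.s, -q.x, -q.y, -q.z },
        { q.x,  q.s,  q.z, -q.y },
        { q.y, -q.z,  q.s,  q.x },
        { q.z,  q.y, -q.x,  q.s } };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            m[i][j] = P[i][j];
}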

The correspondence to the quaternion algebra is then

\[ Q(p)Q(q) = Q(pq) \quad\text{and}\quad P(p)P(q) = P(qp), \]

as well as Q(q†) = Q(q)^T and P(q†) = P(q)^T. The two matrices Q(q) and P(q) also commute so that Q(p)P(q) = P(q)Q(p), as we can easily verify. This representation makes it easy to compute the Jacobian matrices related to quaternion constraints.

We need an expression for q̇q† for unit quaternions ⟨⟨q⟩⟩ = 1, which are the ones corresponding to orthonormal transforms, i.e., rotations. That will connect the changes in the relative quaternions to the angular velocities of the connected rigid bodies. Since q†q = 1 always, we have

\[ \frac{d}{dt}\bigl(q^\dagger q\bigr) = 0 = \dot{q}q^\dagger + q\dot{q}^\dagger = \dot{q}q^\dagger + \bigl(\dot{q}q^\dagger\bigr)^\dagger, \]

and therefore

\[ w = \tfrac{1}{2}\dot{q}q^\dagger = -w^\dagger \]

is purely imaginary, and so w = [0; ω] = P^T ω, where ω ∈ R³ is the angular velocity expressed in the inertial frame. So now, we have

\[ q^\dagger\dot{q} = Q(q)^T\dot{q} = \tfrac{1}{2}P^T\omega = \tfrac{1}{2}G^T(q)\,\omega \quad\text{and}\quad \dot{q}^\dagger q = P(q)\dot{q}^\dagger = -\tfrac{1}{2}P^T\omega = -\tfrac{1}{2}E^T(q)\,\omega. \tag{9.7} \]

These identities are usually summarized as

\[ \dot{q} = \tfrac{1}{2}qw. \tag{9.8} \]

Note that the definition in Equation (9.8) is sometimes written the other way round, as is the case when defining the angular velocity vector in the body frame or when using left-handed multiplication, which is often used in three-dimensional graphics. Beware.


9.6 A New Take on Quaternion-Based Constraints

The Jacobians of any quaternion-based constraint can be computed using just one master Jacobian matrix and various projections. This is done using the matrix representation described in Section 9.5. Consider two rigid bodies with quaternions r, s ∈ H and angular velocities ω(1) and ω(2), respectively. The relative quaternion is then q = r†s ∈ H. The first task is to relate the rate of change of q to the angular velocities ω(1) and ω(2). The time derivative of q is

\[ \dot{q} = \dot{r}^\dagger s + r^\dagger\dot{s} = (\dot{r}^\dagger r)\,r^\dagger s + r^\dagger s\,(s^\dagger\dot{s}) = (\dot{r}^\dagger r)q + q(s^\dagger\dot{s}). \tag{9.9} \]

So now, using the matrix representation of the quaternion product, taking the left product using P(q) on the first term and the right product using Q(q) on the second, and substituting the identities in Equation (9.7), we have

\[ \dot{q} = -\tfrac{1}{2}P(q)P^T\omega^{(1)} + \tfrac{1}{2}Q(q)P^T\omega^{(2)} = -\tfrac{1}{2}E(q)^T\omega^{(1)} + \tfrac{1}{2}G(q)^T\omega^{(2)}. \tag{9.10} \]

The only Jacobians you need for all three quaternion constraints defined here are these. It might seem that we took a very long detour to arrive at Equation (9.10), which is very simple since we just need the matrices E(q) and G(q) in the end. Looking at the indicators defined above in Equations (9.2), (9.4), and (9.6), the different Jacobians are simply different projections of the same proto-Jacobian, namely,

\[
\begin{aligned}
G^{(1)}_{\mathrm{lock}} &= -\tfrac{1}{2}P E(q)^T, & G^{(2)}_{\mathrm{lock}} &= \tfrac{1}{2}P G(q)^T, \\
G^{(1)}_{\mathrm{hinge}} &= -\tfrac{1}{2}P_{\mathrm{hinge}} E(q)^T, & G^{(2)}_{\mathrm{hinge}} &= \tfrac{1}{2}P_{\mathrm{hinge}} G(q)^T, \\
G^{(1)}_{\mathrm{CV}} &= -\tfrac{1}{2}P_{\mathrm{CV}} E(q)^T, & G^{(2)}_{\mathrm{CV}} &= \tfrac{1}{2}P_{\mathrm{CV}} G(q)^T.
\end{aligned} \tag{9.11}
\]
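A short sketch of Equation (9.10) in code is given below; it expands E(q)^T ω and G(q)^T ω directly from the definitions in Section 9.5 (quaternion stored as (s, x, y, z)). The function and type names are mine, and the simple component projections at the end apply to the axis-aligned case of Section 9.4.

// qdot = -1/2 E(q)^T w1 + 1/2 G(q)^T w2  (Equation (9.10)).
struct Quat { double s, x, y, z; };
struct Vec3 { double x, y, z; };

void RelativeQuaternionRate(const Quat& q, const Vec3& w1, const Vec3& w2, double qdot[4])
{
    // From the definitions: E(q)^T w = ( -q_v.w,  q_s w - q_v x w ),
    //                        G(q)^T w = ( -q_v.w,  q_s w + q_v x w ).
    double dot1 = q.x*w1.x + q.y*w1.y + q.z*w1.z;
    double dot2 = q.x*w2.x + q.y*w2.y + q.z*w2.z;
    Vec3 c1 = { q.y*w1.z - q.z*w1.y, q.z*w1.x - q.x*w1.z, q.x*w1.y - q.y*w1.x }; // q_v x w1
    Vec3 c2 = { q.y*w2.z - q.z*w2.y, q.z*w2.x - q.x*w2.z, q.x*w2.y - q.y*w2.x }; // q_v x w2

    qdot[0] = -0.5 * (-dot1)           + 0.5 * (-dot2);
    qdot[1] = -0.5 * (q.s*w1.x - c1.x) + 0.5 * (q.s*w2.x + c2.x);
    qdot[2] = -0.5 * (q.s*w1.y - c1.y) + 0.5 * (q.s*w2.y + c2.y);
    qdot[3] = -0.5 * (q.s*w1.z - c1.z) + 0.5 * (q.s*w2.z + c2.z);
}

// Constraint velocities are then projections of qdot (Equation (9.11)):
// lock: (qdot[1], qdot[2], qdot[3]); hinge: (qdot[1], qdot[2]); CV: qdot[3].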
9.7 Why It Works

The dot product representation of the indicators for rotational constraints is as follows:

\[
c_{\mathrm{dlock}} = \begin{bmatrix} n^{(1)} \cdot u^{(2)} \\ n^{(1)} \cdot v^{(2)} \\ u^{(1)} \cdot n^{(2)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad
c_{\mathrm{dhinge}} = \begin{bmatrix} u^{(1)} \cdot n^{(2)} \\ v^{(1)} \cdot n^{(2)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
c_{\mathrm{dhooke}} = u^{(1)} \cdot v^{(2)} = 0.
\]

We use the Hooke joint here for rough comparison since it is not practical to define the CV joint with dot products. Now, choose body 2 to be the universe and rotate body 1 about u(2) by π so both the new v(2) and n(2) axes have reversed signs. Clearly, all three constraints are now violated geometrically, despite the fact that the indicator functions are still 0.

This is not the case with the quaternion-based constraints defined in Equations (9.2) and (9.4) since, for a rotation that flips the axis z by 180° (q = [0, 1, 0, 0]^T, say), the indicators are then c_lock = [1, 0, 0]^T and c_hinge = [1, 0]^T, respectively. For the CV joint, the rotation that flips the axis x corresponds to q = [cos(π/2), 0, 0, sin(π/2)]^T = [0, 0, 0, 1]^T, giving c_CV = 1. These are all maximum violations, given that all constraints correspond to components of unit quaternions. Thus, the Jacobians at these points are

\[
G^{(2)}_{\mathrm{dlock}} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
G^{(2)}_{\mathrm{dhinge}} = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}, \quad
G^{(2)}_{\mathrm{dhk}} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix},
\]

respectively, and so the restoration force vanishes at maximum violation. Since the Jacobians have full row rank when the constraints are satisfied, some of the rows must decrease gradually on the path to maximal constraint violation, and so the constraint weakens. This problem can be addressed by adding nonlinear terms in the constraint definitions. That's beyond the present scope, however.

9.8 More General Frames

Of course, we may not always have hinge joints that align


the axis z of body 1

with the axis z of body 2. Changing that is quite easy to


do in the dot product

version, but there are a few additional tricks for the


quaternion counterpart, as I

now show.

Assume now that the body-fixed reference frames in which the joints are defined have quaternions e, f ∈ H, respectively. Figure 9.4 demonstrates the situation for body 1 and transform e.

Figure 9.4. Attachment frames.

The quaternions that map vectors defined in these frames to the global frame are then re and sf, respectively. This changes the definition of the relative quaternion in Equation (9.9) to p = e†r†sf. Following the steps in Equations (9.9) and (9.10), we get

\[ \dot{p} = e^\dagger\dot{q}f = P(f)Q(e)^T\dot{q}. \]

Everything else follows.

To define a hinge joint, for instance, we can either specify the quaternion transforms e and f directly or provide a hinge frame containing at least the axis of rotation in world coordinates. If we have a full frame of reference for the hinge definition, it is possible to define the reference joint angle also. Otherwise, the orthogonal complement of the axis must be computed and the quaternions e, f extracted from the frame. Once we have a full frame defining the hinge geometry in world coordinates with three orthogonal axes, u, v, n, forming an orthonormal basis in which n is the axis of rotation, we build the matrix R = [u v n] and extract the quaternion t from it using well-known techniques [Shoemake 10]. Once you have that, you compute

\[ e = r^\dagger t \quad\text{and}\quad f = s^\dagger t, \tag{9.12} \]

where r and s are the orientation quaternions of bodies 1 and 2, respectively.

For the CV joint, the axis of rotation may be different in each body. For that case, we need two axes or two frames, as before. A full frame helps to define the zero reference, as for the hinge case. The computations are the same as in Equation (9.12).

Putting everything together, we can now define the general constraints and constraint Jacobians in a unified way using three different projection operators P acting on the relative quaternion q. The meta definition is this:

\[ c(x) = Pq, \quad G^{(1)} = -\tfrac{1}{2}P E^T(q), \quad G^{(2)} = \tfrac{1}{2}P G^T(q). \]

In turn, the different constraints have the following projection operators:

\[ P_{\mathrm{lock}} = P, \quad P_{\mathrm{hinge}} = \begin{bmatrix} x^T \\ y^T \end{bmatrix} P\,P(f)Q(e), \quad P_{\mathrm{CV}} = z^T P\,P(f)Q(e). \tag{9.13} \]

These projection matrices need to be computed only once, unless we have limits and drivers, as I explain in the next section.

9.9 Limits and Drivers

The hinge joint leaves one degree of freedom. Good or bad, even this freedom is sometimes taken away with joint limits, locks, or drivers. Going back to the definitions in Equations (9.4) and (9.3), we can compute the angle from

\[ \theta = 2\,\mathrm{atan}(q_3/q_s). \]

This is now a scalar function of the vector argument q, θ = 2f(g(q)), and we can follow the chain rule to get θ̇ = f′∇g q̇ and then expand q̇. First, observe that f′ = q_s²/(q_s² + q_3²) ≈ q_s² near constraint satisfaction, so that one is easy. For the rest, we have

\[ \nabla(q_3/q_s) = \frac{1}{q_s^2}\begin{bmatrix} -q_3 & 0 & 0 & q_s \end{bmatrix}. \]

When all is said and done, we have to add an additional row to the projection operator in Equation (9.13):

\[ P_{\mathrm{hingec}} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -q_3 & 0 & 0 & q_s \end{bmatrix} P(f)Q(e). \]

The subscript "hingec" now stands for controlled hinge.

The case for the CV joint is similar. Start from the definition of the polar angle θ = 2 atan(⟨⟨q_v⟩⟩/q_s) using Equation (9.5). The chain rule essentially provides the same results as before, namely,

\[ p^T = \nabla\bigl(\langle\langle q_v\rangle\rangle/q_s\bigr) = \begin{bmatrix} -\langle\langle q_v\rangle\rangle & \dfrac{q_s}{\langle\langle q_v\rangle\rangle}\,q_v^T \end{bmatrix}, \]

and so, as in the case of the hinge constraint, the control part augments the projection defined in Equation (9.13) to

\[ P_{\mathrm{CVc}} = \begin{bmatrix} P_{\mathrm{CV}} \\ p^T \end{bmatrix}, \quad\text{where } P_{\mathrm{CV}} = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}, \tag{9.14} \]

as before in Equation (9.11). And now we are all set to control anything we like, or almost anything.

9.10 Examples

What follows are
simple illustrations of the constraints in action. One single rigid body is attached to the inertial frame following the logic explained in the main text, i.e., only the relative quaternion is of relevance.

Figure 9.5. The hinge joint defined using either quaternions (top) or dot constraints (bottom).

Figure 9.6. The CV joint is used here for the quaternion formulation (top), and the Hooke joint is used for the dot product one (bottom).

Figure 9.7. A lock joint simulated using either the quaternion (top) or the dot product (bottom) formulation. Starting at nearly 90° from the vertical, both constraints relax to the correct position, in which q_s = 1. When the initial angle is slightly over 90°, the quaternion formulation finds its way back to the correct configuration, but the dot product version goes the wrong way, stabilizing at the wrong zero of the indicator.

Figure 9.8. Constraint violation and phase difference between input driver and driven body. This is done for a moderate joint angle of 5°. Both constraint definitions introduce only a small phase difference.

Figure 9.9. Constraint violation and phase difference between input driver and driven body. Here, the angle is more pronounced at 20°. The result is that the CV joint does still follow the driver with a small phase difference. The Hooke joint deviates significantly from the input driver.

Figure 9.10. These two graphs illustrate more precisely the ratio of the output angular velocity to the driver for the quaternion (top) and the dot product (bottom) formulations.

[Lacoursière 07a] Claude Lacoursière. “Ghosts and Machines: Regularized Variational Methods for Interactive Simulations of Multibodies with Dry Frictional Contacts.” PhD thesis, Department of Computing Science, Umeå University, 2007.

[Lacoursière 07b] Claude Lacoursière. “Regularized, Stabilized, Variational Methods for Multibodies.” In The 48th Scandinavian Conference on Simulation and Modeling (SIMS 2007), 30–31 October, 2007, Göteborg (Särö), Sweden, edited by Peter Bunus, Dag Fritzson, and Claus Führer, pp. 40–48. Linköping: Linköping University Electronic Press, 2007.

[Masarati and Morandini 08] Pierangelo Masarati and Marco Morandini. “An Ideal Homokinetic Joint Formulation for General-Purpose Multibody Real-Time Simulation.” Multibody System Dynamics 20 (2008), 251–270.

[Serban and Haug 98] R. Serban and E. J. Haug. “Kinematic and Kinetic Derivatives in Multibody System Analysis.” Mechanics of Structures and Machines 26:2 (1998), 145–173.

[Shoemake 91] Ken Shoemake. “Quaternions and 4 × 4 Matrices.” In Graphics Gems 2, edited by Jim Arvo, pp. 352–354. San Francisco: Morgan Kaufmann, 1991.

[Shoemake 10] Ken Shoemake. “Quaternions.” Unknown. Available at ftp://ftp.cis.upenn.edu/pub/graphics/shoemake/quatut.ps.Z, accessed June 12, 2010.

[Tasora and Righettini 99] Alessandro Tasora and Paolo Righettini. “Application of the Quaternion Algebra to the Efficient Computation of Jacobians for Holonomic-Rheonomic Constraints.” In Proc. of the EUROMECH Colloquium: Advances in Computational Multibody Dynamics, edited by Jorge A. C. Ambrósio and Werner O. Schielen, IDMEC/IST Euromech Colloquium 404, pp. 75–92. Lisbon: European Mechanics Society, 1999.

[Wikipedia 10a] Wikipedia. “Constant-Velocity Joint.” 2010. Available at http://en.wikipedia.org/w/index.php?title=Constant-velocity_joint&oldid=351128343, accessed April 27, 2010.

[Wikipedia 10b] Wikipedia. “Universal Joint.” 2010. Available at http://en.wikipedia.org/w/index.php?title=Universal_joint&oldid=356595941, accessed April 27, 2010.
Part V: Soft Body

11 Particle-Based Simulation Using Verlet Integration

Thomas Jakobsen

11.1 Introduction

This pearl
explains a technique that I developed in 1999 for the
Hitman game series to simulate falling (and usually very
dead) people, a method of animation now colloquially known
as the ragdoll effect. The algorithm is also useful for
simulating cloth, hair, rigid objects, soft bodies, etc. At
the heart of the algorithm lies the so-called Verlet 1
technique for numerical integration coupled with a
particle-based body representation and the use of
relaxation to solve systems of equations. Together with a
nice square root optimization, the combined method has
several advantages, most notably speed of execution,
stability, and simplicity. While today’s much-faster
hardware platforms allow for more advanced and more
realistic approaches to physics simulation, there are still
situations where a particle-based Verlet approach, like the
one presented here, is preferable, either due to speed of
execution or because of its simplicity. Verlet-based
schemes are especially useful for real-time cloth
simulation, for use on low-spec hardware, for
two-dimensional games, and as an easy introduction to the
world of physically based animation. The mathematics behind
the technique is fairly easy to understand, and once you
reach the limits of the technique, the underlying ideas of
semi-implicit integration and relaxation carry over to more
advanced state representations, constraints, and
interactions. As such, Verlet integration is not only a
good starting point for the beginner, but it also forms the
basis for physics simulation in many existing commercial
games, and it is a good stepping-stone to more advanced
approaches.

1 French, pronounced with a silent t: [veʁˈle].

11.1.1 Background

Hitman: Codename 47 was one of the very first games to


feature articulate rag

dolls, and as such, the physics simulation ran on much


slower hardware than

what is common today. I was assigned the task of developing


the physics sys

tem for Hitman, and I threw myself at the various methods


for physically based

animation that were popular at that time. Most of these,


however, either suffered

from elastical-looking behavior originating from the use of


penalty-based

schemes or they had very bad real-time performance for


various different

reasons.
At some point I remembered the old “demo scene” effect for
simulating rip

ples in water that always had me fascinated. It had all the


nice features I was

looking for, including stability and speed of execution.


Except it simulated water,

neither cloth nor hard nor soft bodies. It relied on a


velocity-less representation

of the system state by using the previous position of the


water surface to update

the current one. What I came up with for Hitman was a


technique that also fea

tures a velocity-less representation of the system state,


yielding a high amount

of stability. As it turned out, almost the same technique


had been used for years

to simulate molecular dynamics (under names such as SHAKE


and RATTLE,

see [Forester and Smith 98]).

I will now continue with a short review of existing methods


for numerical

integration, explaining their differences and drawbacks,


with a focus on semi

implicit methods and Verlet integration. The remainder of


the chapter explains

how to apply the Verlet method to interactive physics


simulation and goes through

some of the related subtleties.

11.2 Techniques for Numerical Integration

For our purposes, the subject of numerical integration deals with how to advance a simulation from one time step to the next, updating the system state by solving an underlying ordinary differential equation (ODE). An introduction to numerical integration has already been given in Chapter 1; please refer to this for additional details.

11.2.1 Forward Euler Integration

When experimenting with cloth simulation for the first


time, many developers

choose a basic Euler integration as their initial method


for time stepping a mass

spring system. But we realized pretty quickly that the


technique is far from suf

ficient: cloth tends to vibrate and even “explode” when


moved around too much. The thing is, basic (forward) Euler
integration has a hard time dealing with stiff springs.
This is the major drawback of forward Euler integration,
and often a showstopper. The problem is that particle
positions and velocities come out of sync when time steps
are too large. This in turn leads to instabilities, which
lead to pain and suffering.

11.2.2 Backward Euler Integration

A way to make up for this is to use implicit
integration. The method of backward Euler integration
belongs to the family of implicit-integration methods. The
members of this family all provide more stability in
situations with stiff equations and generally let us use
larger time steps without the risk of the system blowing
up. With backward Euler integration, we update the current
position, not with the current velocity and acceleration
vectors (as was the case with basic Euler integration) but
with the resulting velocity vector v(t + Δt) and the
resulting acceleration vector a(t+Δt). The problem with
this approach, however, is that the acceleration and
velocity at time t + Δt are unknown, and therein lies the
problem with backward Euler integration: as we cannot
directly evaluate the update velocity and acceleration, we
need to solve for the unknowns. The resulting set of
equations can be rather large if there are many particles.
This calls for (usually slow) numerical methods for solving
equations. This means that in their basic forms, neither
backward nor forward Euler integration is immediately
useful for our purpose.

11.2.3 Other Approaches
Experimenting with other approaches, such as adaptive
integration or higher-order integration methods such as
Runge-Kutta, may bring you closer to the desired result,
but these methods, too, are not ideal choices for
real-time, interactive use for the same reasons: they are
basically either slow or unstable. So what does an
intelligent game physics programmer do? It seems we cannot
escape either having to deal with instability or too much
elasticity in the case of explicit integration methods or
being forced to solve unwieldy systems of equations in the
case of implicit integration methods—or, alternatively,
waiting an eternity for adaptive methods to finish.
Luckily, as it turns out, we can have the best of two
worlds. The so-called semi-implicit methods (also known as
semi-explicit methods) are both simple and stable. And
while we may lose some accuracy in some cases, it doesn’t
really matter in the case of game simulation. Who cares if
the dead body flies ten percent too far or too short? We’re
not sending a (real) rocket to the moon. On the other

hand, we do care about visual quality and stability, and we


do care whether our

software runs fast or slow—and semi-implicit methods are


usually fast.

The semi-implicit version of Euler integration goes like this:

\[ v(t + \Delta t) = v(t) + a\,\Delta t, \]
\[ x(t + \Delta t) = x(t) + v(t + \Delta t)\,\Delta t. \]

By substitution, the second equation is equivalent to x(t) = x(t − Δt) + v(t)Δt, a fact that together with the above leads us to the Verlet formulation.

11.2.4 Verlet Integration

As mentioned, Verlet integration is an example of a semi-implicit integration method. The Verlet integration update step is just a reformulation of the expressions given in the previous subsection:

\[ x(t + \Delta t) = 2x(t) - x(t - \Delta t) + a(t)(\Delta t)^2. \]

Instead of storing the particles' positions x and velocities v as before, it suffices to store the current position x(t) and the previous position x(t − Δt). Velocities can then be calculated on the fly from x(t) and x(t − Δt) (if needed at all):

\[ v(t + \Delta t) = \bigl(x(t + \Delta t) - x(t)\bigr)/\Delta t. \]

This relation means that positions and velocities are always in sync.

Since x(t) − x(t − Δt) is just the change in position from the last time frame, the Verlet formula can be interpreted as follows:

1. Add to the current position the distance we just moved in the previous time step.

2. Adjust the position to account for gravity and other forces.

3. The new velocity is directly proportional to the total step we just moved.

Be aware that because velocity is now given only implicitly by using the previous positions of the particles, the time step needs to be kept constant between each call to the numerical integrator. While it is possible to develop formulas that take changing time steps into account, in my experience, the best way to handle larger time steps is to simply call the integrator multiple times.

Velocity Verlet integration. A variant of Verlet integration that is sometimes used is the velocity Verlet algorithm (also called Leapfrog integration):

\[ x(t + \Delta t) = x(t) + v(t)\Delta t + a(t)(\Delta t)^2/2, \]
\[ v(t + \Delta t/2) = v(t) + a(t)\Delta t/2, \]
\[ a(t + \Delta t) = f\bigl(x(t + \Delta t), v(t + \Delta t/2)\bigr), \]
\[ v(t + \Delta t) = v(t + \Delta t/2) + a(t + \Delta t)\Delta t/2, \]

where f is a function of position and velocity that yields the acceleration given by the current context. The physics engine in Hitman relies on basic Verlet integration only, but in situations that call for higher accuracy or additional robustness, velocity Verlet integration may sometimes be more suitable.

Using Verlet integration in a physics simulation. It is easy to implement the results of the above section in a function that updates a set of unconstrained particles.

// Use Verlet integration to advance an array of particles
// t:      Size of time step
// x:      Array of current positions of particles
// x_prev: Array of previous positions of particles
// a:      Current acceleration of each particle
// n:      Total number of particle coordinates
//
void VerletTimeStep(double t, double* x, double* x_prev, double* a, int n)
{
    double x_old;
    for(int i=0; i<n; i++)
    {
        x_old = *x;
        *x = 1.99 * *x - 0.99 * *x_prev + *a * t * t;
        *x_prev = x_old;
        x++; x_prev++; a++;
    }
}

The above code has been written for clarity, not speed. Note that it is possible to save memory transfers with a double-buffering approach by alternating between two arrays. Note also that the Verlet formula has been changed slightly to include the two factors 1.99 and 0.99 in order to introduce a small amount of drag in the system for further stabilization.

11.3 Using Relaxation to Solve Systems of Equations

Verlet integration in itself, as described above, provides


a good foundation for,

say, an unconstrained particle system. But how do we go


about handling more

complex restrictions or constraints on the movements of the


particles? How

should interconnected particles be handled, for example?


And how do we keep

particles from penetrating a surface? As for the latter, we


choose to simply project

offending particles out of obstacles. By projection,


loosely speaking, we mean

moving the point as little as possible until it is free of


the obstacle. Normally, this

means moving the point perpendicularly out towards the


collision surface.

11.3.1 Handling Collisions and Penetrations

Let's look at a simple example. Assume that our world is the inside of the cube (0, 0, 0)–(1000, 1000, 1000) and assume furthermore that the particles' restitution coefficient is zero (that is, particles do not bounce off surfaces when colliding). To keep all particle positions inside the valid interval, the corresponding projection code would be as follows:

// Keeps particles in a box
void SatisfyBoxConstraints(double* x, int n)
{
    for(int i=0; i<n; i++)    // For all particle coordinates
    {
        *x = min(max(*x, 0.0), 1000.0);
        x++;
    }
}

This keeps all particle positions inside the cube and handles both collisions and resting contact. The beauty of the Verlet integration scheme is that the corresponding changes in velocity are handled automatically. Thus, after calling SatisfyBoxConstraints() and VerletTimeStep() a number of times, the velocity vector will contain no components in the normal direction of the surface (corresponding to a restitution coefficient of zero). The update loop is then:

void UpdateLoop()
{
    VerletTimeStep();
    SatisfyBoxConstraints();
}

Try it out; there is no need to directly cancel the velocity in the normal direction. While the above might seem somewhat trivial when looking at particles, the strength of the Verlet integration scheme is now beginning to shine through and should really become apparent when introducing constraints and coupled rigid bodies in a moment.

11.3.2 Handling Constraints

We now describe by
example how more complex constraints can be implemented.
Assume that we have two particles that we wish to keep at a
fixed distance from each other, in effect simulating a
stick. Just as in the above case, where collisions were
handled by projecting the particles in question out of the
offending obstacles, we carry out a similar procedure here:
if a particle invalidates a constraint after the Verlet
time step routine has been called, we simply move the
particle by as little as possible in order to satisfy the
constraint once again. In the case of the stick this means
pulling or pushing the particles directly towards or away
from each other (depending on whether their distance is too
large or too small; see Figure 11.1). For each pair of
constrained particle positions x ∗ i and x ∗ j , the
following calculations must be carried out: d = x j − x i ,
(11.1) u = ( r ||d|| − 1.0 ) d, (11.2) x ∗ i = x i − 1 2 u,
(11.3) x ∗ j = x j + 1 2 u, (11.4) where r is the rest
length of the stick and u is the missing displacement
between the two particles. Assume now that we also want the
particles to satisfy the cube constraints discussed in the
previous subsection. By running the above code to fix the
stick Distance too large Correct distance Distance too
small Figure 11.1. Moving the particles to fix an invalid
distance.
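Before the square-root optimization discussed later in this section, a direct implementation of Equations (11.1)–(11.4) might look like the following sketch; the Vec3 type and the function name are mine, chosen to match the style of the later listings.

// Sketch: satisfy a single stick constraint by direct projection
// (Equations (11.1)-(11.4)).
#include <math.h>

struct Vec3 { double x, y, z; };

void SatisfyStickConstraint(Vec3& x1, Vec3& x2, double r)
{
    // d = x2 - x1
    Vec3 d = { x2.x - x1.x, x2.y - x1.y, x2.z - x1.z };
    double len = sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
    // u = (r/||d|| - 1) d
    double s = r / len - 1.0;
    Vec3 u = { s * d.x, s * d.y, s * d.z };
    // x1* = x1 - u/2,  x2* = x2 + u/2
    x1.x -= 0.5 * u.x;  x1.y -= 0.5 * u.y;  x1.z -= 0.5 * u.z;
    x2.x += 0.5 * u.x;  x2.y += 0.5 * u.y;  x2.z += 0.5 * u.z;
}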

Assume now that we also want the particles to satisfy the cube constraints discussed in the previous subsection. By running the above code to fix the stick constraint, however, we may have invalidated one or more of


the cube constraints

by pushing a particle out of the cube. This situation can


be remedied by immedi

ately projecting the offending particle’s position back


onto the cube surface once

more—but then we end up invalidating the stick constraint


once again.

Really, what we should do is solve for all constraints at


once, both the box and

the stick constraints. This would be a matter of solving a


system of equations. But

instead of explicitly forming the system and solving it


with a separate algorithm
for solving systems of equations, we choose to do it
indirectly by local iteration.

We simply repeat the two pieces of code a number of times


after each other in the

hope that the result is useful. This yields the following


code:

void TimeStep_StickInBox()
{
    VerletTimeStep();

    while(notConverged)
    {
        SatisfyBoxConstraints();
        SatisfyStickConstraints();
    }
}

While this approach of pure repetition might appear


somewhat naive, it turns out

that it actually converges to the solution that we are


looking for! The method is

called relaxation (or Jacobi or Gauss-Seidel iteration


depending on how you do it

exactly, see [Press et al. 92]). It works by consecutively


satisfying various local

constraints and then repeating; if the conditions are


right, this will converge to

a global configuration that satisfies all constraints at


the same time. It is useful

in many other situations where several interdependent


constraints must hold si

multaneously. As a general algorithm for solving equations,


the method doesn’t

converge as fast as other approaches do, but for


interactive physics simulation it

is often an excellent choice.


We get the following overall simulation algorithm (in
pseudocode):

void TimeStep()
{
    VerletTimeStep();

    // Relaxation step
    iterate until convergence
    {
        for each constraint (incl. collisions)
        {
            satisfy constraint
        }
    }
}

The number of necessary iterations varies depending on the physical system simulated and the amount of motion. The relaxation can be made adaptive by measuring the change from the last iteration. If we stop the iterations early, the result might not end up being quite valid, but because of the Verlet scheme, in the next frame it will probably be better, the next frame even more so, and so on. This means that stopping early will not ruin everything, although the resulting animation might appear somewhat sloppier.

11.3.3 Cloth Simulation

The fact that a stick constraint can be
thought of as a really hard spring should underline its
usefulness for cloth simulation. Assume, for example, that
a hexagonal mesh of triangles describing the cloth has been
constructed. For each vertex a particle is created, and for
each edge a stick constraint between the two corresponding
particles is initialized (with the constraint’s “rest
length” simply being the initial distance between the two
vertices). To solve for these constraints, we use
relaxation as described above. The relaxation loop could be
iterated several times. However, to obtain nice-looking
animations for most pieces of cloth, only one iteration is
necessary! This means that the time usage in the cloth
simulation depends mostly on the N square root operations
and the N divisions performed (where N denotes the number
of edges in the cloth mesh). As we shall see, a clever
trick makes it possible to reduce this to just N divisions
per frame update—this is really fast, and some might argue
that it probably can’t get much faster. Optimizing away the
square root. We now discuss how to get rid of the square
root operation. If the constraints are all satisfied (which
they should be, at least almost), we already know what the
result of the square root operation in a particular
constraint expression ought to be, namely, the rest length
r of the corresponding stick. We can use this fact to
approximate the square root function. Mathematically, what
we do is approximate the square root function by its
first-order Taylor expansion at a neighborhood of the
squared rest length r² (this is equivalent to one Newton-Raphson iteration with initial guess r). A real-valued function f may be approximated around a neighborhood a by using its Taylor series:

\[ f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots. \]

In the case of the square root function f(x) = √x around a = r², we get the following:

\[ \sqrt{x} \approx f(r^2) + f'(r^2)(x - r^2) = \sqrt{r^2} + \frac{1}{2\sqrt{r^2}}(x - r^2) = \frac{r^2 + x}{2r}. \]

As expected, for x = r², we get √x ≈ r. Using the above approximation to rewrite Equations (11.1)–(11.4), we end up with the following pseudocode:

// Pseudo-code for satisfying a stick constraint
// using sqrt approximation
d = x2 - x1;     // Note: vector operation; d, x1, and x2 are vectors
d *= 0.5 - r*r / (dotprod(d, d) + r*r);
x1 += d;
x2 -= d;

Notice that if the distance is already correct (that is, if


||x 2 − x 1 || = r), then we

get d = (0, 0, 0), and no change is going to happen.

Per constraint we now use zero square roots, one division


only, and the squared

value r 2 can even be precalculated! The usage of


time-consuming operations is

now down to N divisions per frame (and the corresponding


memory accesses)—it

can’t be done much faster than that, and the result even
looks quite nice. The con

straints are not guaranteed to be satisfied after one


iteration only, but because of

the Verlet integration scheme, the system will quickly


converge to the correct state

over some frames. In fact, using only one iteration and


approximating the square

root removes the stiffness that appears otherwise when the


sticks are perfectly

stiff.

By placing support sticks between strategically chosen


couples of vertices

sharing a neighbor, the cloth algorithm can be extended to


simulate bending ob

jects, such as plants. Again, in Hitman, only one pass


through the relaxation loop

was enough (in fact, the low number gave the plants exactly
the right amount of

bending behavior).

The code and the equations covered in this section assume


that all particles

have identical mass. Of course, it is possible to model


particles with different

masses; the equations only get a little more complex. To


satisfy constraints while

respecting particle masses, use the following code:

// Pseudo-code to satisfy a stick constraint with particle masses
d = x2 - x1;
dlen = sqrt(dotprod(d, d));
f = (dlen - r) / (dlen * (invmass1 + invmass2));
x1 += invmass1 * d * f;
x2 -= invmass2 * d * f;

Here, invmass1 and invmass2 are the numerical inverses of the two masses. If we want a particle to be immovable, simply set invmass = 0 for that particle (corresponding to an infinite mass). Of course, in the above case, the square root can also be approximated for a speed-up.

11.4 Rigid Bodies

The equations governing motion
speed-up. 11.4 Rigid Bodies The equations governing motion
of rigid bodies were discovered long before the invention
of modern computers. To be able to say anything useful at
that time, mathematicians needed the ability to manipulate
expressions symbolically. In the theory of rigid bodies,
this led to useful notions and tools such as inertia
tensors, angular momentum, torque, quaternions for
representing orientations, etc. However, with the current
ability to process huge amounts of data numerically, it has
become feasible and in some cases even advantageous to
break down calculations to simpler elements when running a
simulation. In the case of three-dimensional rigid bodies,
this could mean modeling a rigid body by four particles and
six constraints (giving the correct amount of degrees of
freedom, 4 × 3 − 6 = 6). This simplifies many things.
Consider a tetrahedron and place a particle at each of its
four vertices. In addition, for each of the tetrahedron’s
six edges, create a distance constraint like the stick
constraint discussed in the previous section. This
configuration suffices to simulate a rigid body. The
tetrahedron can be let loose inside the cube world from
earlier, and the Verlet integrator will then move it
correctly. The function SatisfyConstraints() should take
care of two things: (1) that particles are kept inside the
cube (like previously) and (2) that the six distance
constraints are satisfied. Again, this can be done using
the relaxation approach; three or four iterations should be
enough with optional square root approximation. Inside the
cube world, collisions are handled simply by moving
offending particles (those placed at the tetrahedron
vertices) such that they do not intersect with obstacles.
In a more complex setting than the cube world, however, the
sides of the tetrahedron may also intersect with obstacles
without the particles at the vertices themselves being in
invalid positions (see Figure 11.2). In this case, the
vertex particles of the tetrahedron, which describe the
position of the rigid body, must be moved proportionally to
how near they are to the actual point of collision. If, for
example, a collision occurs exactly halfway between
particles x 1 and x 2 , then both these particles should
both be moved by the same amount along the collision
surface normal until the collision point (which is halfway
between the two particles) has been moved out of the
obstacle (see Figures 11.3 and 11.4).

Figure 11.2. Tetrahedron (triangle) intersecting the world geometry.

Figure 11.3. Stick intersecting the world geometry in two different ways.

In an analogous way, collisions that take place on a face of the tetrahedron, or even inside the tetrahedron, will require moving three or all four particles to fix the penetration. Let p be the penetration point on the tetrahedron and q be the one on the obstacle. To handle any type of collision, follow the procedure described below.

First, express p as a linear combination of the four particles that make up the tetrahedron: p = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4, such that the weights sum to one: c_1 + c_2 + c_3 + c_4 = 1 (this calls for solving a small system of linear equations). After finding d = q - p, compute the value

λ = 1 / (c_1² + c_2² + c_3² + c_4²)

(λ is a so-called Lagrange multiplier). The new particle positions are then given by

x_1* = x_1 + c_1 λ d,  x_2* = x_2 + c_2 λ d,  x_3* = x_3 + c_3 λ d,  x_4* = x_4 + c_4 λ d.

The new position of the tetrahedron's penetration point, p* = c_1 x_1* + c_2 x_2* + c_3 x_3* + c_4 x_4*, will coincide with q. For details on the derivation of the above equations, see [Jakobsen 01].

Figure 11.4. Resolved stick collisions.

The above equations can also be used to embed the tetrahedron inside another shape, which is then used for collision purposes. In this case, p will be a point on the surface of this shape (see Figure 11.5).

Figure 11.5. Tetrahedron (triangle) embedded in arbitrary object geometry touching the world geometry.

In the above case, the rigid body collided with an immovable world, but the method generalizes to handle collisions of several (movable) rigid bodies. The collisions are processed for one pair of bodies at a time. Instead of moving only p, in this case, both p and q should be moved towards one another.
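Spelled out in the same pseudocode style used above, the correction might look like the following sketch (an illustration of the equations, not code from the chapter; c1..c4 are the precomputed barycentric weights):

// Sketch: move the tetrahedron particles so the penetration point
// p = c1*x1 + c2*x2 + c3*x3 + c4*x4 ends up at the obstacle point q.
d = q - p;
lambda = 1 / (c1*c1 + c2*c2 + c3*c3 + c4*c4);
x1 += c1 * lambda * d;
x2 += c2 * lambda * d;
x3 += c3 * lambda * d;
x4 += c4 * lambda * d;
// afterwards, c1*x1 + c2*x2 + c3*x3 + c4*x4 coincides with q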

In the relaxation loop, just like earlier, after adjusting the particle positions such that nonpenetration constraints are satisfied, the six distance constraints that make up the rigid body should be taken care of (since they may have been invalidated by the process), and the whole procedure is then iterated. Three to four relaxation iterations are usually enough. The bodies will not behave as if they were completely rigid, since the relaxation iterations are stopped prematurely, but this is mostly a nice feature, actually, as there is no such thing as perfectly rigid bodies, especially not human bodies. It also makes the system more stable.

By rearranging the positions and masses of the particles that make up the tetrahedron, the physical properties can be changed accordingly (mathematically, the inertia tensor changes as the positions and masses of the particles are altered).

11.5 Articulated Bodies

It is possible to connect multiple rigid bodies by hinges, pin joints, and so on. Simply let two rigid bodies share a particle, and they will be connected by a pin joint. Share two particles, and they are connected by a hinge (see Figure 11.6).

It is also possible to connect two rigid bodies by a stick constraint or any other kind of constraint; to do so, one simply adds the corresponding constraint-handling code to the relaxation loop.

This approach makes it possible to construct a complete model of an articulated human body. For additional realism, various angular constraints will have to be implemented as well. There are different ways to accomplish this. A simple way is to use stick constraints that are enforced only if the distance between two particles falls below some threshold (mathematically, we have a unilateral [inequality] distance constraint, ||x_2 - x_1|| > 100). As a direct result, the two particles will never come too close to each other (see Figure 11.7).
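A rough sketch of such a unilateral stick constraint (an illustration, not code from the chapter; minDist stands for the chosen threshold distance) only pushes the particles apart when they get too close and otherwise leaves them alone:

// Sketch: enforce ||x2 - x1|| >= minDist
d = x2 - x1;
dlen = sqrt(dotprod(d, d));
if (dlen < minDist) {
    f = (dlen - minDist) / dlen;   // negative, so the particles are pushed apart
    x1 += d * 0.5 * f;
    x2 -= d * 0.5 * f;
}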

Particles can also be restricted to move, for example, in certain planes only. Once again, particles with positions not satisfying the above-mentioned constraints should be moved; deciding exactly how is slightly more complicated than with the stick constraints.

Figure 11.6. Pin joint and hinge joint using particles and sticks.

Figure 11.7. Two stick constraints and an inequality constraint (dotted) modeling, e.g., an arm.

Actually, in Hitman, corpses aren't composed of rigid bodies modeled by tetrahedrons. They are simpler yet, as they consist of particles connected by stick constraints, in effect forming stick figures (see Figure 11.8). The position and orientation of each limb (a vector and a matrix) are then derived for rendering purposes from the particle positions using various cross products and vector normalizations (making certain that knees and elbows bend naturally).

Figure 11.8. Ragdoll model using particles and sticks (used in Hitman: Codename 47).

In other words, seen in isolation, each limb is not a rigid body with the usual six degrees of freedom. This means that the physics of rotation around the length axis of a limb is not simulated. Instead, the skeletal animation system used to set up the polygonal mesh of the character is forced to orient the leg, for instance, such that the knee appears to bend naturally. Since rotation of legs and arms around the length axis does not comprise the essential motion of a falling human body, this works out okay and actually optimizes speed by a great deal.

Angular constraints are implemented to enforce limitations of the human anatomy. Simple self-collision is taken care of by strategically introducing inequality distance constraints as discussed above, for example, between the two knees, making sure that the legs never cross.

For collision with the environment, which consists of triangles, each stick is modeled as a capped cylinder. Somewhere in the collision system, a subroutine handles collisions between capped cylinders and triangles. When a collision is found, the penetration depth and points are extracted, and the collision is then handled for the offending stick in question exactly as described earlier. Naturally, a lot of additional tweaking was necessary to get the result just right.

11.6 Miscellaneous

11.6.1 Motion Control

To influence the motion of a simulated object, we simply move the particles correspondingly. If a person is hit in the shoulder, move the shoulder particle backwards over a distance proportional to the strength of the blow. The Verlet integrator will then automatically set the shoulder in motion.

This also makes it easy for the simulation to "inherit" velocities from an underlying traditional animation system. Simply record the positions of the particles for two frames and then give them to the Verlet integrator, which then automatically continues the motion. Bombs can be implemented by pushing each particle in the system away from the explosion over a distance inversely proportional to the squared distance between the particle and the bomb center.
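A minimal sketch of handing such animation-driven state over to the integrator (not from the original text; animatedPosition, frameN, and the x/oldx field names are assumptions) could be:

// Sketch: seed the Verlet state from two animation frames so the
// simulation inherits the animated velocity (x - oldx acts as the velocity).
for each particle p {
    p.oldx = animatedPosition(p, frameN - 1);   // position one frame back
    p.x    = animatedPosition(p, frameN);       // current position
}
// from here on, the regular Verlet update continues the motion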

It is possible to constrain a specific limb, say the hand, to a fixed position in space. In this way, we can implement inverse kinematics (IK): inside the relaxation loop, keep setting the position of a specific particle (or several particles) to the position(s) wanted. Giving the particle infinite mass (invmass = 0) helps make it immovable to the physics system. In Hitman, this strategy is used when dragging corpses; the hand (or neck or foot) of the corpse is constrained to follow the hand of the player.

11.6.2 Friction

Friction has not been taken care of yet. This means that unless we do something more, particles will slide along the floor as if it were made of ice. According to the Coulomb friction model, friction force depends on the size of the normal force between the objects in contact. To implement this, we measure the penetration depth d_p when a penetration has occurred (before projecting the penetration point out of the obstacle). After projecting the particle onto the surface, the tangential velocity v_t is then reduced by an amount proportional to d_p (the proportionality factor being the friction constant). This is done by appropriately modifying x(t - Δt) (see Figure 11.9). Care should be taken that the tangential velocity does not reverse its direction; in this case, it should simply be set to zero, since this indicates that the penetration point has ceased to move tangentially.

Figure 11.9. Collision handling with friction.
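As a small illustration (not from the original text; depth, normal, kFriction, and the x/oldx field names are assumptions), the previous position can be dragged toward the current one along the tangential direction:

// Sketch: Coulomb-like friction for a particle that just collided.
v  = p.x - p.oldx;                          // implied velocity over the last step
vt = v - normal * dotprod(v, normal);       // tangential part of the velocity
p.oldx += vt * min(1.0, kFriction * depth); // reduce tangential velocity,
                                            // clamped so it never reverses direction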
11.6.3 Collision Response

To prevent objects that are moving really fast from passing through other obstacles (because of too-large time steps), a simple test is performed. Imagine the line (or a capped cylinder of proper radius) beginning at the position of the object's midpoint last frame and ending at the position of the object's midpoint at the current frame. If this line hits anything, then the object position is set to the point of collision. Though this can theoretically give problems, in practice it works fine.

Another collision "cheat" was used for dead bodies. If the unusual thing happens that a fast-moving limb ends up being placed with the ends of the capped cylinder on each side of a wall, the cylinder is projected to the side of the wall where the cylinder is connected to the torso.

11.6.4 Relaxation

The number of relaxation iterations used in Hitman varies between one and ten with the kind of object simulated. Although this is not enough to accurately solve
the global system of constraints, it is sufficient to make motion seem natural. The nice thing about this scheme is that inaccuracies do not accumulate or persist visually in the system, causing object drift or the like; in some sense, the combination of projection and the Verlet scheme manages to distribute complex calculations over several frames. Fortunately, the inaccuracies are smallest or even nonexistent when there is little motion and greatest when there is heavy motion. This is nice, since fast or complex motion somewhat masks small inaccuracies to the human eye.

A kind of soft body can also be implemented by using "soft" constraints, i.e., constraints that are allowed to have only a certain percentage of the deviation "repaired" each frame (i.e., if the rest length of a stick between two particles is 100 but the actual distance is 60, the relaxation code could first set the distance to 80 instead of 100, next frame to 90, then 95, 97.5, etc.). Varying this relaxation coefficient may in fact be necessary in certain situations to enable convergence. Similarly, over-relaxation (using a coefficient larger than one) may also successfully speed up convergence, but take care not to overdo this, especially if the number of iterations is low, as it may cause instabilities.
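A minimal sketch of such a soft stick constraint (an illustration, not code from the chapter; the stiffness coefficient name is an assumption) scales the correction by a per-constraint relaxation coefficient:

// Sketch: "soft" stick constraint; stiffness in (0, 1] repairs only part of
// the deviation each pass, values slightly above 1 over-relax.
d = x2 - x1;
dlen = sqrt(dotprod(d, d));
f = stiffness * (dlen - r) / dlen;
x1 += d * 0.5 * f;
x2 -= d * 0.5 * f;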

Singularities (divisions by zero, usually brought about by coinciding particles) can be handled by slightly dislocating particles at random.

11.6.5 Extending the Verlet Approach

There are several ways to extend the Verlet approach to allow for more advanced representations and features. For one thing, it is possible to represent rigid bodies by quaternions and use inertia tensors to better model the properties of objects. The main idea of Verlet integration, using the previous positions instead of velocities, carries over; only the equations get a bit more complex.

Constraints that are more general than the stick constraint may be implemented by computing appropriate constraint Jacobians, finding Lagrange multipliers, etc.

Instead of using relaxation to solve for constraints, it is possible to use more precise algorithms for solving systems of equations, such as conjugate gradient methods or Newton methods, but this is outside the scope of this chapter.

11.7 Conclusion

This pearl has described how a physics system was implemented in Hitman: Codename 47 running on a low-spec platform. The underlying philosophy of com

12.2.2 Constraint Solver

The threads of the cloth are modeled as distance constraints. An individual constraint strives to maintain a constant distance between two particles. A particle typically has more than one constraint attached to it. This network of constraints is solved with a relaxation solver.

A relaxation solver simply solves each individual constraint independently of the other constraints in the system. Solving one constraint will potentially violate the other connected constraints. However, each time we iterate over all the constraints in the system, the overall global error is reduced. Given enough time, the system converges to a solution.

To solve an individual constraint, we directly update the positions of the attached particles [Provot 95].

Vector3 pa = constraint.m_particleA.currentPosition;
Vector3 pb = constraint.m_particleB.currentPosition;
float targetDistance = constraint.m_restingDistance;
Vector3 dp = pa - pb;
float distance = dp.length();
float derr = (distance - targetDistance) / distance;
// pull the particles together when the constraint is stretched,
// push them apart when it is compressed
pa -= dp * 0.5 * derr;
pb += dp * 0.5 * derr;

Often, the rate of convergence for a relaxation solver can be improved slightly by using a technique called over-relaxation. With over-relaxation, we simply overshoot our target by a percentage of the existing error. This technique can cause unwelcome artifacts, so use with caution. In the context of character cloth, I have found that a value of 1.15 allows us to perform 10% fewer iterations while remaining artifact free. This makes some intuitive sense. Since the cloth tends to have more stretching along the longer noncyclical paths during the course of a simulation, overshooting helps accelerate the global shrinking in those directions, i.e., hanging capes or shirts have their bottoms pulled up quicker.

float relaxationFactor = 1.15;
pa -= dp * 0.5 * derr * relaxationFactor;
pb += dp * 0.5 * derr * relaxationFactor;

12.3 Modeling Real Fabrics

Unmodified, the simulation technique outlined so far produces clothing that looks like a light rubbery silk. Fashionistas typically turn up their noses at such attire, while gamers dream of the comfort such clothing would bestow upon the wearer. Gamers desire to play neither a comfortable gamer nor a fashionista during their gaming sessions. Therefore, this fabric is irrelevant, and we must try to improve the visual appeal.

The application of internal damping helps make the cloth look like it is made of a more natural material. This is done by projecting the particle velocities onto the distance constraints. For the best effect, it can be applied every iteration.

Vector3 paPrev = constraint.m_particleA.previousPosition;
Vector3 pbPrev = constraint.m_particleB.previousPosition;
float dampingFactor = 0.3f;
Vector3 va = pa - paPrev;
Vector3 vb = pb - pbPrev;
Vector3 vab = va - vb;
float v = vab.dot(dp);        // relative velocity along the constraint
float damping = v * dampingFactor;
pa -= dp * 0.5 * damping;
pb += dp * 0.5 * damping;

There is a performance cost here, but the improvement to the visual quality of the material is significant.

Real fabrics buckle much more easily in comparison to their resistance to stretching. Ideally, this would be modeled by using a very high-resolution set of particles. Even then a stiff buckling resistance will be present, although at a higher frequency and less noticeable scale. An alternative is to weaken the constraints' resistance to compression up to a certain limit. This also helps alleviate the jagged bunching and jittering of cloth that can occur at character joints. Visually we lose some creasing and folding, but the motion looks more convincing. As an example, around the shoulder joint of a character, we will most likely see popping and jagged cloth mesh artifacts. To fix this problem, we can tune the constraints in this area to not respond to compression:

float derr = (distance - targetDistance) / distance;
derr = (derr < 0) ? 0.0f : derr;   // ignore compression, only resist stretching

This technique of modeling cloth, and indeed most known cloth simulators, tends to smooth out smaller wrinkles [Bridson et al. 03]. The wrinkles are the most noticeable feature of cloth, since they form dark shadowed valleys against peaks that catch much of the light. We can add wrinkles back in as a rendering effect by using wrinkle maps driven by compression values.

Friction forces are needed to model the contact between cloth and skin in a believable manner. The most basic and performant friction model is to modify the effective velocity of a particle when it experiences a collision. We do so by moving the previous particle position towards the current one by the velocity scaled by the friction coefficient:

Vector3 v = p - pPrevious;
v = v - normal * dot(v, normal);
pPrevious += v * mu;

Friction between cloth and skin is a fairly complicated interaction. We could make the friction strength depend on the depth of the collision. This is only a rough approximation of the contact force, and given the complicated nature of the situation, we can choose to leave it out. Another choice is when to apply friction. Applying friction with every collision is an option, or only applying it once, either at the start or the end of the solver loop. It is best to experiment to find the right look for each simulation.


12.3.1 Character Cloth Constraint

Attaching simulated cloth to an animated character requires a special type of constraint. A character bone may rotate and translate very large distances in a single frame. Keeping the cloth on the correct side of a bone's collision geometry is a challenge.

The simplest constraint that will keep a particle from passing through collision geometry is to skin it rigidly to the bone. This isn't a very interesting way of doing things. We'll call this the pinning constraint, or just pinning. If we have pinning that makes sure a given particle can never move more than halfway through the collision geometry, then, providing the geometry is convex, the collision response will push the particle out to the side it came from. This can be done with a unilateral distance constraint between the particle and an anchor position. The anchor positions used are the skinned positions for the particle on the character rig. These data should easily be made available, since most game engines will already have a skinning system in place. As a bonus, bone weightings should be authored so the anchor points are in natural locations. This is what would be used for the cloth verts if there were no simulation.

It is useful to have a hard, immovable constraint where it is not possible to move the cloth particle. Essentially, we don't simulate this particle at all, so it doesn't belong with the list of simulated particles, but it will exist as a member of a constraint. A nice way to implement this is to move all those hard-pinned particles to the end of the particle list and then terminate any particle update loops early. During constraint updates, we don't want to update the position of any hard-pinned vert.

We can vary the pinning strength. The pinning strength is a value we use to apply only a portion of the constraint correction. With a value of 1.0, we would move the particle all the way back to its anchor position. Applying a pinning strength that is proportional to the distance from the skinned position helps make it less apparent that there is a hard distance constraint being applied. Such a distance-proportional pinning strength can be applied before and after a set pinning radius. This gives a good deal of control. The pinning function now appears as in Figure 12.1. As long as the pinning strength hits the maximum value of 1.0 before the particle moves over half the radius of the collision geometry, we can be confident it is doing its job. Since the pinning strength reduces the constraint error by a proportional amount each iteration, the effective strength is much more pronounced than a linear effect. So, if we want a subtle effect, we need to use quite a small value for the pinning strength.

Figure 12.1. Pinning function.
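One possible shape for such a pinning function (a sketch of the idea, not the author's exact curve; pinRadius, minStrength, and rampPerUnit are assumed parameters) ramps the strength with distance from the anchor once a radius is exceeded:

// Sketch: distance-proportional pinning toward the skinned anchor position.
Vector3 toAnchor = anchorPos - particlePos;
float dist = toAnchor.length();
float strength = minStrength;                       // inside the radius: weak pinning
if (dist > pinRadius)
    strength += (dist - pinRadius) * rampPerUnit;   // ramp up with distance
strength = min(strength, 1.0f);                     // 1.0 snaps fully back to the anchor
particlePos += toAnchor * strength;                 // apply a portion of the correction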

It is important to have a flexible pinning function because different sections of a piece of clothing require different pinning values. The bottom of a shirt can move large distances, while the areas under an armpit need tighter control. Arms of a shirt are especially tricky to tune because we want both dramatic simulation and control. The radius for the collision geometry representing arms is relatively small. What works well in practice for maximum visual effect is to have a pinning strength of 0 and a radius of under half of the bone's collision radius. Then apply a distance-proportional pinning strength after the radius has been exceeded. This softens the constraint, while providing good control. An easy-to-use interface that allows the character team to paint pinning values on a cloth mesh is a very useful thing.

12.3.2 Collisions

Spheres and capsules are easy-to-use collision geometries and are a fair representation of character limbs. To respond to collisions, we simply push the position of any interpenetrating particle via the shortest path to the surface. This path points along the vector formed from the position of the particle and the center of the collision object.
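For the sphere case, the response might look like this minimal sketch (an illustration, not code from the chapter; sphereCenter and sphereRadius are assumed names):

// Sketch: push a particle out of a sphere collider.
Vector3 d = particlePos - sphereCenter;
float dist = d.length();
if (dist < sphereRadius)
    particlePos = sphereCenter + d * (sphereRadius / dist);   // project to the surface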

For a high-resolution cloth mesh, the torso of a character's body is too complicated to model with spheres and capsules. Unless we are using a very large number of capsules and spheres, the way the cloth rests on the character will betray the underlying geometric approximations we used. A triangle mesh can yield good performance by utilizing a caching optimization. Each particle should keep track of which triangle it collided with in the last frame. Check to see whether a particle is within the edge boundaries of its cached triangle (the triangle's extruded wedge). If so, collide with that triangle. If not, use an edge-walking algorithm to find the new triangle whose extruded wedge contains the particle. Typically, the particle will be in the bounds of its cached triangle or have moved to a directly neighboring triangle. Performance is actually quite good. For best results, the mesh should be closed and convex. Responding to the collision is simply a matter of pushing the particle's position out to the surface of the triangle along its normal.
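For the final push-out step, a minimal sketch (not from the original text; the triangle's unit normal and one of its vertices are assumed to be available) is:

// Sketch: push a particle out of its cached triangle along the triangle normal.
float d = dot(particlePos - triangleVertex0, triangleNormal);   // signed distance to the plane
if (d < 0.0f)
    particlePos -= triangleNormal * d;   // move back onto the triangle's surface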

12.4 Performance

By far, the most expensive part of the simulation is the collision detection. Since the constraints directly and immediately update the positions of the particles, we need to perform collision checks, if not for every iteration of the relaxation solver then for every two or three iterations. A final collision pass should be done after all other constraints. This is required to avoid having any geometry lying under the cloth show through.

Determining which collision objects need to collide against which particles can be expensive. To ameliorate this problem, we can group particles with specific collision objects, e.g., the left sleeve only needs to collide with the left arm.

Modern consoles are very sensitive to memory access and cache performance. Avoiding the load-hit-store [Heineman 08] is important. This can be done by ordering the list of constraints so that those that update a common particle are spaced apart. By spacing them apart, a particle's write will hopefully be completed before its data are needed by the next constraint that uses it. At least we will reduce the time the next constraint must wait.

Figure 12.2. Cloth update: wrong ordering.

12.5 Order of Cloth Update Stages

Ordering what parts of the simulation happen when is critically important for minimizing simulation artifacts. When coding, good engineering practices say that we should not have direct coupling between the character animation system and the cloth simulator. The skinning data represent a significant amount of data to hold on to for any length of time. We will want to use them immediately after we have calculated them. Since the skinning data are updated by the character animation system and we have efficiency in mind, the natural thinking is to apply the pinning constraint as the very first thing we do in our update. This is the wrong order.

Looking at Figure 12.2 shows why. This figure shows a particle anchored between two collision spheres. The spheres translate a large distance in the first frame. This is not a configuration we would see in practice, but it serves our instructional purposes here. During the render phase (right after the frame boundaries), we are able to see the cloth particle on the wrong side of a collision body. A better

[Heineman 08] Becky Heineman. "Sponsored Feature: Common Performance Issues in Game Programming." Gamasutra, 2008. Available at http://www.gamasutra.com/view/feature/3687/sponsored_feature_common_.php.

[Jakobsen 03] Thomas Jakobsen. "Advanced Character Physics." Gamasutra, 2003. Available at http://www.gamasutra.com/resource_guide/20030121/jacobson_01.shtml.

[Provot 95] Xavier Provot. "Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior." In Graphics Interface '95, pp. 147–155. Quebec: Graphics Interface, 1995.
VI Skinning

the animation of a bone model (for an overview of this field, see, e.g., [Jacka et al. 07]). These techniques work very well in practice, even for challenging regions such as shoulders or heels. They are of a purely kinematic nature and there is no time dependence, so it does not matter if a limb is moved slowly or quickly; the calculated surface vertices are the same.

On the other hand, physics-based simulation has entered computer games, for example, in the form of the simulation of ragdolls (a collection of multiple rigid bodies, where each of the bodies is linked to a bone of the skeletal animation system). A famous example is the game Hitman: Codename 47 by IO Interactive [Jakobsen 01]. Such simulations can be used to model cloth, plants, waving flags, or dying characters.

More advanced physics simulations quickly become computationally intensive and thus not suitable for real-time processing. This is a pity, because there are a lot of physical effects that get completely lost even in ragdoll physics; effects that would be stunning if achieved in a real-time simulation. It would be great to realistically simulate the properties of solid materials, to watch how they react and deform when applying pressure to the surface, or when under the influence of gravity. Or, concerning character animation: animating a character in its low-frequency motion using its bone model, defining some material properties, and letting the physics system take care of the small and high-frequency motion; think of the jiggling of fat tissue when an ogre starts to move. It is this tiny motion that adds most to the realism in a simulation.

Of course, this animation system would have to take care of maintaining surface details, such as the layout of veins on an arm or the wrinkles on an old man's face.

In computer games, performance is very important, and only a small percentage of computation time can be spent on the physics subsystem, but more and more realistic simulations can enter our homes as processors get faster and graphics hardware more programmable.

In this chapter, a physics simulation is developed that can add secondary deformation to a mesh, while the primary deformation can still be driven by a skeleton; the comfort of animating a character by some simple bones will be preserved.

One thing that we have to bear in mind is that for simulation in computer games, we are not ultimately striving for accuracy, as we would in a scientific simulation, but rather for believability; the programmer is in the position to trick the player into thinking that what the player sees is real.

Such a simulation can dramatically improve the realism of an animation and still be economic in computational effort. In fact, the techniques presented in this chapter take much less computation time than the collision-handling routines that are needed to have different models and geometry applying forces on each other. The collision handling of deformable bodies has to be more sophisticated than that of rigid bodies, since there is always a certain penetration depth when two deformable objects collide and deform each other, and there is always the possibility of self-penetration (see [Teschner et al. 05] for detailed information).

For simulating the effect of surface connectivity, a technique called "shape matching" [Müller et al. 05] is used, which takes care of maintaining surface details during the simulation. Several approaches to addressing the volumetric effects of a solid material and their applicability are discussed, and the best-fitting technique is used. If the lack of realism of these techniques is not acceptable, the method presented in Chapter 10 is a much better approach to simulating deformable objects, since it is completely based on a physically correct description.

Throughout this chapter, we are seeking drop-in solutions that can easily be integrated into an existing simulation. Objects in current computer games are surface meshes, so our deformable simulation should be surface based while still being able to naturally simulate volumetric behavior. This way, the deformation model can efficiently be integrated into the rendering pipeline, and the computations can even be done on the graphics hardware. Because of the simplifications made, the simulation will rely on material properties that need to be tuned by a designer during content creation to become realistic.

In this chapter, all the background necessary to understand what is going on in principle is covered, while always focusing on practicability. We will work on an implementation of a deformable mesh simulation that will gradually be extended and can be modified to suit any special purpose in a game. Section 14.2 will introduce the force model used for the simulation and point out potential pitfalls. Section 14.3 incorporates the effect of surface connectivity in a polygonal mesh in an economical way. The shape-matching algorithm is described. The following section, Section 14.4, accounts for the influences of volumetric effects on a solid material without an accurate physical simulation of the interior of the mesh.

14.2 The Interaction Model

The starting point of the simulation is a triangle mesh of vertices with positions x0_i. It can be animated in time (e.g., using keyframes), but it does not have to,
for now. We want to enrich this static model with physically motivated motion. There are quite a lot of forces that can be taken into account, so the focus should lie on forces that add most to the felt realism of a simulation. The question then is how to construct them in a computationally economic way. In the end, the sum of all acting forces is the change in velocity for each vertex at a given time step.

The first extension we make to this basic animated mesh model is to call the vertex positions of the animated mesh model the "rest" positions (x0_i) and give our actual positions (x_i) the freedom to vary from those. They will get a mass to define how they will react to a given force (remember f = ma?). We also have to keep track of the accumulated forces acting on each vertex. The structure storing the per-vertex information for now could be something like this:

struct Vertex {
    Vector3 pos;       // current position
    Vector3 vel;       // current velocity
    Vector3 restPos;   // position given by data
    Vector3 force;     // the total force on a vertex
    real mass;
};

The Vector3 data type is a structure holding the three components of a vector; "real" can be either a single- or double-precision floating-point representation.

14.2.1 Numerical Integration

Time is a continuous quantity. When writing down equations for the positions and the velocities of the vertices, they should hold for every time t. In computer simulation, however, we always have to deal with discrete time steps of length h. The introductory chapter of this book (Chapter 1) gives an overview of the most important integration schemes. We update the velocities and positions by the following scheme:

v_i(t + h) = v_i(t) + h f_i^total(t),
x_i(t + h) = x_i(t) + h v_i(t + h).

This is the semi-implicit Euler scheme. In contrast to the standard Euler integration, this scheme uses v(t + h) (implicit) in the equation for x(t + h), while the standard Euler integration uses v(t) (explicit). This scheme still commits a global error of the order of h every time step (see [Donelly and Rogers 05] on this matter). If more accuracy is needed, we could consider using higher-order integration schemes, such as the fourth-order Runge-Kutta method [Acton 70].

Numerical solutions of differential equations may be unstable because the problem being solved is unstable or because the numerical method fails. Care has to be taken to construct forces that prevent these instabilities. Additionally, damping can help to make an integration scheme stable. The next logical step now is to model realistic forces. We will start with a simple force that forms the basis for all other forces we will discuss.
14.2.2 The Spring Force

To get a force that pulls a vertex towards a desired goal position (like its rest position), think of a spring that links the vertex to its goal position. Each spring has a certain constant, which gives us the force driving it to its goal position. When the spring force is too strong compared to the time step, the system will overshoot, which means that the vertex will be driven to the other side of the spring's goal position, ending up even farther away than it was before. This way, the vertex will never reach its goal position, but it will steadily increase its energy. The system "explodes." The maximum force that will drive an object towards a rest position without overshooting is given by

f_i^rest = (x0_i - x_i) / h².

That this force does not overshoot can be seen by starting off at some time 0 and calculating the succeeding positions and velocities for the next two time steps. The system will "convert" the displacement from the rest position (which means potential energy) into speed (which means kinetic energy), and the speed back into displacement, but the displacement will not get bigger over time, so the total energy will not rise over time. (For example, starting at rest with displacement d, the first step yields v = -d/h, which moves the vertex exactly onto its rest position; the following steps carry it on to a displacement of -d and back again, so the amplitude never grows.) This force can be scaled by a factor smaller than 1 to make the force smaller; this is a first example of a material property that can be tuned by the designer on a per-vertex basis.

When we calculate the force as presented above, it is absolutely necessary that the time step h be fixed to a certain value throughout the simulation and that the system be integrated in constant intervals. The physics integration should be run on a dedicated thread, where it updates the positions and velocities at a constant rate, like 30 frames per second (FPS). With the knowledge of this force, a simple form of secondary motion can be constructed. With the calculated force, we need to update our velocities and the actual positions (that are drawn on the screen) according to the presented integration scheme.

Here is the pseudocode notation:

for each vertex v {
    v.force = (v.restPos - v.pos) / (timeStep * timeStep);
    v.vel  += v.force * timeStep;
    v.pos  += v.vel * timeStep;
}

With such a simulation, we would see the vertex positions oscillate about the rest position on and on, for infinity (apart from numerical errors that are introduced in every time step). The contribution of this force can be reduced as more realistic forces are added to the system, but it should still be integrated into the simulation, since it helps to keep the system controllable, as there is always a trend back to the completely undeformed shape.

14.2.3 Safety Belts

This discussion would be completely out of place in a scientific simulation, but here we are speaking of computer games; we have to deal with a lot of user interaction, collisions, and rapid changes of motion. Safety comes first.

Although the algorithms introduced in this chapter provide excellent robustness that should be suitable for computer games, it can always happen that, for some unforeseeable reason, the system is suddenly pushed far away from its rest position or gets a boost in velocity that will blow up the whole system.

We deal with this in the most straightforward way: we just have to follow the simple principle of "If X hurts, don't do X."

So whenever a vertex is too far away from its rest position, we just have to make sure that it isn't. Define a radius in which the vertex is allowed to be, and whenever it leaves this sphere, put it back on the surface of the sphere. The same should be done for the velocities. This is called position and velocity clamping; it is a quick way to get rid of all possible accidents that can happen to the simulation.
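A minimal clamping sketch (not from the original text; maxDist, maxSpeed, and the vector helpers are assumptions) could look like this:

// Sketch: position and velocity clamping ("safety belts").
Vector3 offset = v.pos - v.restPos;
if (offset.length() > maxDist)
    v.pos = v.restPos + offset * (maxDist / offset.length());   // back onto the sphere
if (v.vel.length() > maxSpeed)
    v.vel *= maxSpeed / v.vel.length();                          // cap the speed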

14.2.4 Global Damping

This falls into the same category as position and velocity clamping, but it can be motivated on a physical basis. We always want our objects to come to rest at some point in time, so we make sure they do. Damping can always be used to enforce stability on spring systems, even if the forces are not constructed to be stable with the used integration scheme [Bhasin and Liu 06]. Every system loses energy over time. In a physical sense, the energy is not lost but goes into motion that is not visible to perception, such as heating the material or the surrounding system. Here, a simple damping model is used that will cause the system to come to rest by just scaling the velocity by a certain factor at every time step:

scaleVector(v.vel, v.factorDamping);

Damping forces can be constructed to drain energy from the system in a more sophisticated way so that global damping can be reduced. But in the end, a form of global damping should still be implemented.

14.3 Neighborhood Interaction
For the following forces, the neighborhood nbr(i) of vertex x_i needs to be defined. The neighborhood can be quite a general set of vertices; we just need an applicable definition of it. If we do not have any connectivity information, we can define it to be every vertex that is within a certain radius of another. For vertices that form a lattice, the neighborhood can be the nearest-neighbor lattice sites. A triangle mesh has connectivity information supplied by definition, for example, in the form of a stream of vertices and a stream of triangles that group three vertices into one surface fragment and store additional information that is needed on a per-triangle level (see Figure 14.2). Here, we define the vertex's ring-0 neighbors as its neighborhood. (These equal the vertices that are grouped into one triangle with the vertex!) We also define each vertex as a neighbor of itself, which makes the formalism simpler later on.

Figure 14.2. A vertex x_i and its local neighborhood.

The representation of the mesh as triangles is optimal for the graphics hardware and the rendering process, but it is unsuited for our algorithms because the neighborhood of a vertex cannot be determined efficiently. If the overhead can be afforded, neighbor lists for all vertices can be created at the beginning:

for each triangle t {
    for each pair of vertices v_i, v_j in t {
        v_i.neighborsAdd(v_j);
        v_j.neighborsAdd(v_i);
    }
}

This provides very fast access to the neighborhood of a vertex, but on the downside, it takes a lot of extra memory, which can become unacceptable. Since neighborhood access is unlikely to become the bottleneck here, it is advisable to trade some of the access speed for memory; there are far more efficient data structures for this purpose [Campagna et al. 98]. Here, the DirectedEdge data structure is used:

struct DirectedEdge {
    int vertex;
    int neighbor;
    int next;
    int prev;
};

This data structure represents every triangle as three directed edges (see Figure 14.3), where each edge has a reference to the vertex it is directed to, as well as to its neighbor, next, and previous edges. If we give each vertex a reference to just one of the edges that head away from itself, we can restore its whole neighborhood just with the prev and the neighbor references.

Figure 14.3. A directed edge (dashed) and its previous, its next, and its neighbor edges. To retrieve the whole neighborhood information, we just need a pointer to the previous edge and its neighbor.
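To make the traversal concrete, here is a rough sketch of walking the ring-0 neighborhood with this data structure (not from the original text; it assumes the edges are stored in an array, that each vertex's edge field holds the index of one outgoing edge, and that the mesh is closed):

int start = vertices[i].edge;           // an edge heading away from vertex i
int e = start;
do {
    int n = edges[e].vertex;            // the vertex this edge is directed to: a neighbor of i
    // ... process neighbor n ...
    e = edges[edges[e].prev].neighbor;  // prev points back into i; its neighbor heads away again
} while (e != start);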

Figure 14.4. Initial (x0_i, left) and deformed (x_i, right) positions of a vertex i and its neighbors.

14.3.2 Maintaining Surface Details and Shape Matching

Simulating the effect of surface connectivity based on a physical model is complex. Using the physically correct material laws would not allow for real-time simulation without cutting the geometrical complexity by too much. Fortunately, we are in a lucky position, since our simulation does not have to be realistic, it just has to look realistic. And even most physical models are just approximations of what is really going on. That is the way it works. There are also no rigid bodies in nature, but there are some bodies that look and behave as if they were rigid.

We will use a technique called shape matching [Müller et al. 05] that approximates the influence of the neighboring surface vertices for every vertex surprisingly well.

The technique is absolutely nonphysical, but the result looks very realistic, plus it has some important physical properties: it preserves the center of mass and the angular momentum of the matched vertices. This way, it will not introduce any net torque into the system. The basic idea is this: for each vertex, we calculate the least-squares rigid body transformation of its neighbors' rest positions and use them as new goal positions. For those not familiar with the topic, this should be explained in a little more detail.

When the mesh gets deformed, the vertex positions are no longer equal to the rest positions of the mesh (see Figure 14.4).

Since the vertices are connected, they should be driven back into their rigid shape by the influence of their nearest neighbors (see Figure 14.5). The rigid shape of the neighborhood does not have to be defined by the rest positions x0_i, because it is possible to translate and rotate the vertex cloud as a whole without changing its relative shape.

Think of a mesh where each vertex has been moved by the same translation; we could just move the rest positions by the same translation as the vertices, and there would be no forces acting.

Figure 14.5. A vertex that has been displaced relative to its neighborhood should feel a back-driving force that maintains surface details.

What if the vertices have been displaced by different distances? The best translation of the rest positions is the one that matches the centers of mass of the initial (rest) shape and the deformed shape (Figure 14.6 (left)). This results in the following goal positions:

c_i = x0_i - (x0_cm - x_cm).

For this quantity, we need to calculate the centers of mass of the original and the deformed shapes:

for each vertex v in vi.neighbors {
    cm  += v.pos * v.mass;       // deformed shape
    cm0 += v.goalPos * v.mass;   // original (rest) shape
    masses += v.mass;
}
cm  /= masses;
cm0 /= masses;

This is still not the optimal solution, because the rotational degree of freedom has not yet been used. It is introduced in the form of the matrix R, which represents the optimal rotation of the point cloud around the matched centers of mass (Figure 14.6 (right)). The optimal rigid transformation

c_i = R(x0_i - x0_cm) + x_cm

has the property that it minimizes the quantity Σ_i m_i (c_i - x_i)². This matches the goal positions and the actual coordinates in the "least-squares sense." Additionally, it takes care of the fact that heavy particles are harder to move than lighter particles; a displacement of a heavy particle should cost more than that of a light particle.

Figure 14.6. The original shape of a vertex i and its neighbors is matched to the deformed shape (x_i) by an optimal rigid transformation. (a) This results in a goal position c_i for vertex i. (b) Then the vertex x_i is pulled towards the goal position c_i.

Since the calculation of this rotation is not directly obvious, its derivation is put into the appendix for the interested reader.

Now the spring force can be used again to construct a force that pulls the vertex towards the goal position c_i:

f_i^detail = (c_i - x_i) / h².
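A possible structure for that computation (a sketch only, not the chapter's code; Matrix3, outerProduct, and extractRotation, which stands for the polar-decomposition step derived in the appendix, are placeholders) could be:

// Sketch: shape matching of a vertex's neighborhood.
// cm and cm0 are the mass-weighted centers of mass computed above.
Matrix3 A = Matrix3::zero();
for each vertex v in vi.neighbors {
    A += v.mass * outerProduct(v.pos - cm, v.goalPos - cm0);
}
Matrix3 R = extractRotation(A);                 // rotational part of A
Vector3 goal = R * (vi.goalPos - cm0) + cm;     // goal position c_i
vi.force += vi.factorDetail * (goal - vi.pos) / (timeStep * timeStep);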

Since the least-squares goal positions were calculated, it should be remarked that these can also be used to build a rigid-body simulator. The goal positions are, of course, the positions of the rigid shape. If we let the actual positions of the vertices snap directly to the goal positions after each integration step, the behavior of a rigid body is mimicked.

The goal positions can also be used to introduce another form of damping: one approach [Müller et al. 08] uses the least-squares algorithm to fit an instantaneous rigid motion to the particles. Then, at every time step, nonrigid motion is bled off until only the rigid body motion remains.

14.3.3 Deformable Surface Mesh

We have accumulated several forces that can act on every vertex in the mesh. The relative strength of the forces must be defined per material or per vertex. They can be tuned by the designer via an editing interface in the model editor so they end up with a realistic simulation of the material in question. Another method would be to acquire the parameters from example animations that already exist for the model in question. A suitable parameter-fitting algorithm is presented in [Shi et al. 08]. We have to keep in mind that there are limits within which the parameters have to be set for a stable simulation. A complete vertex structure that accumulates the per-vertex information about everything discussed until now could look like Listing 14.1.

struct Vertex {
    Vector3 pos;
    Vector3 vel;
    Vector3 goalPos;
    Vector3 force;
    // force coefficients
    real factorRest;
    real factorDetail;
    real factorNeighbors;
    real factorDamping;
    // one of the edges to retrieve the neighborhood
    DirectedEdge edge;
    real mass;
};

Listing 14.1. The complete vertex class.

We accumulate the forces discussed so far with

f_i^total = α_i^rest · f_i^rest + α_i^neigh · f_i^neigh + α_i^detail · f_i^detail

and check whether the calculations work as intended. A block of 16 vertices connected in a simple geometry is defined to test the implementation (see Figure 14.7). The red spheres are the rest positions of the mesh, while the white spheres are the goal positions of the shape-matching algorithm. The actual positions are the yellow spheres. First, we can displace every vertex just a bit and watch it go back into its original shape. If we apply a driving force to our block on the right, even the vertices on the other end of the body start to wiggle about. In the demo, we can also switch on a gravitational force (and set factorRest to zero) and watch the body hit the ground. The body stays in shape just by means of the shape-matching algorithm (see Figure 14.8). A demo is included with the supplemental materials.

Figure 14.7. Driven deformation of a simple mesh geometry (see Color Plate XIV).

Figure 14.8. Under the influence of gravity, the geometry stays in shape just by means of shape matching of the local neighborhoods (see Color Plate XV).

14.4 Volumetric Effects

While the shape matching of the surface vertices has a huge impact on realism, it still is not a complete solution to our problem, since what we are missing completely is the influence of the interior of the body on its surface.

A more practical problem is that if only shape matching is used, the surface mesh will not follow the bone motion very well, and too much contribution of f^rest is needed, which renders the simulation unrealistic. The model we are dealing with is a surface mesh. In a realistic material simulation, the surface vertices should not only experience forces from their neighbors but also forces acting on the surface from the inside. Here we run into a problem: we do not have any information about the inside of the mesh.

Meshless shape matching [Müller et al. 05] discards neighborhood information entirely and performs shape matching on the whole point cloud. This way, each vertex feels the influence of every other vertex, as it would in a realistic soft material. The problem with this is that the larger the shape-matching clusters are, the faster deformations are smoothed out, and the shape will return to the rigid shape much sooner. If the algorithm is unaltered, the range of motion is cut drastically. In the limit of all vertices in one cluster, it will always try to match all particles to the undeformed mesh. Thus, it will only allow small deviations from the rigid shape. This comes in handy for simulating rigid-body dynamics with this algorithm, but that is not the focus of this chapter.
14.4.1 Extensions to Meshless Shape Matching

Müller [Müller et al. 05] proposes some extensions to the meshless shape-matching algorithm to allow for bigger deviations from the rigid shape. We should look at them for completeness; however, they are not that well suited for character animation. The idea is to allow the transformation that transforms x0_i into c_i,

c_i = R x0_i + t,

to be more general. Shear and stretch modes can be accounted for by mixing a bit of the previously calculated linear transformation A into the transformation

βA + (1 - β)R.

Here the mixing is controlled by the additional parameter β. The transformation R still ensures that there is a tendency towards the undeformed shape. Volume conservation has to be taken care of by ensuring that det(A) = 1, which is not automatically the case. This can be extended to include quadratic deformations. We will not use this approach, because we would still lose too much realism by discarding the neighborhood information, especially the small, high-frequency modes we want to achieve. Extending the range of motion for the shape matching of the neighborhood clusters is not necessary.

14.4.2 Lattice-Based Shape Matching
Another way of simulating volumetric effects is to turn to discrete approximations of the inside of the mesh. The general idea is to fill the inside with a lattice of evenly spaced vertices, let them take care of the physics, and reconstruct the deformed surface mesh from the deformed lattice afterwards. Unfortunately, these discrete approximations can be very expensive to simulate. Simple lattice deformers have been around for a while, like ChainMail (see [Gibson and Mittrich 97] again), which, although providing speed and robustness suitable for interactive processing, suffers from limited realism.

Here again, shape matching can come to our help. In [Rivers and James 07], an algorithm is presented to efficiently calculate the shape matching of a cubic lattice. The idea is to voxelize the mesh and flood the inside of the mesh with solid objects in a cubic lattice. Steinemann [Steinemann et al. 08] uses an octree-based hierarchical sampling instead of an evenly spaced lattice. The original mesh is then deformed using trilinear interpolation of the vertex positions in the lattice. Although this approach results (depending on the resolution of the lattice) in interactive rates, we will use a much simpler approach to account for volumetric effects that is better suited to character animation. It is presented in the next section.

14.4.3 A Link to the Bone

When there is a bone model that drives the mesh, another simplified model can be used that mimics the real situation quite well [Shi et al. 08]. We apply yet another spring to each surface vertex for each bone and link it to the bones the vertex is assigned to. But we do not fix the end at a certain position along the bone; instead, we allow it to slide freely along the bone. This way, the force tries to maintain the original distance from the bone. Before constructing this force, a bone model is defined that assigns each vertex just one bone. This will be extended to a model that assigns more than one bone to a vertex for smooth skinning. There, the calculated force will be the (weighted) sum of the contributions of each bone.

A basic bone model. We start with a simple bone model to discuss the basic structure. The bone model will be built up from the joints, where each joint has a position, an orientation, and a parent. A joint without a parent is called a root joint.

struct Joint {
    Quat4 orient;
    Vector3 pos;
    int parent;
};

The link between a joint and its parent is called the bone. Each vertex is assigned a joint, and its rest position x0_i is calculated by

x0_i = q̂ x_rel,i q̂⁻¹ + j_i,

where q̂ is the rotation quaternion and j_i is the position of the joint. Quaternions are explained in the introductory chapter (Chapter 1), since they are quite useful for the representation of rotations in computer graphics, especially character animation. The vertex positions are relative to their supporting joint and have to be transformed into global space (see Figure 14.9). The vertex structure is extended by a vector called relPos, which is the position of a vertex relative to its supporting joint. This is the only coordinate the designer has to supply; the restPos is calculated from this coordinate using the above formula.

Figure 14.9. Transforming a relative coordinate x_rel,i (left) to global coordinates x0_i (right) using the positions of its joint j_i and the joint's parent j_p.

for each joint j {
    v = j.pos - j.parent.pos;
    normalize(v);
    j.orient = rotationQuaternion(u, v);
}

for each vertex v {
    j = v.joint;
    tempPos = rotateVector(v.relPos, j.orient);
    v.goalPos = tempPos + j.pos;
}

In the first loop, we calculate the rotation quaternion of the joint as described above. In the second loop, we use the calculated quaternion to transform our relPos coordinates into global space (restPos). These are again the positions that are used by all the force calculations we discussed before.

Calculating the force. We need a force that maintains the distance to the bone for each vertex. For this, we compare the actual distance to the desired distance. We have to calculate the distance from the bone to the actual positions x_i and the distance to the rest positions x0_i. First, the unit vector in the direction of the joint's parent is obtained by

axis = joint.pos - parentJoint.pos;
normalize(axis);

From this, the part of the vertex position that points in the direction of the joint's parent can be calculated by taking the dot product, and the projection x_proj of x_i on the bone can be calculated by multiplying the unit vector in the direction of the bone with this quantity.

projection = dotProduct(axis, v.pos);
projVector = scaledVector(axis, projection);

Figure 14.10. The projection x_proj of the vertex x_i on the bone is used to construct the distance vector x_ib from the bone for vertex x_i. The distance is compared to the distance of the rest position x0_i to construct a force f_i^bone that maintains the distance to the bone.

The vector that points from the nearest point on the bone to the vertex $x_i$ is now just the difference between $x_i$ and $x_{\mathrm{proj}}$. We call it $x_{ib}$ for the actual positions and $x_{ib}^0$ for the goal positions. This is shown graphically in Figure 14.10. With these two quantities, a force that pulls the vertex to the desired distance from the bone can be constructed:

$$f_i^{\mathrm{bone}} = \left( \frac{|x_{ib}^0|}{|x_{ib}|} - 1 \right) \frac{x_{ib}}{h^2},$$

where $|x_{ib}|$ is the length of $x_{ib}$. Whenever $x_{ib}^0$ is longer than $x_{ib}$, the force is directed away from the bone (in the direction of $x_{ib}$), and if $x_{ib}^0$ is shorter than $x_{ib}$, the force is directed towards the bone, as is needed.

Figure 14.11. A tube, supported by three joints.

In our variable names, the calculation looks like this:

preFactor = (distanceVectorAbs0 / distanceVectorAbs - 1) / (timeStep * timeStep);
scale(distanceVector, preFactor);
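For reference, the individual steps above can be collected into one function. The following is only a sketch written in the chapter's variable-naming conventions; the function signature itself (boneForce, passing the joint and its parent explicitly) is an assumption.

/* Sketch: the complete bone-distance force for the single-bone model,
   assembled from the steps above. The function name and signature are
   assumptions; the helpers follow the chapter's listings. */
Vector3 boneForce(Vertex *v, Joint *j, Joint *jp, float timeStep)
{
    /* unit vector along the bone */
    Vector3 axis = j->pos - jp->pos;
    vecNormalize(axis);

    /* distance vectors from the bone for the actual and the rest position */
    Vector3 diffPos  = v->pos - j->pos;
    Vector3 diffPos0 = v->restPos - j->pos;
    Vector3 xib  = diffPos  - vecScaledVector(axis, vecDot(axis, &diffPos));
    Vector3 xib0 = diffPos0 - vecScaledVector(axis, vecDot(axis, &diffPos0));

    real len = vecLength(xib);
    if (len == 0.0f) return vecCreate(0.0f, 0.0f, 0.0f);

    /* f_bone = (|x0_ib| / |x_ib| - 1) * x_ib / h^2 */
    real preFactor = (vecLength(xib0) / len - 1.0f) / (timeStep * timeStep);
    Vector3 result = xib;
    vecScale(&result, preFactor);
    return result;
}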
14.4.4 Skeleton-Driven Mesh

We now have a detailed force model consisting of several forces that can be added:

$$f_i^{\mathrm{total}} = \alpha_i^{\mathrm{rest}} \cdot f_i^{\mathrm{rest}} + \alpha_i^{\mathrm{neigh}} \cdot f_i^{\mathrm{neigh}} + \alpha_i^{\mathrm{detail}} \cdot f_i^{\mathrm{detail}} + \alpha_i^{\mathrm{bone}} \cdot f_i^{\mathrm{bone}}.$$

We can apply this model to a geometry with two bones connected by a joint, with a cylindrical mesh around each bone (see Figure 14.11). The joints can be moved freely by selecting them with the mouse and moving them around; this causes kinematic deformation of the goal positions. The surface geometry follows the positions of the joints while experiencing secondary deformation. (See Figure 14.1 (middle and right).)
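In code, the combination is just a weighted sum per vertex. The following sketch reuses helper functions that appear in the listings later in the chapter; the alpha weights and the restForce/neighborForce names are placeholders for the chapter's corresponding forces, while detailForce and volumetricForce refer to Listings 14.2 and 14.3.

/* Sketch: evaluating the combined per-vertex force. alphaRest, alphaNeigh,
   alphaDetail, alphaBone, restForce, and neighborForce are placeholders
   for the chapter's corresponding quantities. */
Vector3 totalVertexForce(int mesh, int vertex)
{
    Vector3 f = vecCreate(0.0f, 0.0f, 0.0f);
    Vector3 t;

    t = restForce(mesh, vertex);       vecScale(&t, alphaRest);   vecAdd(&f, &t);
    t = neighborForce(mesh, vertex);   vecScale(&t, alphaNeigh);  vecAdd(&f, &t);
    t = detailForce(mesh, vertex);     vecScale(&t, alphaDetail); vecAdd(&f, &t);
    t = volumetricForce(mesh, vertex); vecScale(&t, alphaBone);   vecAdd(&f, &t);

    return f;
}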
14.4.5 Application to Smooth Skinning

This basic bone model works very badly, especially in joint regions where each vertex should feel the influence of more than one bone. This is addressed by smooth-skinning techniques (as opposed to the rigid skinning used before) such as skeleton-subspace deformation (SSD), which has been around in computer graphics for quite a while [Magnenat-Thalmann et al. 88]. This is used, for example, in the MD5 model format that comes from id Software's Doom 3 first-person shooter. Vertex positions are not given explicitly but must be calculated from the contributions of multiple weights that are assigned to joints. Here, the weights have positions relative to the bones, not to the vertices, so these weight positions

get transformed according to their assigned bone. The position of a vertex is a weighted sum of these transformed weight positions (a sketch of this reconstruction follows the structure definitions below). The Internet provides a lot of detailed documentation on this format.

Geometry produced from this specification works well as a kinematic basis for the secondary deformations presented here.

The vertices in the MD5 format are given implicitly as a sum of weights:

struct ModelVertex { int start; int count; };

Here, start defines the first weight, and count the number of weights, beginning at start, that belong to this vertex:

struct ModelWeight { int joint; float bias; Vector3 pos; };

The weight contains the information needed to construct the final vertex positions: pos defines the position of the weight, and bias states how much the weight contributes to the vertex. Using the weight, we can access the bone model information, since joint assigns each weight a joint:

struct ModelJoint {
    char name[64];
    int parent;
    Vector3 pos;
    Quat4 orient;
};

This is basically the same definition of a joint that was used before.
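Putting the three structures together, the kinematic position of a vertex is reconstructed as the bias-weighted sum of its weight positions, each transformed by its joint. The following is a minimal sketch of that reconstruction; the function name skinnedPosition and its parameters are assumptions, while rotateVector and the struct members follow the definitions above.

/* Sketch: reconstructing the kinematic position of one vertex from its
   MD5 weights. skinnedPosition is an assumed name; the biases of a vertex
   are expected to sum to one. */
Vector3 skinnedPosition(ModelMesh *m, ModelJoint *skeleton, int vertex)
{
    ModelVertex *mv = &m->vertices[vertex];
    Vector3 pos = vecCreate(0.0f, 0.0f, 0.0f);

    for (int i = mv->start; i < mv->start + mv->count; i++) {
        ModelWeight *w = &m->weights[i];
        ModelJoint  *j = &skeleton[w->joint];

        /* transform the weight position from joint space into global space */
        Vector3 p = rotateVector(w->pos, j->orient);
        vecAdd(&p, &j->pos);

        /* accumulate, weighted by the bias */
        vecScale(&p, w->bias);
        vecAdd(&pos, &p);
    }
    return pos;
}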

Listings 14.2 and 14.3 show the application of the surface-detail preservation and the bone-distance preservation forces to an actual MD5 model. Since an MD5 model can consist of several independent meshes, we have to specify which one we want to deform. The supplementary material includes an application that demonstrates the interactive deformation of the animated Stanford armadillo model (see Figure 14.12).
Vector3 DeformableMD5::detailForce(int mesh, int vertex)
{
    int i;
    Vector3 q, p;
    Vertex *vertices = meshes[mesh].finalVertices;

    int neighbors[MAX_NEIGHBORS];
    int numNeighbors = getNeighbors(mesh, vertex, neighbors);

    /* if there are less than 3 particles in the neighborhood,
       the particle is isolated */
    if (numNeighbors < 3) return vecCreate(0.0f, 0.0f, 0.0f);

    /* calculate centers of mass */
    Vector3 cm  = vecZero();
    Vector3 cm0 = vecZero();
    float masses = 0;
    for (i = 0; i < numNeighbors; i++) {
        Vertex *v = &vertices[neighbors[i]];
        vecAdd(&cm,  &v->pos);
        vecAdd(&cm0, &v->restPos);
        masses += v->mass;
    }
    vecScale(&cm,  1.0f / masses);
    vecScale(&cm0, 1.0f / masses);

    /* calculate optimal rotation R */
    Matrix3x3 Apq = matZero();
    for (i = 0; i < numNeighbors; i++) {
        Vertex *v = &vertices[neighbors[i]];
        q = v->restPos - cm0;
        p = v->pos - cm;
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++) {
                Apq[j][k] += v->mass * p[k] * q[j];
            }
    }
    Matrix3x3 R = getRotationalPart(Apq);

    /* calculate the position that best preserves the laplacian coordinates */
    Vector3 diff = vertices[vertex].restPos - cm0;
    matMult(&R, &diff);
    Vector3 force = diff + cm - vertices[vertex].pos;
    vecScale(&force, 1 / (timeStep * timeStep));
    return force;
}

Listing 14.2. The shape-matching algorithm for surface detail preservation on a model definition using SSD.

Vector3 DeformableMD5::volumetricForce(int mesh, int vertex)
{
    int i;
    ModelMesh *m = &meshes[mesh];
    ModelVertex *mv = &m->vertices[vertex];
    Vertex *v = &m->finalVertices[vertex];

    real totalWeight = 0.0f;
    Vector3 totalForce = vecCreate(0.0f, 0.0f, 0.0f);

    /* calculate the contribution of one joint */
    for (i = mv->start; i < mv->start + mv->count; i++) {
        ModelWeight *w = &m->weights[i];

        /* from weight, retrieve joint and its parent */
        ModelJoint *j  = &skeleton[w->joint];
        ModelJoint *jp = &skeleton[j->parent];

        /* calculate the unit vector in the direction of the bone */
        Vector3 axis = j->pos - jp->pos;
        vecNormalize(axis);

        /* calculate the force contribution as before */
        Vector3 diffPos  = v->pos - j->pos;
        Vector3 diffPos0 = v->restPos - j->pos;
        real projection  = vecDot(axis, &diffPos);
        real projection0 = vecDot(axis, &diffPos0);
        Vector3 projVector  = vecScaledVector(axis, projection);
        Vector3 projVector0 = vecScaledVector(axis, projection0);
        Vector3 distanceVector  = diffPos - projVector;
        Vector3 distanceVector0 = diffPos0 - projVector0;
        real distanceVectorAbs  = vecLength(distanceVector);
        real distanceVectorAbs0 = vecLength(distanceVector0);

        if (distanceVectorAbs == 0) return vecCreate(0, 0, 0);

        real preFactor = (distanceVectorAbs0 / distanceVectorAbs - 1.0f)
                         / (timeStep * timeStep);
        Vector3 result = distanceVector;
        vecScale(&result, preFactor * w->bias);

        totalWeight += w->bias;
        vecAdd(&totalForce, &result);
    }

    /* sum over all contributions */
    vecScale(&totalForce, 1.0f / totalWeight);
    return totalForce;
}

Listing 14.3. The bone-distance preservation algorithm for a model definition using SSD.

Figure 14.12. The Stanford armadillo model, experiencing secondary deformation: the vertices at the body region deform strongly, giving the impression of fatty tissue.

Figure 14.13. The surface vertices can be subject to external forces at runtime, resulting in interactive dynamic deformations.

14.5 Final Remarks
In this chapter, we have managed to bring skeleton-driven animation beyond the purely kinematic approach that is currently used in computer games by developing a dynamic simulation that enriches the visual experience of the animation. Although the simulation is based on forces, it is not exactly physics based, since the forces are not modeled on physical laws. Of course, no technique is suited for all applications; the techniques used here are not suitable when an accurate modeling of the physical situation is needed. This is the weak point of this kind of simulation. But it turns out that the impact on believability in games is immense.

For an accurate simulation (based on the physical definition of the strain tensor), Chapter 10 provides much better results.

Although the calculations could be applied to the mesh during a preprocessing stage to reduce computational effort, the technique is very well suited for real-time processing, for the benefit of the interactivity of the animation. The skeleton-driven vertices can be subject to external forces of any kind (see Figure 14.13). Special collision-detection algorithms might be needed here [Teschner et al. 05], which is unfortunately a lot more computationally intensive. This is beyond the scope of this chapter.

Appendix: Calculating the Optimal Rotation

For the shape-matching algorithm, a rotation is needed that best matches a given set of points to another set of points (with an equal number of points) by minimizing their squared distances.

Since we already matched the centers of mass (so there is no translation necessary for the optimization anymore), we define the relative locations by

$$q_i = x_i^0 - x_{cm}^0, \qquad p_i = x_i - x_{cm}.$$

We start off by searching for a linear transformation $A$ such that $c_i = A q_i + x_{cm}$ matches $x_i$ best, and then we try to extract the rotation that $A$ contains. The quantity we have to minimize can now be written as

$$\sum_i m_i \left(A q_i - p_i\right)^2.$$

We should now focus on the contribution of one neighbor $i$ and omit the mass for now. We can simplify our notation for the next few calculations to

$$(Aq - p)^2 = (q^T A^T - p^T)(Aq - p).$$

Now we write out the multiplications component-wise (take care: $u$, $v$, $w$ are matrix and vector entry indices now, not particle indices):

$$\sum_u \Big(\sum_v A_{uv} q_v - p_u\Big)\Big(\sum_w A_{uw} q_w - p_u\Big) = \sum_u \Big(\sum_v A_{uv} q_v\Big)^2 - 2 \sum_u p_u \sum_v A_{uv} q_v + \sum_u p_u^2.$$

Taking the derivative $\partial/\partial A_{lm}$ with respect to the $lm$ component of the matrix $A$ yields

$$\frac{\partial\ldots}{\partial A_{lm}} = 2 \Big(\sum_v A_{lv} q_v\Big) q_m - 2 p_l q_m.$$

Writing this again in matrix-vector notation, we get

$$2\left( (Aq)\,q^T - p\,q^T \right),$$

and setting the derivative to zero brings us to

$$A\,qq^T - pq^T = 0 \quad\Rightarrow\quad A = pq^T \cdot (qq^T)^{-1}.$$

Doing this calculation with the whole sum and the mass weights would bring us to

$$A = \Big(\sum_i m_i\, p_i q_i^T\Big)\Big(\sum_i m_i\, q_i q_i^T\Big)^{-1}.$$

This is great because this is a quantity we can actually calculate. The second part we can throw away because it is symmetric and, thus, cannot contain a rotation. The rest of the expression we call $A_{pq}$. Just do the math:

Apq = zeroMatrix();
for each vertex v in v_i.neighbors {
    q = v.restPos - cm0;
    p = v.pos - cm;
    for all entries j, k {
        Apq[j][k] += v.mass * p[k] * q[j];
    }
}

By so-called polar decomposition, we are now able to decompose the matrix $A_{pq}$ into a rotation $R$ and a scaling $S$: $A_{pq} = RS$. How the scaling can be obtained can be understood intuitively: if we apply $A_{pq}$ to a unit vector, the rotational part $R$ will rotate the vector on the unit sphere, but the scaling $S$ will displace it from the shell of the unit sphere. Now we apply $A_{pq}^T$: the rotational part $R^T$ will rotate the vector back to the original position, while the scaling will displace the vector even more from the shell. So the combined operation acts as if we had applied $S$ twice:

$$A_{pq}^T A_{pq} = (RS)^T(RS) = S^T R^T R S = S^T S = S^2.$$

So, unfortunately, we have to take the square root of this matrix equation to obtain $S$. As this is a common problem in mathematics and physics, it has been addressed a lot, and there are good numerical methods to calculate this quantity. The usual approach is to diagonalize the matrix $S^2$:

$$S^2 = V \operatorname{diag}(\lambda)\, V^T,$$

where $\lambda$ are the eigenvalues of the matrix $S^2$.

A very good overview, as well as some state-of-the-art algorithms for the diagonalization of $3 \times 3$ matrices, has been given by [Kopp 06]. Once the matrix is diagonalized, we can take the square root of the diagonal entries,

$$S = V \operatorname{diag}(\sqrt{\lambda})\, V^T,$$

to obtain the matrix $S$.

There is also what is called the Denman–Beavers square root iteration; this works without diagonalization. It is easy to implement and very robust, although not as efficient (see [Denman and Beavers 76]).
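As an illustration, here is a minimal sketch of the Denman–Beavers iteration for a symmetric, positive-definite 3x3 matrix; the matrix helpers matIdentity, matInverse, matAdd, and matScale are assumed names that do not appear in the chapter's code.

/* Sketch: Denman-Beavers iteration for the square root of a symmetric,
   positive-definite matrix M (here M = S^2). Y converges to sqrt(M) and
   Z converges to the inverse of sqrt(M). matIdentity, matInverse, matAdd,
   and matScale are assumed helper names. */
void denmanBeaversSqrt(Matrix3x3 M, int iterations,
                       Matrix3x3 *sqrtM, Matrix3x3 *invSqrtM)
{
    Matrix3x3 Y = M;
    Matrix3x3 Z = matIdentity();

    for (int k = 0; k < iterations; k++) {
        Matrix3x3 Yinv = matInverse(Y);        /* invert before updating   */
        Matrix3x3 Zinv = matInverse(Z);
        Y = matScale(matAdd(Y, Zinv), 0.5f);   /* Y <- (Y + Z^-1) / 2      */
        Z = matScale(matAdd(Z, Yinv), 0.5f);   /* Z <- (Z + Y^-1) / 2      */
    }

    *sqrtM    = Y;
    *invSqrtM = Z;
}

A convenient side effect is that Z converges to the inverse square root, so the scaling never has to be inverted explicitly when the rotation is extracted at the end of this appendix.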

We will use the Jacobi algorithm here, which is the oldest but is also a very robust algorithm. It starts off with the identity matrix for $V$ and applies so-called Jacobi sweeps on it (see [Kopp 06]).

Since the rotation matrices we are dealing with are “almost diagonal” already, it will take only one to two Jacobi sweeps on the average for each vertex. Since this operation is done very often, we should think about caching the matrix $V$ from the previous time step instead of starting off with the identity matrix at every time step. This induces further memory usage but reduces computation time.

Using $S$, we can now calculate the rotational part:

$$R = A_{pq} S^{-1}.$$
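To tie the appendix back to Listing 14.2, the following is one possible sketch of the getRotationalPart function, here built on the Denman–Beavers sketch above rather than the Jacobi diagonalization (only the way the square root is obtained differs); matTranspose and matMultiply are assumed helper names.

/* Sketch: extracting the rotation R = Apq * S^-1 with S^2 = Apq^T * Apq.
   Builds on the denmanBeaversSqrt sketch above; matTranspose and
   matMultiply are assumed helper names. */
Matrix3x3 getRotationalPart(Matrix3x3 Apq)
{
    Matrix3x3 S2 = matMultiply(matTranspose(Apq), Apq);   /* S^2 = Apq^T Apq */

    Matrix3x3 S, Sinv;
    denmanBeaversSqrt(S2, 16, &S, &Sinv);                  /* S and S^-1      */

    return matMultiply(Apq, Sinv);                         /* R = Apq S^-1    */
}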

Acknowledgments

I want to thank Ury Zhilinsky for his input and support during my work on this chapter. The MD5 model used was built upon polygonal data from the Stanford

Bibliography

[Müller et al. 08] M. Müller, B. Heidelberger, M. Hennix, and J. Ratcliff. “Hierarchical Position Based Dynamics.” Presentation given at Virtual Reality Interactions and Physical Simulations (VRIPHYS), Grenoble, November 13–14, 2008.

[Rivers and James 07] A. R. Rivers and D. L. James. “FastLSM: Fast Lattice Shape Matching for Robust Real-Time Deformation.” ACM Transactions on Graphics (SIGGRAPH ’07) 26:3 (2007), Article No. 82.

[Shi et al. 08] X. Shi, K. Zhou, Y. Tong, M. Desbrun, H. Bao, and B. Guo. “Example-Based Dynamic Skinning in Real Time.” ACM Transactions on Graphics (SIGGRAPH ’08) 27:3 (2008), Article No. 29.

[Steinemann et al. 08] D. Steinemann, M. A. Otaduy, and M. Gross. “Fast Adaptive Shape Matching Deformations.” In Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 87–94. Aire-la-Ville, Switzerland: Eurographics Association, 2008.

[Teschner et al. 05] M. Teschner, S. Kimmerle, B. Heidelberger, G. Zachmann, L. Raghupathi, A. Fuhrmann, M.-P. Cani, F. Faure, N. Magnenat-Thalmann, W. Strasser, and P. Volino. “Collision Detection for Deformable Objects.” Computer Graphics Forum 24:1 (2005), 61–81.
