Game Physics Pearls
Edited by Gino van den Bergen and Dirk Gregorius

The common theme in this practical, hands-on collection is experience. The
contributors write based on their knowledge and skill in developing tools and
runtime libraries either in game companies or middleware houses that produce
physics software for games on PCs and consoles. Each article describes not only
a specific topic, but provides an in-the-trenches discussion of the practical
problems and solutions when implementing the algorithms, whether for a physics
engine or game application.

The chapters cover topics such as collision detection, particle-based
simulations, constraint solving, and soft-body simulation. Several of the
topics are about nonsequential programming, whether multicore or for game
consoles, which is important given the evolution of modern computing hardware
toward multiprocessing and multithreading.

A K Peters, Ltd.
Cover image © Wayne Johnson, 2010. Used under license from Shutterstock.com.
Game Physics Pearls

Edited by
Gino van den Bergen
and
Dirk Gregorius

A K Peters, Ltd.
Natick, Massachusetts

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2010 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20140703

International Standard Book Number-13: 978-1-4398-6555-2 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reason-
able efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.
copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organiza-
tion that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Contents

Foreword xi

Preface xiii

I Game Physics 101 1

1 Mathematical Background 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Vectors and Points . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Lines and Planes . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Matrices and Transformations . . . . . . . . . . . . . . . . . 9
1.5 Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6 Rigid-Body Dynamics . . . . . . . . . . . . . . . . . . . . . 15
1.7 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . 26
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2 Understanding Game Physics Artifacts 29


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Discretization and Linearization . . . . . . . . . . . . . . . . 29
2.3 Time Stepping and the Well of Despair . . . . . . . . . . . . . 31
2.4 The Curse of Rotations . . . . . . . . . . . . . . . . . . . . . 32
2.5 Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 Collision Detection . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Joints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.8 Direct Animation . . . . . . . . . . . . . . . . . . . . . . . . 42
2.9 Artifact Reference . . . . . . . . . . . . . . . . . . . . . . . . 43

II Collision Detection 45

3 Broad Phase and Constraint Optimization for PlayStation® 3 47
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Overview of Cell/BE . . . . . . . . . . . . . . . . . . . . . . 47


3.3 Optimization of the Broad Phase . . . . . . . . . . . . . . . . 51


3.4 Optimization of the Constraint Solver . . . . . . . . . . . . . 57
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4 SAT in Narrow Phase and Contact-Manifold Generation 63


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Contact Manifold . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Physics Engine Pipeline . . . . . . . . . . . . . . . . . . . . . 65
4.4 SAT Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5 Intuitive Gauss Map . . . . . . . . . . . . . . . . . . . . . . . 75
4.6 Computing Full Contact Manifolds . . . . . . . . . . . . . . . 77
4.7 SAT Optimizations . . . . . . . . . . . . . . . . . . . . . . . 89
4.8 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 96
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5 Smooth Mesh Contacts with GJK 99


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Configuration Space . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Support Mappings . . . . . . . . . . . . . . . . . . . . . . . . 103
5.4 Overview of GJK . . . . . . . . . . . . . . . . . . . . . . . . 105
5.5 Johnson’s Algorithm . . . . . . . . . . . . . . . . . . . . . . 106
5.6 Continuous Collision Detection . . . . . . . . . . . . . . . . . 110
5.7 Contacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

III Particles 125

6 Optimized SPH 127


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2 The SPH Equations . . . . . . . . . . . . . . . . . . . . . . . 128
6.3 An Algorithm for SPH Simulation . . . . . . . . . . . . . . . 131
6.4 The Choice of Data Structure . . . . . . . . . . . . . . . . . . 132
6.5 Collapsing the SPH Algorithm . . . . . . . . . . . . . . . . . 139
6.6 Stability and Behavior . . . . . . . . . . . . . . . . . . . . . 143
6.7 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.9 Appendix: Scaling the Pressure Force . . . . . . . . . . . . . 150
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


7 Parallelizing Particle-Based Simulation on Multiple Processors 155


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2 Dividing Computation . . . . . . . . . . . . . . . . . . . . . 156
7.3 Data Management without Duplication . . . . . . . . . . . . . 159
7.4 Choosing an Acceleration Structure . . . . . . . . . . . . . . 162
7.5 Data Transfer Using Grids . . . . . . . . . . . . . . . . . . . 173
7.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

IV Constraint Solving 177

8 Ropes as Constraints 179


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.2 Free-Hanging Ropes . . . . . . . . . . . . . . . . . . . . . . 181
8.3 Strained Ropes . . . . . . . . . . . . . . . . . . . . . . . . . 184
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

9 Quaternion-Based Constraints 195


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.2 Notation and Definitions . . . . . . . . . . . . . . . . . . . . 195
9.3 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
9.4 Constraint Definitions . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Matrix-Based Quaternion Algebra . . . . . . . . . . . . . . . 201
9.6 A New Take on Quaternion-Based Constraints . . . . . . . . . 203
9.7 Why It Works . . . . . . . . . . . . . . . . . . . . . . . . . . 203
9.8 More General Frames . . . . . . . . . . . . . . . . . . . . . . 204
9.9 Limits and Drivers . . . . . . . . . . . . . . . . . . . . . . . 206
9.10 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
9.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
9.12 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 213
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

V Soft Body 215

10 Soft Bodies Using Finite Elements 217


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.2 Continuum Mechanics . . . . . . . . . . . . . . . . . . . . . 218
10.3 Linear FEM . . . . . . . . . . . . . . . . . . . . . . . . . . . 223


10.4 Solving the Linear System . . . . . . . . . . . . . . . . . . . 241


10.5 Surface-Mesh Update . . . . . . . . . . . . . . . . . . . . . . 246
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

11 Particle-Based Simulation Using Verlet Integration 251


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.2 Techniques for Numerical Integration . . . . . . . . . . . . . 252
11.3 Using Relaxation to Solve Systems of Equations . . . . . . . . 256
11.4 Rigid Bodies . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.5 Articulated Bodies . . . . . . . . . . . . . . . . . . . . . . . 264
11.6 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

12 Keep Yer Shirt On 271


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.2 Stable Real-Time Cloth . . . . . . . . . . . . . . . . . . . . . 271
12.3 Modeling Real Fabrics . . . . . . . . . . . . . . . . . . . . . 273
12.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
12.5 Order of Cloth Update Stages . . . . . . . . . . . . . . . . . . 278
12.6 Conclusion, Results, and Future . . . . . . . . . . . . . . . . 279
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279

VI Skinning 281

13 Layered Skin Simulation 283


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
13.2 Layered Deformation Architecture . . . . . . . . . . . . . . . 283
13.3 Smooth Skinning . . . . . . . . . . . . . . . . . . . . . . . . 287
13.4 Anatomical Collisions . . . . . . . . . . . . . . . . . . . . . 291
13.5 Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
13.6 Jiggle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
13.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

14 Dynamic Secondary Skin Deformations 305


14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
14.2 The Interaction Model . . . . . . . . . . . . . . . . . . . . . 307
14.3 Neighborhood Interaction . . . . . . . . . . . . . . . . . . . . 311
14.4 Volumetric Effects . . . . . . . . . . . . . . . . . . . . . . . 318


14.5 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 327


Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

Index 341


Foreword

I am not a fan of gems-style books. Typically, they are assembled and glued
together as a collection of loosely related articles, and no attempt is made to unify
them by emphasizing common themes and ideas. When I was asked to write the
foreword for this book, my initial reaction was to decline politely, thinking this
was yet another such book. However, knowing the editors and their reputations
in the physics engine industry, I agreed to read the book in hopes that there might
be a few articles that make the book a worthwhile purchase.
I am delighted to say that this book is much more than I imagined. Those few
articles I hoped to find interesting turned out to be all the articles! I congratulate
the editors and the authors for producing the finest collection of game physics
articles I have seen to date. The common theme is experience. Each author de-
scribes not only a topic of interest, but provides an in-the-trenches discussion of
the practical problems and solutions when implementing the algorithms, whether
for a physics engine or game application. Moreover, I found it comforting that
the authors were consistent in their findings, giving me hope that writing a fast
and robust physics engine actually can be a scientific process rather than an en-
deavor that combines art, hacks, and voodoo. Also of importance is that several of
the topics are about nonsequential programming, whether multicore or for game
consoles, which is important given the evolution of modern computing hardware
towards multiprocessing and multithreading.
This book is a must-have if you plan on exploring the world of physics pro-
gramming. And I hope the editors and authors have plans on producing more
books of the same great quality.

—Dave Eberly


Preface

It took some time before I considered myself a physics programmer. Like most
game programmers, I started out toying with physics in hobby game projects.
These early attempts at getting physical behavior out of an 8-bit home computer
did involve concepts such as velocity, mass, and force, but in my head they were
far from “real” physics. In the following years at the university I learned how to
program properly and got proficient in linear algebra, geometric algorithms, and
computer graphics. I took courses in theoretical mechanics and numerical anal-
ysis, expecting that after overcoming these hurdles, developing a physics engine
would be easy.
It never did get easy. In the coming years I was struggling to get even the
simplest of rigid body simulations stable on computers that were a thousand times
more powerful than the 8-bit home computer from my junior years. It would take
a considerable number of hacks to stop my “resting” contacts from oscillating and
from bouncing all over the place. And even then, the hacks would work only for
objects within certain ranges of masses and sizes. In the end, most of the hacks
that seemed to work would make Sir Isaac Newton turn in his grave. My inner
physicist was nagging at me, telling me that what I was doing was not “real”
physics. I was failing to truly capture classical mechanics as it was taught to me
in the code of a real-time physics engine. Surely, anyone who needs to resort to
the use of cheap hacks to get things working could never be considered a genuine
physics programmer.
After spending a couple of years in the game industry I learned that an under-
standing of classical mechanics and the ability to apply it in code are not the prime
skills of a physics programmer. Of course, any physics programmer should feel
comfortable with the science and mathematics behind physics, but being too con-
cerned about the science can become a burden as well. Games that involve physics
should primarily be targeted at playability and robustness rather than at
showcasing maximum realism. I had to overcome some hesitation before I willingly
started breaking the laws of physics and came up with hacks that created “un-
natural” behavior that fixed some game design issues. For example, in an arcade
racing game, cars should drift nicely, should rarely tip over, and if they do, should
always land back on their wheels—but most of all, they should never get stuck in
parts of the scenery. A game physics programmer can start with a realistic driving
behavior and then add helper forces and impulses to govern down force, balance,
turn rate, and what not, in order to get just the right behavior. It takes creativity
and a lot of experience to make a game that relies heavily on physics and is fun.


This book is written by and targeted at game physics programmers. We seek


to provide experience and proven techniques from experts in the field and focus on
what is actually used in games rather than on how to achieve maximum realism.
You will find a lot of hacks here, but they should not be regarded as “cheap.” They
are the result of many years of hard work balancing playability, robustness, and
visual appeal. Such information was previously found only on internet forums
and at game developers conferences. This is the first gems-type book that collects
articles on tricks of the trade in game physics written by people in the trade, and
as such, seeks to fill a gap in game technology literature.
It was not easy to set this book in motion. There were two main forces working
against us during production. Firstly, in the game industry developers usually do
not have nine-to-five jobs. Dedicating the little spare time that one has to a book
article is not a light decision for many people. Secondly, physics programmers
tend to be quite modest about their work and need some encouragement to make
them share their ideas. Perhaps many of us are plagued by the same inner physicist
who nags about our disregard for the laws of physics. Nevertheless, once the
project gained momentum, great stuff came out of the gang of contributors we
managed to lure in.
I very much enjoyed editing for this book; it’s great to see a coherent book
taking form when each of the authors is adding a piece to the puzzle. I would like
to thank everyone who contributed to this book. My gratitude goes to the authors,
the staff at A K Peters, all external reviewers, copy editors, the cover designer,
and last but not least to Dirk, my fellow co-editor and physics buddy.

—Gino van den Bergen


June 16, 2010

My initial contact with game physics programming was totally accidental. I had
just finished my studies in civil engineering and was sitting in a cafe talking to
an old girlfriend I hadn’t seen for a while. When she asked me what I would do next,
I replied that I might be interested in game development. As it turned out,
her husband (who had just returned from GDC) was a veteran in the game industry,
and he invited me for an interview. In this interview I learned that his company
was working on a release title for the PS3 and was currently looking for a physics
programmer. I had no idea what this meant, but I happily accepted.
When I started my work, I was overwhelmed by the huge amount of books,
papers, and especially rumors that were around. People on public forums had


many ideas and were gladly sharing them, but sadly these ideas often worked
reliably only in very specific situations. I quickly learned that it was very hard
to get accurate information that was actually usable in a game. At this point I
wished for a collection of proven algorithms that actually were used in a shipped
title, but sadly no such source existed at that time.
As Gino mentioned his idea of such a book, I was immediately excited and felt
flattered to support him as editor. It is my very strong belief that game physics pro-
gramming is about choosing the right algorithms rather than inventing everything
yourself. Having a collection of proven techniques is a great help in architecting
a solution for the specific needs of any game.
It was a great experience editing this book, and I enjoyed every minute work-
ing with every author. They all showed a great enthusiasm for contributing to this
book. I would like to thank all the authors, the staff at A K Peters, all the external
reviewers, the copy editors, the cover designer, and especially Gino for getting
me on board with this project.
—Dirk Gregorius
June 18, 2010


-I-
Game Physics 101


-1-
Mathematical Background
James M. Van Verth

1.1 Introduction
It has been said that, at its core, all physics is mathematics. While that statement
may be debatable, it is certainly true that a background in mathematics is indis-
pensable in studying physics, and game physics is no exception. As such, a single
chapter cannot possibly cover all that is useful in such a broad and interesting
field. However, the following should provide an essential review of the mathe-
matics needed for the remainder of this book. Further references are provided at
the end of the chapter for those who wish to study further.

1.2 Vectors and Points


1.2.1 Definitions and Relations
The core elements of any three-dimensional system are points and vectors. Points
represent positions in space and are represented graphically as dots. Vectors rep-
resent direction or rate of change—the amount of change indicated by the length,
or magnitude, of the vector—and are presented graphically as arrows. Figure 1.1

Figure 1.1. Relationship between points and vectors.


Figure 1.2. Vector scaling and addition.

shows the relationship between points and vectors—in this case, the vector is
acting as the difference between two points. Algebraically, this is

v = x1 − x0

or
x1 = x0 + v.

In general, vectors can be scaled and added. Scaling (multiplying by a single


factor, or scalar) changes the length of a vector. If the scalar is negative, it can
also change the direction of the vector. Adding two vectors together creates a new
vector that points from the tail of one to the head of another (see Figure 1.2).
Scaling and adding together an arbitrary number of vectors is called a linear
combination:

$$\mathbf{v} = \sum_i a_i \mathbf{v}_i.$$

A set of vectors S is linearly dependent if one of the vectors in S can be repre-


sented as the linear combination of other members in S. Otherwise, it is a linearly
independent set.
Points cannot be generally scaled or added. They can only be subtracted to
create a vector or combined in a linear combination, where

$$\sum_i a_i = 1.$$

This is known as an affine combination. We can express an affine combination as
follows:

$$\mathbf{x} = \left(1 - \sum_{i=1}^{n-1} a_i\right)\mathbf{x}_n + \sum_{i=1}^{n-1} a_i \mathbf{x}_i$$
$$= \mathbf{x}_n - \sum_{i=1}^{n-1} a_i \mathbf{x}_n + \sum_{i=1}^{n-1} a_i \mathbf{x}_i$$
$$= \mathbf{x}_n + \sum_{i=1}^{n-1} a_i (\mathbf{x}_i - \mathbf{x}_n)$$
$$= \mathbf{x}_n + \sum_{i=1}^{n-1} a_i \mathbf{v}_i.$$

So an affine combination can be thought of as a point plus a linear combination


of vectors.
We represent points and vectors relative to a given coordinate frame. In three
dimensions, or R3 , this consists of three linearly independent vectors e1 , e2 , and
e3 (known as a basis) and a point o (known as an origin). Any vector in this space
can be constructed using a linear combination of the basis vectors:

v = xe1 + ye2 + ze3 .

In practice, we represent a vector in the computer by using the scale factors


(x, y, z) in an ordered list.
Similarly, we can represent a point as an affine combination of the basis vec-
tors and the origin:
x = o + xe1 + ye2 + ze3 .

Another way to think of this is that we construct a vector and add it to the origin.
This provides a one-to-one mapping between points and vectors.

1.2.2 Magnitude and Distance


As mentioned, one of the quantities of a vector $\mathbf{v}$ is its magnitude, represented by
$\|\mathbf{v}\|$. In $\mathbb{R}^3$, this is
$$\|\mathbf{v}\| = \sqrt{x^2 + y^2 + z^2}.$$
We can use this to calculate the distance between two points $\mathbf{p}_1$ and $\mathbf{p}_2$ by taking
$\|\mathbf{p}_1 - \mathbf{p}_2\|$, or
$$\mathrm{dist}(\mathbf{p}_1, \mathbf{p}_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}.$$
If we scale a vector $\mathbf{v}$ by $1/\|\mathbf{v}\|$, we end up with a vector of magnitude 1, or a
unit vector. This is often represented by $\hat{\mathbf{v}}$.


Figure 1.3. Projection of one vector onto another.

1.2.3 Dot Product


The dot product of two vectors a and b is defined as

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta, \tag{1.1}$$

where θ is the angle between a and b.


For two vectors using a standard Euclidean basis, this can be represented as

a · b = ax b x + ay b y + az b z .

There are two uses of this that are of particular interest to game physics de-
velopers. First of all, it can be used to do simple tests of the angle between two
vectors. If a · b > 0, then θ < π/2; if a · b < 0, then θ > π/2; and if a · b = 0,
then θ = π/2. In the latter case, we also say that the two vectors are orthogonal.
The other main use of the dot product is for projecting one vector onto another.
If we have two vectors a and b, we can break a into two pieces a|| and a⊥ such
that a|| + a⊥ = a and a|| points along the same direction as, or is parallel to, b
(see Figure 1.3). The length $\|\mathbf{a}_\parallel\|$ is also known as the scalar projection of $\mathbf{a}$ onto $\mathbf{b}$.
From Equation (1.1), if $\|\mathbf{b}\| = 1$, then $\mathbf{a} \cdot \mathbf{b}$ is simply $\|\mathbf{a}\|\cos\theta$, which we
can see from Figure 1.3 is the length of the projection of $\mathbf{a}$ onto $\mathbf{b}$. The projected
vector itself can be computed as

a|| = (a · b)b.

The remaining, or orthogonal portion of a can be computed as

a⊥ = a − a|| .
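
To make this concrete, here is a small C++ sketch of the decomposition above. The Vector3 type and the function names are ours for illustration; they are not taken from any particular library, and the code assumes b is nonzero.

// Minimal sketch of the parallel/perpendicular decomposition described above.
// The Vector3 type and helper names are illustrative, not from the book's code.
struct Vector3 { float x, y, z; };

inline Vector3 operator+(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 operator-(Vector3 a, Vector3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vector3 operator*(float s, Vector3 v)   { return {s * v.x, s * v.y, s * v.z}; }
inline float   Dot(Vector3 a, Vector3 b)       { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Decompose a into a component parallel to b and a component orthogonal to b.
// b does not need to be unit length here; we divide by its squared magnitude.
inline void Decompose(Vector3 a, Vector3 b, Vector3& aPar, Vector3& aPerp)
{
    float bb = Dot(b, b);              // |b|^2, assumed nonzero
    aPar  = (Dot(a, b) / bb) * b;      // projection of a onto b
    aPerp = a - aPar;                  // remaining orthogonal part
}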

1.2.4 Cross Product


The cross product of two vectors a and b is defined as

a × b = (ay bz − az by , az bx − ax bz , ax by − ay bx ).


This produces a vector orthogonal to both a and b. The magnitude of the cross
product is
$$\|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{a}\|\,\|\mathbf{b}\|\sin\theta,$$
where θ is the angle between a and b. The direction of the cross product is
determined by the right-hand rule: taking your right hand, point the first finger in
the direction of a and the middle finger along b. Your extended thumb will point
along the cross product.
Two useful identities to be aware of are the anticommutativity and bilinearity
of the cross product:
a×b = −b × a,
a × (sb + tc) = s(a × b) + t(a × c).

1.2.5 Triple Product


There are two possible triple products for vectors. The first uses both the dot
product and the cross product and produces a scalar result. Hence it is known as
the scalar triple product:
s = a · (b × c).
The scalar triple product measures the signed volume of the parallelepiped bounded
by the three vectors a, b, and c. Thus, the following identity holds:
a · (b × c) = b · (c × a) = c · (a × b).
The second triple product uses only the cross product and produces a vector result.
It is known as the vector triple product:
v = a × (b × c).
The vector triple product is useful for creating an orthogonal basis from linearly
independent vectors. One example basis is b, b × c, and b × (b × c).
The following relationship between the vector triple product and dot product
is also helpful in derivations for rigid-body dynamics and geometric algorithms:
a × (b × c) = (a · c)b − (a · b)c.
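
The following C++ sketch computes the cross product and both triple products directly from the definitions above; the types and names are illustrative only, not from any particular engine.

// Cross product and the two triple products, as defined in the text.
// Types and names are illustrative.
struct Vector3 { float x, y, z; };

inline float   Dot(Vector3 a, Vector3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Scalar triple product: signed volume of the parallelepiped spanned by a, b, c.
inline float ScalarTriple(Vector3 a, Vector3 b, Vector3 c) { return Dot(a, Cross(b, c)); }

// Vector triple product a x (b x c); useful, e.g., for building the orthogonal
// basis b, b x c, b x (b x c) mentioned above.
inline Vector3 VectorTriple(Vector3 a, Vector3 b, Vector3 c)
{
    return Cross(a, Cross(b, c));
}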

1.2.6 Derivatives
We mentioned that vectors can act to represent rate of change. In particular, a
vector-valued function is the derivative of a point-valued function. If we take the
standard equation for a derivative of a function as in
$$\mathbf{x}'(t) = \lim_{h \to 0} \frac{\mathbf{x}(t+h) - \mathbf{x}(t)}{h},$$


we can see that the result $\mathbf{x}'(t)$ will be a vector-valued function, as we are sub-
tracting two points and then scaling by 1/h. It can be similarly shown that the
derivative of a vector-valued function is a vector-valued function. Note that we
often write such a time derivative as simply ẋ.

1.3 Lines and Planes


1.3.1 Definitions
If we parameterize an affine combination, we can create new entities: lines and
planes. A line can be represented as a point plus a parameterized vector:
l(t) = x + tv.
Similarly, a plane in R3 can be represented as a point plus two parameterized
vectors:
p(s, t) = x + su + tv.
An alternative definition of a plane is to take a vector n and a point p0 and
state that for any given point p on the plane,
0 = n · (p − p0 ).
If we set (a, b, c) = n, (x, y, z) = p, and d = −n · (p0 − o), we can rewrite this
as
0 = ax + by + cz + d, (1.2)
which should be a familiar formula for a plane.
For an arbitrary point p, we can substitute p into Equation (1.2) to test whether
it is on one side or another of the plane. If the result is greater than zero, we know
the point is on one side, if less than zero, it is on the other. And if the result is
close to zero, we know that the point is close to the plane.
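
As a small illustration, the following C++ sketch evaluates Equation (1.2) for an arbitrary point; the Plane type and function name here are hypothetical, not from the book.

// Sketch of the plane side test from Equation (1.2); names are illustrative.
struct Vector3 { float x, y, z; };

// Plane stored as 0 = ax + by + cz + d, with (a, b, c) the plane normal n
// and d = -n . p0 for a point p0 on the plane.
struct Plane { float a, b, c, d; };

// Positive result: p lies on the normal side; negative: on the other side;
// near zero: p is close to the plane. If (a, b, c) is unit length, this is
// the signed distance from p to the plane.
inline float TestPoint(const Plane& pl, Vector3 p)
{
    return pl.a * p.x + pl.b * p.y + pl.c * p.z + pl.d;
}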
We can further restrict our affine combinations to create half-infinite or fully
finite entities. For example, in our line equation, if we constrain t ≥ 0, we get a
ray. If we restrict t to lie between 0 and 1, then we have a line segment. We can
rewrite the line equation in an alternate form to make it clearer:
S(t) = (1 − t)x0 + tx1 .
In this case, x0 and x1 are the two endpoints of the line segment.
We can perform a similar operation with three points to create a triangle:
T(s, t) = (1 − s − t)x0 + sx1 + tx2 ,
where, again, s and t are constrained to lie between 0 and 1.


1.4 Matrices and Transformations


1.4.1 Definition
A matrix is an m × n array of components with m rows and n columns. These
components could be complex numbers, vectors, or even other matrices, but most
of the time when we refer to a matrix, its components are real numbers. An
example of a 2 × 3 matrix is
$$\begin{bmatrix} 5 & -1 & 0 \\ 12 & 0 & -10 \end{bmatrix}.$$
We refer to a single element in the ith row and jth column of the matrix A as aij .
Those elements where i = j are the diagonal of the matrix.
A matrix whose elements below and to the left of the diagonal (i.e., those
where i > j) are 0 is called an upper triangular matrix. Similarly, a matrix
whose elements above and to the right of the diagonal (i.e., those where i < j)
are 0 is called a lower triangular matrix. And those where all the non-diagonal
elements are 0 are called diagonal matrices.
A matrix is called symmetric if, for all i and j, the elements aij = aji , i.e., it
is mirrored across the diagonal. A matrix is skew symmetric if for all i and j the
elements aij = −aji . Clearly, the diagonal elements must be 0 in this case.

1.4.2 Basic Operations


Matrices can be added and scaled like vectors:

C = A + B,
D = kA.

In the first case, each element cij = aij + bij , and in the second, dij = kaij .
Matrices can be transposed by swapping elements across the diagonal, i.e.,
a matrix G is the transpose of matrix A if for all i and j, gij = aji . This is
represented as
G = AT .
Finally, matrices can be multiplied:

H = AB.

Here, for a given element hij , we take the corresponding row i from A and cor-
responding column j from B, multiply them component-wise, and take the sum,
or
$$h_{ij} = \sum_k a_{ik} b_{kj}.$$


Note also that matrix multiplication is noncommutative. That is, we cannot


say in general that AB = BA.

1.4.3 Vector Representation and Transformation


We can represent a vector as a matrix with one column, e.g.,
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

or with one row, e.g.,
$$\mathbf{b}^T = \begin{bmatrix} b_1 & b_2 & \cdots & b_m \end{bmatrix}.$$

In this book, we will be using column matrices to represent vectors. Should


we want to represent a row matrix, we shall use the transpose, as above. Using
this notation, we can also represent a matrix as its component columns:
$$A = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}.$$

A linear transformation T is a mapping that preserves the linear properties of


scale and addition; that is, for two vectors x and y,

aT (x) + T (y) = T (ax + y).

We can use matrices to represent linear transformations. Multiplying a vector


x by an appropriately sized matrix A, and expanding the terms, we get
$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

This represents a linear transformation T from an n-dimensional space to an m-


dimensional space. If we assume that both spaces use the standard Euclidean
bases e1 , e2 , . . . , en and e1 , e2 , . . . , em , respectively, then the column vectors in
matrix A are the transformed basis vectors T (e1 ), T (e2 ), . . . , T (en ).
Multiplying transformation matrices together creates a single matrix that rep-
resents the composition of the matrices’ respective transformations. In this way,
we can represent a composition of linear transformations in a single matrix.


1.4.4 Inverse and Identity


Just as we can multiply a scalar by 1 to no effect, there is an identity transforma-
tion that produces the original vector. This is represented by the matrix E, which
is a square diagonal matrix, sized appropriately to perform the multiplication on
the vector and with all 1s on the diagonal. For example, the following will work
for vectors in $\mathbb{R}^3$:
$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Intuitively, this makes sense. If we examine the columns, we will see they are just
e1 , e2 , and e3 , thereby transforming the basis vectors to themselves.
Note that the identity matrix is often represented in other texts as I. We are
using E to distinguish it from the inertial tensor, as discussed below.
The equivalent to standard division is the inverse. The inverse reverses the
effect of a given transformation, as represented by the following:

x = A−1 Ax.

However, just as we can’t divide by 0, we can’t always find an inverse for a


transformation. First, only transformations from an n-dimensional space to an
n-dimensional space have inverses. And of those, not all of them can be inverted.
For example, the transformation T (x) = 0 has no inverse.
Discussing how to invert matrices in a general manner is out of the scope of
this chapter; it is recommended that the reader see [Anton and Rorres 94], [Golub
and Van Loan 93], or [Press et al. 93] for more information.

1.4.5 Affine Transformations


An affine transformation on a point x performs the basic operation

z = Ax + y,

where A and y are a matrix and vector, respectively, of the appropriate sizes to
perform the operation. We can also represent this as a matrix calculation:
$$\begin{bmatrix} \mathbf{z} \\ 1 \end{bmatrix} = \begin{bmatrix} A & \mathbf{y} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}.$$

In general, in physical simulations, we are concerned with two affine transfor-


mations: translation (changing position) and rotation (changing orientation). (See
Figure 1.4.)


Figure 1.4. Translation and rotation.

The affine transformation will end up adding the vector y to any point we
apply it to, so y achieves translation for us. Rotation is stored in the matrix A.
Because it is convenient for us to keep them separate, we will use the first form
more often. So in three dimensions, translation will be stored as a 3-vector t and
rotation as a 3 × 3 matrix, which we will call R.
The following equation, also known as the Rodrigues formula, performs a
general rotation of a point p by θ radians around a rotation axis r̂:

cos θp + [1 − cos θ](r̂ · p)r̂ + sin θ(r̂ × p). (1.3)

This can be represented as a matrix by
$$R_{\hat{\mathbf{r}}\theta} = \begin{bmatrix} tx^2 + c & txy - sz & txz + sy \\ txy + sz & ty^2 + c & tyz - sx \\ txz - sy & tyz + sx & tz^2 + c \end{bmatrix},$$
where
$$\hat{\mathbf{r}} = (x, y, z), \quad c = \cos\theta, \quad s = \sin\theta, \quad t = 1 - \cos\theta.$$

Both translation and rotation are invertible transformations. To invert a trans-


lation, simply add −y. To invert a rotation, take the transpose of the matrix.
One useful property of rotation is its interaction with the cross product:

R(a × b) = Ra × Rb.

Note that this does not hold true for all linear transformations.
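
As a rough illustration, the following C++ sketch builds the rotation matrix form of Equation (1.3) from an axis and angle and applies the affine transformation z = Rx + t. The types and names are ours, and the axis is assumed to be normalized.

// Sketch: build the rotation matrix from an axis and angle (the matrix form of
// the Rodrigues formula above) and apply the affine transform z = R x + t.
// Types and names are illustrative, not from the book.
#include <cmath>

struct Vector3 { float x, y, z; };
struct Matrix3 { float m[3][3]; };   // row-major 3x3

// Assumes the axis r is already normalized.
inline Matrix3 FromAxisAngle(Vector3 r, float theta)
{
    float c = std::cos(theta), s = std::sin(theta), t = 1.0f - c;
    float x = r.x, y = r.y, z = r.z;
    return {{{ t*x*x + c,   t*x*y - s*z, t*x*z + s*y },
             { t*x*y + s*z, t*y*y + c,   t*y*z - s*x },
             { t*x*z - s*y, t*y*z + s*x, t*z*z + c   }}};
}

// z = R x + t : rotate, then translate.
inline Vector3 Transform(const Matrix3& R, Vector3 t, Vector3 x)
{
    return { R.m[0][0]*x.x + R.m[0][1]*x.y + R.m[0][2]*x.z + t.x,
             R.m[1][0]*x.x + R.m[1][1]*x.y + R.m[1][2]*x.z + t.y,
             R.m[2][0]*x.x + R.m[2][1]*x.y + R.m[2][2]*x.z + t.z };
}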


1.5 Quaternions
1.5.1 Definition
Another useful rotation representation is the quaternion. In their most general
form, quaternions are an extension of complex numbers. Recall that a complex
number can be represented as
c = a + bi,
where i2 = −1.
We can extend this to a quaternion by creating two more imaginary terms, or

q = w + xi + yj + zk,

where i2 = j 2 = k 2 = ijk = −1. All of a quaternion’s properties follow from


this definition. Since i, j, and k are constant, we can also write this as an ordered
4-tuple, much as we do vectors:

q = (w, x, y, z).

Due to the properties of xi + yj + zk, the imaginary part of a quaternion is


often referred to as a vector in the following notation:

q = (w, v).

Using the vector form makes manipulating quaternions easier for those who are
familiar with vector operations.
Note that most software packages store a quaternion as (x, y, z, w), which
matches the standard layout for vertex positions in graphics.

1.5.2 Basic Operations


Like vectors, quaternions can be scaled and added, as follows:

$$a\mathbf{q} = (aw, a\mathbf{v}),$$
$$\mathbf{q}_0 + \mathbf{q}_1 = (w_0 + w_1, \mathbf{v}_0 + \mathbf{v}_1).$$

There is only one quaternion multiplication operation. In vector form, this is


represented as

q0 q1 = (w0 w1 − v0 · v1 , w0 v1 + w1 v0 + v0 × v1 ).

Note that due to the cross product, quaternion multiplication is noncommutative.
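
The following C++ sketch implements this multiplication and the unit-quaternion inverse in the (w, v) form used above; the types and names are purely illustrative.

// Quaternion multiplication and the unit-quaternion inverse in (w, v) form.
// Types and names are illustrative, not from the book's code.
struct Vector3 { float x, y, z; };
struct Quat    { float w; Vector3 v; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }
inline float   Dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// q0 q1 = (w0 w1 - v0 . v1, w0 v1 + w1 v0 + v0 x v1); order matters.
inline Quat Mul(Quat q0, Quat q1)
{
    Quat r;
    r.w = q0.w * q1.w - Dot(q0.v, q1.v);
    r.v = Add(Add(Scale(q0.w, q1.v), Scale(q1.w, q0.v)), Cross(q0.v, q1.v));
    return r;
}

// The inverse of a *unit* quaternion is its conjugate (w, -v).
inline Quat UnitInverse(Quat q) { return { q.w, Scale(-1.0f, q.v) }; }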


Quaternions, like vectors, have a magnitude:
$$\|\mathbf{q}\| = \sqrt{w^2 + \mathbf{v} \cdot \mathbf{v}} = \sqrt{w^2 + x^2 + y^2 + z^2}.$$

Quaternions of magnitude 1, or unit quaternions, have properties that make them


useful for representing rotations.
Like matrices, quaternions have a multiplicative identity, which is (1, 0). There
is also the notion of a multiplicative inverse. For a unit quaternion (w, v), the in-
verse is equal to (w, −v). We can think of this as rotating around the opposing
axis to produce the opposite rotation. In general, the quaternion inverse is

$$\mathbf{q}^{-1} = \frac{1}{w^2 + x^2 + y^2 + z^2}(w, -\mathbf{v}).$$

1.5.3 Vector Rotation


If we consider a rotation of angle θ around an axis r, we can write this as a
quaternion:
q = (cos(θ/2), sin(θ/2)r̂).
It can be shown that this is, in fact, a unit quaternion.
We can use a quaternion of this form to rotate a vector p around r̂ by θ by
using the formulation
prot = qpq−1 .
Note that in order to perform this multiplication, we need to rewrite p as a quater-
nion with a zero-valued w term, or (0, p).
This multiplication can be expanded out and simplified as

prot = cos θp + [1 − cos θ](r̂ · p)r̂ + sin θ(r̂ × p),

which as we see is the same as Equation (1.3) and demonstrates that quaternions
can be used for rotation.

1.5.4 Matrix Conversion


It is often useful to convert a quaternion to a rotation matrix, e.g., so it can be
used with the graphics pipeline. Again, assuming a unit rotation quaternion, the
following is the corresponding matrix:
$$R_q = \begin{bmatrix} 1 - 2y^2 - 2z^2 & 2xy - 2wz & 2xz + 2wy \\ 2xy + 2wz & 1 - 2x^2 - 2z^2 & 2yz - 2wx \\ 2xz - 2wy & 2yz + 2wx & 1 - 2x^2 - 2y^2 \end{bmatrix}.$$
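
To tie these pieces together, here is an illustrative C++ sketch that builds a unit rotation quaternion from an axis and angle (as in Section 1.5.3) and converts it to the matrix above; the types and names are ours, and the axis is assumed normalized.

// Sketch: axis-angle to unit quaternion, and unit quaternion to rotation matrix.
// This Quat stores the components (w, x, y, z) directly; names are illustrative.
#include <cmath>

struct Vector3 { float x, y, z; };
struct Quat    { float w, x, y, z; };
struct Matrix3 { float m[3][3]; };

// Assumes axis is normalized: q = (cos(theta/2), sin(theta/2) * axis).
inline Quat FromAxisAngle(Vector3 axis, float theta)
{
    float s = std::sin(0.5f * theta);
    return { std::cos(0.5f * theta), s * axis.x, s * axis.y, s * axis.z };
}

// Assumes q is a unit quaternion.
inline Matrix3 ToMatrix(Quat q)
{
    float w = q.w, x = q.x, y = q.y, z = q.z;
    return {{{ 1 - 2*y*y - 2*z*z, 2*x*y - 2*w*z,     2*x*z + 2*w*y     },
             { 2*x*y + 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z - 2*w*x     },
             { 2*x*z - 2*w*y,     2*y*z + 2*w*x,     1 - 2*x*x - 2*y*y }}};
}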


Figure 1.5. Space curve with position and velocity at time t.

1.6 Rigid-Body Dynamics


1.6.1 Constant Forces
Suppose we have an object in motion in space. For the moment, we will consider
only a particle with position x, or linear motion. If we track this position over
time, we end up with a function x(t). In addition, we can consider at a particular
time how fast the object is moving and in what direction. This is the velocity
v(t). As the velocity describes how x changes in time, it is also the derivative of
its position, or ẋ. (See Figure 1.5.)
Assuming that the velocity v is constant, we can create a formula for com-
puting the future position of an object from its current position x0 and the time
traveled t:
x(t) = x0 + vt.
However, most of the time, velocity is not constant, and we need to consider its
derivative, or acceleration a. Assuming a is constant, we can create a similar
formula for v(t):
v(t) = v0 + at.
Since velocity is changing at a linear rate, we can substitute the average of the
velocities across our time steps for $\mathbf{v}$ in our original equation:
$$\mathbf{x}(t) = \mathbf{x}_0 + t\left(\frac{1}{2}(\mathbf{v}_0 + \mathbf{v}(t))\right)$$
$$= \mathbf{x}_0 + t\left(\frac{1}{2}(\mathbf{v}_0 + \mathbf{v}_0 + \mathbf{a}t)\right)$$
$$= \mathbf{x}_0 + \mathbf{v}_0 t + \frac{1}{2}\mathbf{a}t^2. \tag{1.4}$$
Acceleration in turn is derived from a vector quantity known as a force F.
Forces act to push and pull an object around in space. We determine the acceler-
ation from force by using Newton’s second law of motion,

F = ma,

where m is the mass of the object and is constant.


The standard example of a force is gravity, Fgrav = mg, which draws us to


the Earth. There is also the normal force that counteracts gravity and keeps us
from sinking through the ground. The thrust of a rocket, an engine moving a car
along—these are all forces.
There can be multiple forces acting on an object. To manage these, we take
the sum of all forces on an object and treat the result as a single force in our
equations:
$$\mathbf{F} = \sum_j \mathbf{F}_j.$$

1.6.2 Nonconstant Forces


Equation (1.4) is suitable when our forces are constant across the time interval we
are considering. However, in many cases, our forces are dependent on position or
velocity. For example, we can represent a spring force based on position,

Fspring = −kx,

or a drag force based on velocity,

Fdrag = −mρv.

And as position and velocity will be changing across our time interval, our forces
will as well.
One solution is to try and find a closed analytical solution, but (a) such a
solution may not be possible to find and (b) the solution may be so complex
that it is impractical to compute every frame. In addition, this constrains us to a
single set of forces for that solution, and we would like the flexibility to apply and
remove forces at will.
Instead, we will use a numerical solution. The problem we are trying to solve
is this: we have a physical simulation with a total force dependent generally on
time, position, and velocity, which we will represent as F(t, x, v). We have a
position x(t) = x0 and a starting velocity v(t) = v0 . The question is, what is
x(t + h)?
One solution to this problem is to look at the definition of a derivative. Recall
that
$$\mathbf{x}'(t) = \lim_{h \to 0} \frac{\mathbf{x}(t+h) - \mathbf{x}(t)}{h}.$$
For the moment, we will assume that h is sufficiently small and obtain an approx-
imation by treating h as our time step.
Rearranging terms, we get
$$\mathbf{x}(t + h) \approx \mathbf{x}(t) + h\mathbf{x}'(t),$$


or
$$\mathbf{x}(t + h) \approx \mathbf{x}(t) + h\mathbf{v}(t).$$
This is known as the explicit Euler’s method. Another way of thinking of this
is that the derivative is tangent to the curve of x(t) at time t. By taking a small
enough step in the tangent direction, we should end up close to the actual solution.
Note that since we are taking a new time step each frame, the frame positions
are often represented in terms of a sequence of approximations x0 , x1 , x2 , . . . So
an alternative form for Euler’s method is

$$\mathbf{x}_{i+1} = \mathbf{x}_i + h\mathbf{x}'_i.$$

Including the update for velocity, our full set of simulation equations is

$$\mathbf{v}_{i+1} = \mathbf{v}_i + h\mathbf{F}(t_i, \mathbf{x}_i, \mathbf{v}_i)/m,$$
$$\mathbf{x}_{i+1} = \mathbf{x}_i + h\mathbf{v}_{i+1}.$$

Note that we use the result of the velocity step in our position equation. This is
a variant of the standard Euler known as symplectic Euler, which provides more
stability for position-based forces. We will discuss symplectic Euler and other
integration methods below in more detail.
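
As an illustration, a minimal C++ sketch of one symplectic Euler step for a single particle follows; the Particle type and the force callback are hypothetical stand-ins for whatever the simulator provides.

// One symplectic Euler step for a particle: update velocity with the current
// force, then update position with the *new* velocity. Names are illustrative.
struct Vector3 { float x, y, z; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }

struct Particle { Vector3 x; Vector3 v; float mass; };

// F(t, x, v): total force on the particle, supplied by the simulation.
typedef Vector3 (*ForceFunc)(float t, Vector3 x, Vector3 v);

inline void SymplecticEulerStep(Particle& p, float t, float h, ForceFunc F)
{
    Vector3 a = Scale(1.0f / p.mass, F(t, p.x, p.v));
    p.v = Add(p.v, Scale(h, a));   // v_{i+1} = v_i + h F(t_i, x_i, v_i)/m
    p.x = Add(p.x, Scale(h, p.v)); // x_{i+1} = x_i + h v_{i+1}
}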

1.6.3 Updating Orientation


Updating orientation for a rigid-body simulation is similar to, yet different from,
updating position. In addition to the linear quantities, we now have an object with
the last frame’s orientation Ri or qi , the last frame’s angular velocity vector ωi ,
an inertial tensor I, and a sum of torques τ . From that, we wish to calculate the
current frame’s orientation Ri+1 or qi+1 and the current frame’s angular velocity
ωi+1 .
The orientation itself we represent with either a rotation matrix R or a quater-
nion q, both encapsulating rotation from a reference orientation (much as we can
use a vector from the origin to represent a point). Which form we use depends
on our needs. For example, rotation matrices can be convenient because they are
easily converted into a form efficient for rendering. However, quaternions take up
less space and need fewer operations to update and, thus, can be more efficient in
the simulation engine itself.
Angular velocity is the rotational correspondence to linear velocity. As lin-
ear velocity represents a change in position, angular velocity represents a change
in orientation. Its form is a three-element vector pointing along the axis of ro-
tation and scaled so that its magnitude is the angle of rotation, in radians. We


Figure 1.6. Converting between angular and linear velocities.

can determine the linear velocity at a displacement r from the center of rotation
(Figure 1.6) using the following equation:
v = ω × r. (1.5)
If the object is also moving with a linear velocity vl , this becomes
v = vl + ω × r.

The inertial tensor I is the rotational equivalent to mass. Rather than the single
scalar value of mass, the inertial tensor is a 3 × 3 matrix. This is because the
shape and density of an object affects how it rotates. For example, consider a
skater doing a spin. If she draws her arms in, her angular velocity increases. So
by changing her shape, she is changing her rotational dynamics.
Computing the inertial tensor for an object is not always easy. Often, we can
approximate it by using the inertial tensor for a simpler shape. For example, we
could use a box to approximate a car or a cylinder to approximate a statue. If
we want a more accurate representation, we can assume a constant density object
and compute it based on the tessellated geometry. One way to think of this is as
the sum of tetrahedra, where each tetrahedron shares a common vertex with the
others, and the other vertices are one face of the original geometry. As the inertial
tensor for a tetrahedron is a known quantity, this is a relatively straightforward
calculation [Kallay 06]. A quantity that has no linear complement is the center
of mass. This is a point, relative to the object, where applying a force invokes
no rotation. We can think of this as the perfect balance point. The placement of
the center of mass varies with the density or shape of an object. So a uniformly
dense and symmetric steel bar will have its center of mass at its geometric cen-
ter, whereas a hammer, for example, has its center of mass closer to its head.
Placement of the center of mass can be done in a data-driven way by artists or
designers, but more often, it comes out of the same calculation that computes the
inertial tensor.


The final quantity is torque, which is the rotational equivalent to force. Ap-
plying force to an object at any place other than its center of mass will generate
torque. To compute the torque, we take a vector r from the center of mass to the
point where the force is applied and perform a cross product as follows:

τ = r × F.

This will apply the torque counterclockwise around the vector direction, as per
the right-hand rule. We can sum all torques to determine the total torque on an
object:
$$\boldsymbol{\tau}_{tot} = \sum_j \mathbf{r}_j \times \mathbf{F}_j.$$

As with force, we can use Newton’s second law to find the relationship be-
tween torque and angular acceleration α:

τ = Iα.

1.6.4 Numerical Integration for Orientation Using Matrices


To update our orientation, we ideally would want to do something like this:

Ri+1 = Ri + hωi .

However, as Ri is a matrix and ωi is a vector, this is not possible. Instead, we do


the following:
$$R_{i+1} = R_i + h[\boldsymbol{\omega}_i]_\times R_i,$$
where
$$[\boldsymbol{\omega}]_\times = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}.$$
To understand why, let us consider the basis vectors of the rotation matrix R and
how they change when an infinitesimal angular velocity is applied. For simplic-
ity’s sake, let us assume that the angular velocity is applied along one of the basis
vectors; Figure 1.7 shows the other two. Recall that the derivative is a linear
quantity, whereas angular velocity is a rotational quantity. What we need to do is
change the rotational change of each axis to a linear change. We can do this by
computing the infinitesimal linear velocity at the tip of a given basis vector and
then adding this to get the new basis vector.
Recall that Equation (1.5) gives the linear velocity at a displacement r for
angular velocity ω. So for each basis vector rj , we could compute ω × rj and,

i i

i i
i i

i i

20 1. Mathematical Background

Figure 1.7. Change in basis vectors due to angular velocity.

from that, create a differential rotation matrix. However, there is another way to
do a cross product and that is to use a skew symmetric matrix of the appropriate
form, which is just what [ω]× is. Multiplying rj by the skew symmetric matrix
[ω]× will perform the cross product ω × rj , and multiplying R by [ω]× will
perform the cross product on all the basis vectors as a single operation, giving us
our desired result of dR/dt.
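
The following C++ sketch performs this update; the types and names are illustrative. Note that repeated Euler steps let numerical drift creep into the matrix, so in practice the result should be re-orthonormalized periodically.

// Matrix orientation update R_{i+1} = R_i + h [w]x R_i, as described above.
// Types and names are illustrative; re-orthonormalization is left to the caller.
struct Vector3 { float x, y, z; };
struct Matrix3 { float m[3][3]; };

// Skew-symmetric matrix [w]x such that [w]x r = w x r.
inline Matrix3 SkewSymmetric(Vector3 w)
{
    return {{{ 0.0f, -w.z,  w.y },
             {  w.z, 0.0f, -w.x },
             { -w.y,  w.x, 0.0f }}};
}

inline Matrix3 IntegrateOrientation(const Matrix3& R, Vector3 omega, float h)
{
    Matrix3 W = SkewSymmetric(omega);
    Matrix3 out;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
        {
            float dR = 0.0f;                  // ( [w]x R )_{ij}
            for (int k = 0; k < 3; ++k)
                dR += W.m[i][k] * R.m[k][j];
            out.m[i][j] = R.m[i][j] + h * dR; // R + h [w]x R
        }
    return out;
}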

1.6.5 Numerical Integration for Orientation Using


Quaternions
Performing the Euler step for quaternions is similar to matrices. Again, we use an
equation that can turn our angular velocity vector into a form suitable for adding
to a quaternion:
$$\mathbf{q}_{i+1} = \mathbf{q}_i + \frac{h}{2}\mathbf{w}\mathbf{q}_i,$$
where w is a quaternion of the form

w = (0, ω).

There are a number of proofs for this, though none are as intuitive as the one
for rotation matrices. The most straightforward is from [Hanson 06]. If we take a
quaternion q to the t power, we find that

$$\mathbf{q}^t = \exp(t \log \mathbf{q}).$$

For a rotation quaternion,
$$\log \mathbf{q} = \left(0, \frac{\theta}{2}\hat{\mathbf{r}}\right),$$
and hence,
$$\exp(t \log \mathbf{q}) = \exp\left(0, \frac{t\theta}{2}\hat{\mathbf{r}}\right) = \left(\cos\frac{t\theta}{2}, \sin\frac{t\theta}{2}\hat{\mathbf{r}}\right).$$


Taking the derivative of $\mathbf{q}^t$ with respect to $t$ gives us
$$\frac{d\mathbf{q}^t}{dt} = \frac{d\exp(t \log \mathbf{q})}{dt} = \log \mathbf{q}\,\exp(t \log \mathbf{q}) = (\log \mathbf{q})\mathbf{q}^t.$$
At $t = 0$, this is just
$$\frac{d\mathbf{q}}{dt} = \log \mathbf{q} = \left(0, \frac{\theta}{2}\hat{\mathbf{r}}\right).$$
Pulling out the $\frac{1}{2}$ term, we get
$$\frac{1}{2}(0, \theta\hat{\mathbf{r}}) = \frac{1}{2}\mathbf{w}.$$
Multiplying this quantity by the quaternion q gives the change relative to q, just
as it did for matrices.
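
A minimal C++ sketch of this quaternion update follows; the types and names are ours. As with the matrix form, the result drifts away from unit length under repeated steps, so it is renormalized here.

// Quaternion orientation update q_{i+1} = q_i + (h/2) w q_i, with w = (0, omega).
// Types and names are illustrative; the result is renormalized.
#include <cmath>

struct Vector3 { float x, y, z; };
struct Quat    { float w; Vector3 v; };

inline float   Dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline Vector3 Cross(Vector3 a, Vector3 b)
{
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

inline Quat IntegrateOrientation(Quat q, Vector3 omega, float h)
{
    // wq with w = (0, omega): ( -omega . v, q.w * omega + omega x v )
    Vector3 c = Cross(omega, q.v);
    Quat wq;
    wq.w = -Dot(omega, q.v);
    wq.v = { q.w * omega.x + c.x, q.w * omega.y + c.y, q.w * omega.z + c.z };

    // q_{i+1} = q_i + (h/2) wq, then renormalize.
    Quat out = { q.w + 0.5f * h * wq.w,
                 { q.v.x + 0.5f * h * wq.v.x,
                   q.v.y + 0.5f * h * wq.v.y,
                   q.v.z + 0.5f * h * wq.v.z } };
    float len = std::sqrt(out.w * out.w + Dot(out.v, out.v));
    out.w   /= len;
    out.v.x /= len;  out.v.y /= len;  out.v.z /= len;
    return out;
}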

1.6.6 Numerical Integration for Angular Velocity


As angular velocity and torque/angular acceleration are both vectors, we might
think we could perform the following:
$$\boldsymbol{\omega}_{i+1} = \boldsymbol{\omega}_i + hI^{-1}\boldsymbol{\tau}.$$

However, as
τ = Iω̇ + ω × Iω,
we cannot simply multiply τ by the inverse of I and do the Euler step.
One solution is to ignore the ω ×Iω term and perform the Euler step as written
anyway. This term represents the precession of the system—for example, a tipped,
spinning top will spin about its local axis but will also slowly precess around its
vertical axis as well. Removing this term will not be strictly accurate but can add
some stability.
The alternative is to do the integration in a different way. Consider the angular
momentum L instead, which is Iω. The derivative L̇ = Iω̇ = Iα = τ . Hence we
can do the following:

$$\mathbf{L}_{i+1} = \mathbf{L}_i + h\boldsymbol{\tau},$$
$$\boldsymbol{\omega}_{i+1} = I_i^{-1}\mathbf{L}_{i+1}.$$


The final piece is the calculation of $I_i^{-1}$. The problem is that $I$ is calculated
relative to the object, but the remaining quantities are computed relative to the
world. The solution is to update $I$ each time step based on its current orientation,
thusly:
$$I_i^{-1}\mathbf{L}_{i+1} = R_i I_0^{-1} R_i^{-1}\mathbf{L}_{i+1}.$$

We can think of this as rotating the angular momentum vector into the object’s
local orientation, applying the inverse inertial tensor, and then rotating back into
world coordinates.
This gives us our final formulas:

$$\boldsymbol{\tau} = \sum_k \mathbf{r}_k \times \mathbf{F}_k,$$
$$\mathbf{L}_{i+1} = \mathbf{L}_i + h\boldsymbol{\tau},$$
$$I_i^{-1} = R_i I_0^{-1} R_i^{-1},$$
$$\boldsymbol{\omega}_{i+1} = I_i^{-1}\mathbf{L}_{i+1},$$
$$R_{i+1} = R_i + h[\boldsymbol{\omega}_{i+1}]_\times R_i.$$
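
Putting the formulas together, the following C++ sketch advances the rotational state one step, carrying the angular momentum L as state; the matrix helpers and names are illustrative, and the inverse of the rotation matrix is taken as its transpose.

// One rotational update using the formulas above. Types and names are
// illustrative; the orientation matrix should be re-orthonormalized regularly.
struct Vector3 { float x, y, z; };
struct Matrix3 { float m[3][3]; };

inline Matrix3 Mul(const Matrix3& A, const Matrix3& B)
{
    Matrix3 C;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
        {
            C.m[i][j] = 0.0f;
            for (int k = 0; k < 3; ++k) C.m[i][j] += A.m[i][k] * B.m[k][j];
        }
    return C;
}

inline Matrix3 Transpose(const Matrix3& A)
{
    Matrix3 T;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) T.m[i][j] = A.m[j][i];
    return T;
}

inline Vector3 Mul(const Matrix3& A, Vector3 v)
{
    return { A.m[0][0]*v.x + A.m[0][1]*v.y + A.m[0][2]*v.z,
             A.m[1][0]*v.x + A.m[1][1]*v.y + A.m[1][2]*v.z,
             A.m[2][0]*v.x + A.m[2][1]*v.y + A.m[2][2]*v.z };
}

inline Matrix3 Skew(Vector3 w)
{
    return {{{ 0.0f, -w.z,  w.y }, {  w.z, 0.0f, -w.x }, { -w.y,  w.x, 0.0f }}};
}

// State: orientation R, angular momentum L. inertiaInvBody is I_0^{-1} in body
// space; torque is the summed torque for this step.
inline void IntegrateRotation(Matrix3& R, Vector3& L,
                              const Matrix3& inertiaInvBody,
                              Vector3 torque, float h)
{
    // L_{i+1} = L_i + h tau
    L = { L.x + h * torque.x, L.y + h * torque.y, L.z + h * torque.z };

    // I_i^{-1} = R_i I_0^{-1} R_i^T, then omega_{i+1} = I_i^{-1} L_{i+1}
    Matrix3 inertiaInvWorld = Mul(Mul(R, inertiaInvBody), Transpose(R));
    Vector3 omega = Mul(inertiaInvWorld, L);

    // R_{i+1} = R_i + h [omega]x R_i
    Matrix3 dR = Mul(Skew(omega), R);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) R.m[i][j] += h * dR.m[i][j];
}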

1.7 Numerical Integration


1.7.1 Issues with Euler’s Method
Euler’s method has the advantage of simplicity; however, it has its problems. First
of all, it assumes that the derivative at the current point is a good estimate of
the derivative across the entire interval. Secondly, the approximation that Euler’s
method produces adds energy to the system. And this approximation error is
propagated with each Euler step. This leads to problems with stability if our
system oscillates, such as with springs, orbits, and pendulums, or if our time step
is large. In either case, the end result is that our approximation becomes less and
less accurate.
We can see an example of this by looking at Euler’s method used to simulate
an orbiting object (Figure 1.8). The first time step clearly takes us off the desired
path, and each successive step only makes things worse. We see similar problems
with so-called “stiff” equations, e.g., those used to simulate stiff springs (hence
the name).
Recall that the definition of the derivative assumes that h is infinitesimally
small. So one solution might be to decrease our time step: e.g., divide our time in
half and take two steps. While this can help in some situations (and some physics


Figure 1.8. Using Euler’s method to approximate an orbit.

engines do just that for that reason), because of the nature of Euler’s method the
error will still accumulate.

1.7.2 Higher-Order Explicit Methods


One solution to this problem is to realize that we are trying to approximate a non-
linear function with a linear function. If we take a weighted average of samples
of the derivative across our interval, perhaps we can construct a better approxi-
mation. The higher-order Runge-Kutta methods do just this. The most notable
example is Runge-Kutta Order 4, or just RK4, which takes four samples of the
derivative.
In general, RK4 will provide a better approximation of the function. However,
it does come with the cost of more invocations of the derivative function, which
may be expensive. In addition, it still does not solve our problem with stiff equa-
tions. For particularly stiff equations, RK4 will still add enough energy into the
system to cause it to spiral out of control. Fortunately, there are other possibilities.
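
For reference, the following C++ sketch shows one RK4 step for the system x' = v, v' = F(t, x, v)/m, taking four samples of the derivative as described above; the types and the force callback are illustrative stand-ins.

// One RK4 step for a particle. The four derivative samples are taken at the
// start, twice at the midpoint, and at the end of the interval.
struct Vector3 { float x, y, z; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }

typedef Vector3 (*ForceFunc)(float t, Vector3 x, Vector3 v);

inline void RK4Step(Vector3& x, Vector3& v, float m, float t, float h, ForceFunc F)
{
    Vector3 k1x = v;
    Vector3 k1v = Scale(1.0f / m, F(t, x, v));

    Vector3 k2x = Add(v, Scale(0.5f * h, k1v));
    Vector3 k2v = Scale(1.0f / m, F(t + 0.5f * h,
                                    Add(x, Scale(0.5f * h, k1x)),
                                    Add(v, Scale(0.5f * h, k1v))));

    Vector3 k3x = Add(v, Scale(0.5f * h, k2v));
    Vector3 k3v = Scale(1.0f / m, F(t + 0.5f * h,
                                    Add(x, Scale(0.5f * h, k2x)),
                                    Add(v, Scale(0.5f * h, k2v))));

    Vector3 k4x = Add(v, Scale(h, k3v));
    Vector3 k4v = Scale(1.0f / m, F(t + h,
                                    Add(x, Scale(h, k3x)),
                                    Add(v, Scale(h, k3v))));

    // Weighted average: (k1 + 2 k2 + 2 k3 + k4) / 6, scaled by h.
    x = Add(x, Scale(h / 6.0f, Add(Add(k1x, Scale(2.0f, Add(k2x, k3x))), k4x)));
    v = Add(v, Scale(h / 6.0f, Add(Add(k1v, Scale(2.0f, Add(k2v, k3v))), k4v)));
}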

1.7.3 Implicit Methods


One method uses an alternative definition of the derivative:
$$\mathbf{x}'(t) = \lim_{h \to 0} \frac{\mathbf{x}(t) - \mathbf{x}(t - h)}{h}.$$
If we assume small $h$ and again rearrange terms, we get
$$\mathbf{x}(t) \approx \mathbf{x}(t - h) + h\mathbf{x}'(t).$$

Substituting t + h for t, we end up with


$$\mathbf{x}(t + h) \approx \mathbf{x}(t) + h\mathbf{x}'(t + h).$$


This is known as the implicit Euler method. The distinction between the implicit
and explicit methods is that with the implicit methods, the right side includes
terms that are not yet known. Implicit Euler is a first-order implicit method—it is
possible to create higher-order methods just as we did for explicit methods.
Whereas explicit methods add energy to the system as they drift away from
the actual function, implicit methods remove energy from the system. So while
implicit methods still do not handle oscillating or stiff equations perfectly, they
do not end up oscillating out of control. Instead, the system will damp down
much faster than expected. The solution converges, which is not ideal, but does
maintain stability.
We do have the problem that x (t + h) is unknown. There are three possible
ways to solve this. One is to try to solve for an analytic solution. However,
as before, this is not always possible, and often we do not have an equation for
x (t)—it is a function we call in our simulator that returns a numeric result. That
result could be computed from any number of combinations of other equations.
So, for both reasons, it is usually not practical to compute an explicit solution. In
this case, we have two choices.
The first is to compute x(t + h) using an explicit method and then use the
result to compute our implicit function. This is known as a predictor-corrector
method, as we predict a solution using the explicit equation and then correct for
errors using the implicit solution. An example of this is using the result of an
explicit Euler step in a modified implicit Euler solution:

$$\tilde{\mathbf{x}}_{i+1} = \mathbf{x}_i + h\mathbf{v}_i,$$
$$\tilde{\mathbf{v}}_{i+1} = \mathbf{v}_i + h\mathbf{F}(t_i, \mathbf{x}_i, \mathbf{v}_i)/m,$$
$$\mathbf{x}_{i+1} = \mathbf{x}_i + \frac{h}{2}(\tilde{\mathbf{v}}_{i+1} + \mathbf{v}_i),$$
$$\mathbf{v}_{i+1} = \mathbf{v}_i + \frac{h}{2}\left(\mathbf{F}(\tilde{t}_{i+1}, \tilde{\mathbf{x}}_{i+1}, \tilde{\mathbf{v}}_{i+1}) + \mathbf{F}(t_i, \mathbf{x}_i, \mathbf{v}_i)\right)/m.$$
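
A direct C++ sketch of this predictor-corrector step follows; the types and names are illustrative.

// Predictor-corrector: predict with explicit Euler, then correct using the
// average of the old and predicted derivatives. Names are illustrative.
struct Vector3 { float x, y, z; };

inline Vector3 Add(Vector3 a, Vector3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vector3 Scale(float s, Vector3 a) { return {s * a.x, s * a.y, s * a.z}; }

typedef Vector3 (*ForceFunc)(float t, Vector3 x, Vector3 v);

inline void PredictorCorrectorStep(Vector3& x, Vector3& v, float m,
                                   float t, float h, ForceFunc F)
{
    // Predictor: plain explicit Euler.
    Vector3 a0     = Scale(1.0f / m, F(t, x, v));
    Vector3 xTilde = Add(x, Scale(h, v));
    Vector3 vTilde = Add(v, Scale(h, a0));

    // Corrector: average the old and predicted derivatives.
    Vector3 a1   = Scale(1.0f / m, F(t + h, xTilde, vTilde));
    Vector3 xNew = Add(x, Scale(0.5f * h, Add(vTilde, v)));
    Vector3 vNew = Add(v, Scale(0.5f * h, Add(a1, a0)));
    x = xNew;
    v = vNew;
}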
An alternative method for implicit Euler is to treat it as a linear equation and
solve for it. We can do this for a force dependent on position as follows:

$$\mathbf{x}_{i+1} = \mathbf{x}_i + h_i\mathbf{x}'_{i+1},$$
$$\mathbf{x}_i + \Delta\mathbf{x}_i = \mathbf{x}_i + h_i\mathbf{F}(\mathbf{x}_i + \Delta\mathbf{x}_i),$$
$$\Delta\mathbf{x}_i = h_i\mathbf{F}(\mathbf{x}_i + \Delta\mathbf{x}_i),$$
$$\Delta\mathbf{x}_i \approx h_i\left(\mathbf{F}(\mathbf{x}_i) + J(\mathbf{x}_i)\Delta\mathbf{x}_i\right),$$
$$\Delta\mathbf{x}_i \approx \left(\frac{1}{h_i}E - J(\mathbf{x}_i)\right)^{-1}\mathbf{F}(\mathbf{x}_i),$$


where J is a matrix of partial derivatives known as the Jacobian. The resulting


matrix is sparse and easy to invert, which makes it useful for large systems, such
as collections of particles.

1.7.4 Verlet Methods


A popular game physics method, mainly due to [Jakobsen 01], is Verlet integra-
tion. In its most basic form, it is a velocity-less scheme, instead using the position
from the previous frame. As we often don’t care about the velocity of particles,
this makes it very useful for particle systems.
The general formula for the Verlet method is as follows:

x_{i+1} = 2x_i − x_{i−1} + h² a_i.

While standard Verlet is quite stable, it has the disadvantage that it doesn’t
incorporate velocity. This makes it difficult to use with velocity-dependent forces.
One possible solution is to use Leapfrog Verlet:

v_{i+1/2} = v_{i−1/2} + h a_i,
x_{i+1} = x_i + h v_{i+1/2}.

However, this does not compute the velocity at the current time step, but in-
stead at the half-time step (this is initialized by using a half-interval Euler step).
While we can take an average of these over two time steps for our force calcula-
tion, we still have problems with impulse-based collision systems, which instan-
taneously modify velocity to simulate contact forces. One solution to this is use
the full velocity Verlet:

v_{i+1/2} = v_i + (h/2) a_i,
x_{i+1} = x_i + h v_{i+1/2},
v_{i+1} = v_{i+1/2} + (h/2) a_{i+1}.

However, unlike Euler’s method, this does require two force calculations, and we
can get similar stability with the last method we’ll consider.
More information on Verlet methods can be found in Chapter 11.
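A minimal position-Verlet step for a single particle might look like the sketch below; the VerletParticle type is illustrative.

// Basic (position) Verlet: velocity is implicit in the difference between the
// current and previous positions.
struct VerletParticle { float x, xPrev, a; };   // position, previous position, acceleration

void verletStep(VerletParticle& p, float h)
{
    const float xNext = 2.0f * p.x - p.xPrev + h * h * p.a;
    p.xPrev = p.x;
    p.x = xNext;
}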

1.7.5 Symplectic Euler Method


We’ve already seen the symplectic Euler method previously—in fact, it’s the
method we were using for the simulation equations in Section 1.6. It is a semi-
implicit method, in that it uses the explicit Euler method to update velocity but


Figure 1.9. Using the symplectic Euler method to approximate an orbit.

uses an implicit value of velocity to update position:

v_{i+1} = v_i + h F(t_i, x_i, v_i)/m,
x_{i+1} = x_i + h v_{i+1}.

This takes advantage of the fact that velocity is the derivative of position, and
the end result is that we get a very stable method that only requires one force
calculation. It does have the disadvantage that it is not as accurate with constant
forces, but in those cases, we should consider using Equation (1.4) anyway.
In Figure 1.9, we see the result of using symplectic Euler with one step of our
orbit example. Admittedly this is a bit contrived, but we see that, in principle,
it is extremely stable—neither spiraling outward as explicit Euler would do nor
spiraling inward as implicit Euler would do.
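In code, symplectic Euler is simply the explicit Euler step with the two updates swapped, so that the freshly updated velocity feeds the position update. The Body1D type and force function below are placeholders.

struct Body1D { float x, v, m; };   // position, velocity, mass

float force(float t, float x, float v);   // assumed to be supplied by the simulation

void symplecticEulerStep(Body1D& b, float t, float h)
{
    b.v += h * force(t, b.x, b.v) / b.m;   // explicit in the force
    b.x += h * b.v;                        // implicit in the velocity
}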

1.8 Further Reading


This chapter is mainly intended as an overview, and the interested reader can find
more details in a wide variety of sources. Good references for linear algebra with
widely varying but useful approaches are [Anton and Rorres 94] and [Axler 97].
Kenneth Joy also has a good series on vectors, points, and affine transformations,
found in [Joy 00c], [Joy 00b], and [Joy 00a].
The standard quaternion reference for graphics is [Shoemake 85], which has
been expanded to excellent detail in [Hanson 06]. An early series of articles
about game physics is [Hecker 97], and [Witkin and Baraff 01] provides thorough
coverage of the early Pixar physics engine. It is also worth mentioning [Catto 06],
which first introduced me to the symplectic Euler method, for which I am eternally
grateful.
Finally, without modesty, a good general source for all of these topics is my
own work, cowritten with Lars Bishop [Van Verth and Bishop 08].


Bibliography
[Anton and Rorres 94] Howard Anton and Chris Rorres. Elementary Linear Al-
gebra: Applications Version, Seventh edition. New York: John Wiley and
Sons, 1994.

[Axler 97] Sheldon Axler. Linear Algebra Done Right, Second edition. New
York: Springer, 1997.

[Catto 06] Erin Catto. “Fast and Simple Physics using Sequential Impulses.”
Paper presented at GDC 2006 Tutorial “Physics for Game Programmers,”
San Jose, CA, March, 2006.

[Golub and Van Loan 93] Gene H. Golub and Charles F. Van Loan. Matrix Com-
putations. Baltimore, MD: Johns Hopkins University Press, 1993.

[Hanson 06] Andrew Hanson. Visualizing Quaternions. San Francisco: Morgan
Kaufmann, 2006.

[Hecker 97] Chris Hecker. “Behind the Screen: Physics.” Series published in
Game Developer Magazine, 1996–1997.

[Jakobsen 01] Thomas Jakobsen. “Advanced Character Physics.” Paper pre-
sented at Game Developers Conference 2001, San Jose, CA, March, 2001.

[Joy 00a] Kenneth Joy. “On-Line Geometric Modeling Notes: Affine Combina-
tions, Barycentric Coordinates and Convex Combinations.” Technical re-
port, University of California, Davis, 2000.

[Joy 00b] Kenneth Joy. “On-Line Geometric Modeling Notes: Points and Vec-
tors.” Technical report, University of California, Davis, 2000.

[Joy 00c] Kenneth Joy. “On-Line Geometric Modeling Notes: Vector Spaces.”
Technical report, University of California, Davis, 2000.

[Kallay 06] Michael Kallay. “Computing the Moment of Inertia of a Solid De-
fined by a Triangle Mesh.” journal of graphics tools 11:2 (2006), 51–57.

[Press et al. 93] William H. Press, Brian P. Flannery, Saul A. Teukolsky, and
William T. Vetterling. Numerical Recipes in C: The Art of Scientific Com-
puting, Second edition. New York: Cambridge University Press, 1993.

[Shoemake 85] Ken Shoemake. “Animating Rotation with Quaternion Curves.”
Computer Graphics (SIGGRAPH ’85 Proceedings) 19 (1985), 245–254.


[Van Verth and Bishop 08] James M. Van Verth and Lars M. Bishop. Essential
Mathematics for Games and Interactive Applications, Second edition. San
Francisco: Morgan Kaufmann, 2008.

[Witkin and Baraff 01] Andrew Witkin and David Baraff. “Physically Based
Modelling: Principles and Practice.” ACM SIGGRAPH 2001 Course Notes.
Available at http://www.pixar.com/companyinfo/research/pbm2001/, 2001.


-2-
Understanding Game Physics Artifacts
Dennis Gustafsson

2.1 Introduction
Physics engines are known for being notoriously hard to debug. For most people,
physics artifacts are just a seemingly random stream of weird behavior that makes
no sense. Few components of a game engine cause as much frustration and hair
loss. We have all seen ragdolls doing the funky monkey dance and stacks of
“rigid” bodies acting more like a tower of greasy mushrooms, eventually falling
over or taking off into the stratosphere. This chapter will help you understand the
underlying causes of this behavior and common mistakes that lead to it. Some of
them can be fixed, some of them can be worked around, and some of them we will
just have to live with for now. This is mostly written for people writing a physics
engine of their own, but understanding the underlying mechanisms is helpful even
if you are using an off-the-shelf product.

2.2 Discretization and Linearization


Physics engines advance time in discrete steps, typically about 17 ms for a 60 Hz
update frequency. It is not uncommon to split up the time step into smaller steps,
say two or three updates per frame (often called substepping) or even more, but no
matter how small of a time step you use, it will still be a discretization of a con-
tinuous problem. Real-world physics do not move in steps, not even small steps,
but in a continuous motion. This is by far the number one source for physics ar-
tifacts, and any properly implemented physics engine should behave better with
more substeps. If a physics artifact does not go away with more substeps, there
is most likely something wrong with your code. The bullet-through-paper prob-
lem illustrated in Figure 2.1 is a typical example of a problem that is caused by
discretization.



Figure 2.1. Discretization can cause fast-moving objects to travel through walls.

Another big source of artifacts is the linearization that most physics engines
employ—the assumption that during the time step everything travels in a linear
motion. For particle physics, this is a pretty good approximation, but as soon as
you introduce rigid bodies and rotation, it falls flat to the ground. Consider the ball
joint illustrated in Figure 2.2. The two bodies are rotating in opposite directions.
At this particular point in time, the two bodies are lined up as shown. Even if
the solver manages to entirely solve the relative velocity at the joint-attachment
point to zero, as soon as time is advanced, no matter how small the amount, the
two attachment points will drift apart. This is the fundamental problem of linearization,
which makes it impossible to create an accurate physics engine by solving just for
relative linear velocities at discrete points in time.
Even though linearization and discretization are two different approximations,
they are somewhat interconnected. Lowering the step size (increasing the number
of substeps) will always make linearization less problematic, since any nonlinear
motion will appear more and more linear the shorter the time span. The ambitious
reader can make a parallel here to the Heisenberg principle of uncertainty!
Figure 2.2. Even if relative linear velocity at the joint attachment is zero, objects can
separate during integration due to rotation.

The major takeaway here is that as long as a physics engine employs dis-
cretization and linearization, which all modern physics engines and all algorithms
and examples in this book do, there will always be artifacts. These artifacts are
not results of a problem with the physics engine itself, but the assumptions and
approximations the engine is built upon. This is important to realize, because
once you accept the artifacts and understand their underlying causes, it makes
them easier to deal with and work around.

2.3 Time Stepping and the Well of Despair


Since the physics engine is advanced in discrete steps, what happens if the game
drops a frame? This is a common source of confusion when integrating a physics
engine, since you probably want the motion in your game to be independent of
frame rate. On a slow machine, or in the occasion of your modern operating sys-
tem going off to index the quicksearch database in the middle of a mission, the
graphical update might not keep up with the desired frequency. There are several
different strategies for how to handle such a scenario from a physics perspective.
You can ignore the fact that a frame was dropped and keep stepping the normal
step length, which will create a slow-motion effect that is usually highly unde-
sirable. Another option is to take a larger time step, which will create a more
realistic path of motion but may introduce jerkiness due to the variation in dis-
cretization. The third option is to take several, equally sized physics steps. This
option is more desirable, as it avoids the slowdown while still doing fixed-size
time steps.

2.3.1 The Well of Despair


Making several physics updates per frame usually works fine, unless the physics
is what is causing the slowdown to begin with. If physics is the bottleneck, the
update frequency will go into the well of despair, meaning every subsequent frame
needs more physics updates, causing a slower update frequency, resulting in even
more physics updates the next frame, and so on. There is unfortunately no way
to solve this problem other than to optimize the physics engine or simplify the
problem, so what most people do is put a cap on the number of physics updates
per frame, above which the simulation will simply run in slow motion. Actually,
it will not only run in slow motion but it will run in slow motion at a lower-than-
necessary frame rate, since most of what the physics engine computes is never
even shown! A more sophisticated solution is to measure the time of the physics
update, compare it to the overall frame time, and only make subsequent steps
if we can avoid the well of despair. This problem is not trivial, and there is no
ultimate solution that works for all scenarios, but it is well worth experimenting
with since it can have a very significant impact on the overall update frequency.
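One common way to implement the fixed-step option, including a cap on the number of substeps, is a time accumulator along the lines of the sketch below. The names, the 60 Hz step, and the cap of four steps are illustrative choices, not recommendations from this chapter.

// Fixed-size physics steps driven by a time accumulator, with a cap to keep
// the physics update itself from triggering ever more substeps.
struct FixedStepper
{
    float accumulator = 0.0f;
    float fixedDt     = 1.0f / 60.0f;
    int   maxSteps    = 4;

    template <typename StepFn>
    void advance(float frameDt, StepFn stepPhysics)
    {
        accumulator += frameDt;
        int steps = 0;
        while (accumulator >= fixedDt && steps < maxSteps)
        {
            stepPhysics(fixedDt);
            accumulator -= fixedDt;
            ++steps;
        }
        // If the cap was hit, drop the leftover time: the game runs in slow
        // motion for a while instead of spiraling into the well of despair.
        if (steps == maxSteps)
            accumulator = 0.0f;
    }
};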


2.4 The Curse of Rotations


Since rotation is the mother of most linearization problems, it deserves some spe-
cial attention. One fun experiment we can try is to make the inertia tensor for
all objects infinite and see how that affects our simulation. The inertia tensor can
roughly be described as an object’s willingness to rotate and is often specified
as its inverse, so setting all values to zero typically means rotations will be com-
pletely disabled. You will be surprised how stable those stacks become and how
nicely most scenarios just work. Unfortunately, asking the producer if it is okay
to skip rotations will most likely not be a good idea, but what we can learn is that
the more inertia we add, the less rotation will occur, problems with linearization
will decrease, and the simulation will get more stable.
The problem is especially relevant on long, thin rods. So if you experience
instability with such objects, try increasing the inertia, especially on the axis along
the rod (compute inertia as if the rod was thicker). Increasing inertia will make
objects look heavy and add a perceived slow-motion effect, so you might want to
take it easy, but it can be a lifesaver and is surprisingly hard to spot.

2.5 Solver
Just to freshen up our memory without getting into technical detail, the solver
is responsible for computing the next valid state of a dynamic system, taking
into account various constraints. Now, since games need to be responsive, this
computation has to be fast, and the most popular way of doing that is using an
iterative method called sequential impulse. The concept is really simple: given a
handful of constraints, satisfy each one of them, one at a time, and when the last
one is done, start over again from the beginning and do another round until it is
“good enough,” where good enough often means, “Sorry man, we are out of time,
let’s just leave it here.”
What is really interesting, from a debugging perspective, is how this early
termination of a sequential impulse solver can affect the energy of the system.
Stopping before we are done will not add energy to the system, it will drain en-
ergy. This means it is really hard to blame the solver itself for energy being added
to the system.
When you implement a sequential impulse solver with early termination,
stacked, resting objects tend to sink into each other. Let’s investigate why this is
happening: at each frame, gravity causes an acceleration that increases an object’s
downward velocity. Contact generation creates a set of points and at each contact,
the solver tries to maintain a zero-relative velocity. However, since greedy game
programmers want CPU cycles for other things, the solver is terminated before it
is completely done, leaving the objects with a slight downward velocity instead of
zero, which is desired for resting contact. This slight downward velocity causes
objects to sink in, and the process is repeated.
To compensate for this behavior, most physics engines use a geometric mea-
sure for each contact point: either penetration depth or separation distance. As
the penetration depth increases, the desired resulting velocity is biased, so that it
is not zero but is actually negative, causing the objects to separate. This translates
to objects being soft instead of rigid, where the softness is defined by how well
the solver managed to solve the problem. This is why most solvers act springy
or squishy when using fewer iterations. Hence, the best way to get rid of the
mushroom is to increase the number of iterations in the solver!
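The sketch below shows the idea for contacts between one dynamic body and the static ground, reduced to a single velocity along the contact normal. The bias factor, the types, and the sign convention (positive velocity means separating) are illustrative; a real solver works with contact points, normals, and pairs of bodies.

#include <algorithm>
#include <vector>

struct Contact1D
{
    float invMass;               // inverse mass of the dynamic body
    float penetration;           // overlap along the contact normal, >= 0
    float accumImpulse = 0.0f;   // total impulse applied so far this step
};

void solveContacts(std::vector<Contact1D>& contacts, float& velocity,
                   float h, int iterations, float beta = 0.2f)
{
    for (int it = 0; it < iterations; ++it)   // early termination after a fixed count
    {
        for (Contact1D& c : contacts)
        {
            // Target velocity is not zero but slightly separating,
            // proportional to the penetration depth.
            const float bias = beta * c.penetration / h;
            float impulse = (bias - velocity) / c.invMass;

            // Accumulate and clamp so the total impulse never pulls bodies together.
            const float old = c.accumImpulse;
            c.accumImpulse = std::max(old + impulse, 0.0f);
            impulse = c.accumImpulse - old;

            velocity += impulse * c.invMass;
        }
    }
}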

2.5.1 Keeping the Configuration Unchanged


A solver that uses this kind of geometric compensation running at the same step
size and same number of iterations every frame will eventually find an equilib-
rium after a certain number of frames. Understanding that this equilibrium is not
a relaxed state but a very complex ongoing struggle between gravity, penetrat-
ing contacts, and penalty forces is key to stability. Removing or adding even a
single constraint, or changing the number of iterations, will cause the solver to
redistribute the weight and find a new equilibrium, which is a process that usu-
ally takes several frames and causes objects to wiggle. The set of constraints for
a specific scenario is sometimes called its configuration; hence keeping the con-
figuration unchanged from one frame to the next is very important, and we will
revisit this goal throughout the chapter.

2.5.2 Warm Starting


Assuming that the configuration does not change and objects are at rest, the im-
pulses at each contact point will be essentially the same every frame. It seems
kind of unnecessary to recompute the same problem over and over again. This is
where warm starting comes into the picture. Instead of recomputing the impulses
from scratch every time, we can start off with the impulses from the previous
frame and use our solver iterations to refine them instead. Using warm starting
is almost always a good idea. The downside is that we have to remember the
impulses from the last frame, which requires some extra bookkeeping. However,
since most physics engines keep track of pairs anyway, this can usually be added
relatively easily.
Figure 2.3. A sequential impulse solver can cause an aligned box falling flat to the ground
to bounce off with rotation.

I mentioned before that a sequential impulse solver does not add energy but
rather drains energy from a system. This unfortunately no longer holds true if
warm starting is being used. Full warm starting can give a springy, oscillating
behavior and prevents stacks from ever falling asleep. Because of this, the cur-
rent frame’s impulses are usually initialized with only a fraction of the previous
frame’s impulses. As we increase this fraction, the solver becomes more springy,
but it can also handle stacking better. It could be worth experimenting with this
to find the sweet spot.
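In code, warm starting amounts to seeding each contact's accumulated impulse with a scaled copy of last frame's value and applying it before iterating, roughly as in the sketch below; the Contact fields and the fraction are illustrative.

#include <vector>

struct Contact
{
    float invMass;                 // inverse mass along the contact normal
    float cachedImpulse = 0.0f;    // accumulated impulse from the previous frame
    float accumImpulse  = 0.0f;    // accumulated impulse for this frame
};

void warmStart(std::vector<Contact>& contacts, float& velocity, float fraction)
{
    for (Contact& c : contacts)
    {
        // Seed with a fraction of last frame's impulse and apply it up front;
        // the regular solver iterations then only refine the result.
        c.accumImpulse = fraction * c.cachedImpulse;
        velocity += c.accumImpulse * c.invMass;
    }
    // After solving, copy accumImpulse back into cachedImpulse for the next frame.
}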

2.5.3 Who Is Tilting My Box


A sequential impulse solver, as described above, is called in mathematical terms
Gauss-Seidel iteration. Another method is Jacobi iteration, in which all contact
points are solved independently, and then the resulting impulses are applied all at
once, hence removing the sequential in sequential impulse. Jacobi solvers have
some nice properties, especially when it comes to parallelization, but they gener-
ally take way more iterations to converge. One effect of sequential contact solving
is that symmetric problems often have seemingly unpredictable solutions. Con-
sider a perfectly aligned box dropped on a horizontal plane. All four corners hit
the plane at the same time, even forming four identical contact points. A sequen-
tial impulse solver will start solving one contact point without considering the
other three, apply the resulting impulse and then consider the next one. While
solving the second contact, the problem is no longer symmetric, since the box is
rotating after applying the first impulse. The resulting motion will behave as if
one corner of the box hit the ground slightly before the others (see Figure 2.3).
Hence, whenever we see this type of behavior, it is most likely not an error, just
brother Gauss-Seidel pulling a prank.

2.5.4 Friction
Friction is usually a little trickier than nonpenetration constraints since the max-
imum applied force depends on the normal force. The more pressure there is on
an object, the better it sticks. This interdependence results in a nonlinear problem
that is very tricky to solve accurately.


Coupled or decoupled. There are two main approaches to solving friction—
coupled and decoupled. In the coupled approach, the maximum friction force
changes while iterating, basically trying to solve a nonlinear problem with a tool-
box that is designed for linear problems (Gauss-Seidel), which may sound inap-
propriate but actually works fairly well in practice. The decoupled approach involves using
a fixed maximum friction force that is determined before iterating. In the case of
decoupled friction, there are two popular methods: either using the normal force
from the last time step, which requires some bookkeeping, or using a fixed value,
regardless of normal force. Such a fixed value is often based on the normal force
to keep the body at rest when affected by gravity. This may sound like a very
crude approximation, but it works surprisingly well, requires no bookkeeping,
and is perfectly linear. The main drawback is, of course, that friction is unaf-
fected by how much pressure is on the object. An object at the bottom of a stack
slides out just as easily as the ones on top!
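A sketch of the fixed-budget variant for a single tangent direction follows; mu, mass, g, and h are illustrative parameters used only to size the friction budget from the gravity the contact has to hold.

#include <algorithm>

// Decoupled friction with a fixed maximum impulse, determined before iterating.
float applyFriction(float tangentVelocity, float invMass, float& accumImpulse,
                    float mu, float mass, float g, float h)
{
    const float maxImpulse = mu * mass * g * h;   // fixed budget for this step

    // Impulse that would bring the tangential (sliding) velocity to zero.
    float impulse = -tangentVelocity / invMass;

    // Accumulate and clamp to the budget; friction can only resist motion.
    const float old = accumImpulse;
    accumImpulse = std::clamp(old + impulse, -maxImpulse, maxImpulse);
    impulse = accumImpulse - old;

    return tangentVelocity + impulse * invMass;   // updated tangential velocity
}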

Friction in stacks. It is worth mentioning the importance of proper friction for


handling stable stacking. Even in a scenario that seems largely unaffected by
friction, like a pyramid of boxes, friction plays a very important role. Remember
that the solver causes objects to rotate as an artifact of Gauss-Seidel iteration.
This rotation introduces a tangential motion that causes a stack to tip over if no
friction is used.

Friction drift. Remember the description above, about early solver termination
causing stacked objects to sink into each other? The exact same thing happens to
friction constraints, so if not compensated for, stacked objects might slide around
slowly on top of each other, eventually causing the whole thing to fall over. Track-
ing friction drift is cumbersome because it involves tracking pairs of objects over
several frames. For penetration depth it is rather straightforward since the desired
configuration is determined by the shape of the objects. For static friction, it is
not quite that easy. Static friction can be seen as a temporary joint holding two
objects together in the contact plane. If the maximum joint force is exceeded,
the objects should actually slide, but as long as the force is within the maximum
friction force, the relative net motion should ideally be zero. Hence, any motion
that actually occurs is due to early solver termination, linearization, or any other
of our artifact friends. Measuring this drift and compensating for it over time can
therefore help maintain stable stacking and natural friction behavior.

2.5.5 Shock Propagation


As a way to counteract the squishiness of iterative solvers, a shock-propagation
scheme can be used. The idea is to analyze the configuration and set up the
problem in such a way that the solver can find a solution more quickly. Some
engines maintain an explicit graph of how the objects connect, whereas other en-
gines temporarily tweak mass ratios, inertia, or gravity. There is a lot of creativity
in shock propagation, but the artifacts are usually similar.
Large stacks require many iterations because the impulses at the bottom of the
stack are many times bigger than they would be for any pair of objects solved in
isolation. It takes many iterations to build up these large impulses. With shock
propagation, objects at the bottom of a stack will not feel the entire weight of the
objects on top. This can show up as the unnatural behavior of stacks tipping over
and can also be very obvious when observing friction—an object at the bottom of
a stack can be as easily pulled out as one on top.

2.6 Collision Detection


The collision-detection problem is often broken down into two or three phases.
First a broad phase, detecting objects in close proximity, and then sometimes a
mid phase, breaking down structures into smaller parts, before the near phase,
computing the actual contact points.

2.6.1 Phases
Broad phase. Let us start with the broad phase, which has a relatively well-
defined task: report overlaps of bounding volumes, most often axis-aligned bound-
ing boxes. If the bounding box is too small, we might experience weird shootouts
as the broad phase reports nonoverlap until the objects are already in penetration.
Having the bounding boxes too big, on the other hand, has a performance impli-
cation, so we have to be sure to make them just right. Remember that if we use
continuous collision detection or intentional separation distance, these must be
included in the bounding-box computation, so that the bounding box is no longer
tight-fitting around the object. These errors can be hard to spot since it looks right
most of the time.
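A sketch of a broad-phase box computation that accounts for both a contact margin and the motion over the step is shown below; the Vec3 and AABB types are placeholders, and the sweep is the crude axis-aligned kind, not a proper swept volume.

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

AABB computeBroadPhaseBox(const AABB& tight, const Vec3& velocity,
                          float h, float margin)
{
    AABB box = tight;

    // Grow by the intentional separation/contact margin on all sides.
    box.min = { box.min.x - margin, box.min.y - margin, box.min.z - margin };
    box.max = { box.max.x + margin, box.max.y + margin, box.max.z + margin };

    // Extend in the direction of motion so fast objects are still reported.
    const Vec3 d = { velocity.x * h, velocity.y * h, velocity.z * h };
    if (d.x < 0.0f) box.min.x += d.x; else box.max.x += d.x;
    if (d.y < 0.0f) box.min.y += d.y; else box.max.y += d.y;
    if (d.z < 0.0f) box.min.z += d.z; else box.max.z += d.z;

    return box;
}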

Mid phase. The mid phase often consists of a bounding-volume hierarchy to


find convex objects in close proximity. Again, incorrect bounding-box compu-
tation can lead to shootouts. Another common problem is that objects can get
stuck in between two convex parts of a compound geometry. Consider the object
consisting of two spheres in Figure 2.4. Convex geometries are usually treated in
isolation, causing two conflicting contact points with opposite normals and pene-
tration depths. Feeding this problem to the solver is a dead end—there is no valid
solution! The objects will start shaking violently and act very unstable. There
is no good solution to this, but avoid using many small objects to make up com-
pound bodies. In the case above, a capsule or cylinder would have avoided the
problem.

Figure 2.4. Compound geometries can cause artifacts when objects get stuck in between
parts.


Figure 2.5. An object sliding over a compound geometry can catch on invisible hooks
due to penetration.

Sliding. A similar problem can occur when an object is sliding over a flat surface
that is made up of multiple parts. Imagine the scene in Figure 2.5. The box
should ideally slide over the seam without any glitches, but the way the object
is constructed, the seam can create invisible “hooks” causing the sliding object to
stop. This is a typical frustrating artifact in certain car racing games where the
car can get trapped on invisible hooks while sliding along the fence. A simple
workaround is to construct the geometry as suggested in Figure 2.6.

Figure 2.6. Making a ramp on each side and letting them overlap is a simple work-around
to avoid objects getting stuck in compound objects.

Near phase. The near phase is by far the most complex part, where the actual
contact generation occurs. The poor solver is often blamed for unstable and jit-
tering simulations, but surprisingly often, shaking objects, general instability, and
jerkiness can be attributed to inadequate contact generation. A sequential-impulse
solver can be blamed for squishy stacks, improper friction, and many other things,
but it is actually quite hard to make a solver that causes objects to rattle and shake.
Near-phase contact generation often has many special cases and can be prone to
numerical floating-point precision issues. Some engines use contact pruning to
remove excess contact points. Special care should then be taken to make sure the
same contacts are pruned every frame. Remember that keeping the configuration
unchanged is key to stability.

2.6.2 Continuous Collision Detection


Ah, continuous collision detection, a technique that prevents objects from slipping
through walls—how about that! Just enable it, sit back, and enjoy how everything
magically works? Not quite, unfortunately.
Let us start by splitting the problem domain into two categories. First, there
are artifacts caused by discretization, typically, a small object passing through a
wall, called the bullet-through-paper problem already mentioned in the beginning
of this chapter. The other category is when contact is detected and generated,
but the solver fails to find a proper solution, usually because of early termination.
This artifact can be very significant when a light object is getting squished in
between two heavy objects and is sometimes referred to as the sandwich case (see
Figure 2.7).


Figure 2.7. Fast-moving objects are not the only ones taking shortcuts through walls.
Early solver termination can cause objects to get squished even if contacts are detected and
generated.



Figure 2.8. Fast-moving objects could potentially get rotated through the floor even if a
contact is generated.

There is also a fairly common case that is a combination of the two. Imagine
a thin rod, slightly inclined, falling onto a flat surface, as illustrated in Figure 2.8.
The initial impact on the left side can cause a rotation so severe that by the next
time step, more than half of the rod has already passed through the floor, and con-
tact generation pushes it out on the other side. This example is a good illustration
of the sometimes complex interaction between linearization and discretization
that can bring a seemingly simple case like this to epic failure, even with con-
tinuous collision detection switched on. Note that some physics engines actually
do have a really sophisticated nonlinear continuous collision detection that does
consider rotation as well, in which case the example mentioned above would have
actually worked.

Sandwich case. The sandwich case can be somewhat worked around by priori-
tizing contacts. It is always the last constraints in a sequential impulse solver that
will be the most powerful and less prone to be violated upon early termination.
Therefore, it is best to rearrange the stream of contacts so that the ones that touch
important game-play mechanisms, such as walls, are solved at the very end. A
good common practice to avoid having objects get pushed through walls or the
floor is to solve all contacts involving a static object after any other contact or do
an extra iteration or two after termination to satisfy only static contacts.

Bullet-through-paper. An engine that aims to solve only the bullet-through-


paper case typically uses a raycast or linear sweep operation to find a time of
impact, then either splits up the time step—simulates the first half until the ob-
ject is touching and then does the rest—or employs an early-engage method that
inserts a contact point before the object has actually reached the surface. The
early-engage method can sometimes be noticed as an invisible wall in front of
obstacles, especially when using zero restitution, in which case a falling object
could come to a full stop some distance above the floor before finally falling the
last bit.
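The control flow for the sweep-based workaround might look roughly like the sketch below. The Vec3 type is a placeholder, raycast() stands in for whatever swept query your engine provides, and the small back-off before the time of impact is an arbitrary illustrative constant.

#include <algorithm>

struct Vec3 { float x, y, z; };
struct SweepHit { bool hit; float t; };   // t in [0,1] along this step's motion

SweepHit raycast(const Vec3& from, const Vec3& to);   // assumed engine query

void integrateWithSweep(Vec3& position, const Vec3& velocity, float h)
{
    const Vec3 target = { position.x + velocity.x * h,
                          position.y + velocity.y * h,
                          position.z + velocity.z * h };

    const SweepHit s = raycast(position, target);
    if (!s.hit)
    {
        position = target;   // nothing in the way, take the full step
        return;
    }

    // Split the step: advance to just before the time of impact and let the
    // regular contact generation and solver handle the rest of the frame.
    const float tStop = std::max(s.t - 0.01f, 0.0f);
    position = { position.x + velocity.x * h * tStop,
                 position.y + velocity.y * h * tStop,
                 position.z + velocity.z * h * tStop };
}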


2.7 Joints
At the most fundamental level, a joint is simpler than a contact: it is an equality
constraint, keeping the relative velocity between two bodies at zero. No inequal-
ities, no interdependent friction, etc. However, the way we combine constraints,
and add limits, breakable constraints, joint friction, and damping typically make
them fairly complex.

2.7.1 Drift
The most common artifact with joints is drifting, i.e., an unintended separation
between the two jointed objects. It is the joint counterpart to stacked objects
sinking into each other. The solver simply fails to find a valid solution within the
limited number of iterations. However, as described in the introduction to this
chapter, even with an unlimited number of iterations, joints can still drift due to
the linearization of velocities. Most engines cope with drifting in the same way
they cope with penetration or friction drift: simply add a geometric term, acting
as a spring to compensate for the drift.

2.7.2 Solving Direct


A good way to reduce joint drift is to solve as many constraints as possible at
the same time. Since joints are made up of equality constraints, they can be
solved as a system of linear equations, sometimes referred to as a direct solver.
Solving a system of linear equations is more complicated than applying sequential
impulses, but it does pay off in stability. On the upside, these two methods can
be easily combined. Some engines solve systems of three orthogonal constraints
(this particular assembly is found in many joint types) as a special case with a
three-by-three matrix inversion and then interweave the rest of the constraints
using sequential impulses.
The way the constraints are placed also matters when it comes to stability.
Consider a ball joint. It might be tempting to use a single constraint in the direc-
tion of maximum separation or in the direction of relative velocity. But remember
that whatever constraints go into the solver are the only constraints avoiding mo-
tion, so a single constraint will naturally transfer motion from the constraint axis
to the other two. A proper ball joint needs three constraints to be stable, and even
the way the three constraints are aligned matters. Keeping the constraints aligned
roughly the same way every frame helps stability. World axes are a good start-
ing point, but using the axes of one of the objects can be even better, since they
will then be stationary to at least one of the objects, keeping the configuration as
similar as possible.



Figure 2.9. Hard joint limits might start oscillating due to discretization.

2.7.3 Joint Limits


Some joints support limits that block either linear or angular motion. This is very
similar to a contact constraint. A common artifact with jointed structures with
limits is that they tend to shake and never come to rest. Even if a joint limit is
supposed to be a hard limit, it is usually a good idea to soften it up a tiny bit.
A hard limit that fully engages when the limit is exceeded and fully disengages oth-
erwise is very hard to get stable. Consider the limited hinge joint in Figure 2.9.
Before it hits the limit, the joint can move freely. Now, since the simulation is
carried out in discrete steps, this means that the joint limit will not kick in until
the limit is already exceeded. Once it is exceeded, the geometric term that is sup-
posed to correct the joint will kick the joint back, causing the limit to disengage
and fall back down again. This is a good example of how rapidly changing the
configuration causes instability.
Using soft limits, so that the hinge is allowed to rest on a spring for a certain
distance, will give the solver a chance to find equilibrium without changing the
configuration every frame.
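A soft limit can be as simple as a spring-damper that fades in over a small zone just before the hard limit, as in the sketch below; all parameters are illustrative and would be tuned per joint.

// Soft upper limit for a hinge angle: free motion until the soft zone is
// entered, then a progressively stronger spring-damper instead of a hard stop.
float softLimitTorque(float angle, float limit, float softZone,
                      float stiffness, float damping, float angularVelocity)
{
    const float intoZone = angle - (limit - softZone);   // how far into the soft zone
    if (intoZone <= 0.0f)
        return 0.0f;                                      // well inside the limit

    return -stiffness * intoZone - damping * angularVelocity;
}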

2.7.4 Dealing with the Dead Guy


Ragdolls might qualify as the number one physics frustration worldwide, and
numerous games are still shipped with ragdolls doing the monkey dance while
“dead.” In my experience, ragdoll instability is due to two main factors—hard
joint limits and excess inter-bone collisions. Applying soft limits as described
above will get you halfway there. A ragdoll is a pretty complex structure, espe-
cially since it can end up on the ground in any pose, including one that engages
multiple joint limits.
Shaking usually appears either when the configuration changes or when there
are conflicting constraints. The more constraints there are to solve, the higher
chance there is for conflicting ones. Therefore, it is usually a good idea to disable
as many collisions as possible. Start with a ragdoll with all bone–bone collisions
turned off. You will be surprised how good it still looks. You might want to
enable certain collisions, such as hips–lower arms, and calf-calf collisions, but in
general, it is fine to leave most of the other ones, assuming you have a decent
setup of joint limits.
Finally, add a certain amount of damping or friction to all joints. The flesh in
the human body naturally dampens any motion, so some amount of friction will
look more natural, at the same time helping our ragdoll get some sleep.

2.7.5 Geometric Joint Recovery


Since joint drifting cannot be completely avoided, it is tempting to do a final
geometric translation to pull joints back together. This can work well in some
situations, but for the most part, it will add instability and energy to the overall
system. Consider the scene illustrated in Figure 2.10. Translating the joint back
into position introduces a penetration that will at the next frame push the body up
and add energy to the system, possibly causing a new joint displacement. If we
really want to get our hands dirty and implement geometric recovery, we should
consider the whole system, also doing it for collisions to resolve penetrations, and
modify both position and rotation.
A better way to do this correction is to do joint translation as a pure visual
effect. In the ragdoll case, many games use only rotation from the physics rep-
resentation, while keeping a fixed displacement, efficiently hiding joint drifting.
However, if the joint displacement is large, it can cause visual penetration, espe-
cially at the outermost limbs of the ragdoll.


Figure 2.10. Compensating for joint drift by moving the objects is usually a really bad
idea.

2.8 Direct Animation


Sometimes we might want to simply animate physical objects, having them affect
other objects but not be affected themselves. There are several ways to do this,
including using joint motors, to physically drive the object. However, sometimes
we simply want to move an object along an animated path, totally unaffected by
collisions. Animating an object by simply setting its position is never a good
idea. It might still affect objects in its environment, but collisions will be soft and
squishy. This is partly because the velocity of the object is not updated correctly,
so for all the solver knows, there is a collision with penetration, but it is not aware
that any of the objects are moving. To avoid this, make sure to update the velocity
to match the actual motion. Some engines have convenience functions for this.
Even when the velocity is correct, if the animated object is not considerably
heavier than the objects it is colliding with, the collisions will be soft and squishy.
For an animated object to fully affect the environment, its mass and inertia tensor
should be infinite. Only then will other objects fully obey and move out of the
way. Hence, if we animate objects by setting their position, make sure to give
them correct velocity, both linear and angular, and make the mass and inertia
tensor temporarily infinite.
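A sketch of driving such a kinematic body from an animation follows. The types and fields are placeholders, and the angular velocity is assumed to come straight from the animation system rather than being derived from quaternions here.

struct Vec3 { float x, y, z; };

struct KinematicBody
{
    Vec3 position, linearVelocity, angularVelocity;
    float invMass;   // 0 means infinite mass; the inverse inertia tensor
                     // (omitted here) should likewise be zeroed
};

void driveFromAnimation(KinematicBody& body, const Vec3& targetPos,
                        const Vec3& targetAngularVelocity, float h)
{
    // Velocity that carries the body from its current pose to the animated one,
    // so the solver sees the motion instead of a teleport.
    body.linearVelocity = { (targetPos.x - body.position.x) / h,
                            (targetPos.y - body.position.y) / h,
                            (targetPos.z - body.position.z) / h };
    body.angularVelocity = targetAngularVelocity;

    body.invMass = 0.0f;   // infinite mass: other objects must move out of the way
}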

2.9 Artifact Reference


Following is a list of artifacts and their causes.

• Frame rate gradually slows down to grinding halt. You might have hit
the well of despair, where the physics engine tries to compensate for its
own slow down. Put a cap on the number of physics steps per frame or
implement a more sophisticated time-stepping algorithm.

• Simulation runs in slow motion. Check that the physics step size corre-
sponds to actual time. Keep an eye on simulation scale. A larger scale will
result in slow-motion effects.

• Stacked objects are shaking or rattling. Check the contact-generation code


and make sure the configuration is not rapidly changing.

• An aligned object dropped on a flat surface bounces off in a weird way.


This is natural behavior of Gauss-Seidel iteration.

• Objects at the bottom of a stack do not feel the weight of the ones on top.
This is caused by a shock-propagation scheme or decoupled friction with
fixed maximum force.

• Highly asymmetric objects act unstable. The low inertia around one of the
axes causes a lot of rotation. Increase inertia tensors, as if the objects were
more symmetric.


• Stacked objects act springy and objects get squashed. The solver iteration
count might be too low. We can also try adding warm starting or a shock-
propagation scheme.

• Stacks are oscillating and tend to never come to rest. Too much warm
starting is being used.

• Stacked objects slide around on each other, eventually falling over. There
is a lack of friction-drift compensation.

• An object penetrates freely and then suddenly shoots out. This can be an
incorrect bounding box or a contact-generation problem.

• Objects are getting pushed through walls by other objects. The contact
stream might not favor static contacts. Rearrange the contact stream so that
static contacts are at the end of the stream.

• Small, fast objects pass through walls. Enable continuous collision detec-
tion or early engage. If the problem still does not go away, it can be due to
rotation. Make the object thicker or increase inertia tensor.

• Falling objects stop before hitting the floor and then fall down the last bit.
This is caused by early-engage contact generation. You can add some resti-
tution to hide the problem or implement more sophisticated continuous col-
lision detection.

• Jointed structures drift apart, causing visual separation. This cannot en-
tirely be avoided due to the nature of iterative solvers and linearization.
Use a direct solver to minimize the problem. You can also try a visual joint
displacement, if applicable.

• Ragdolls are shaking and never come to rest. There can be conflicting joint
limits, too many inter-bone collisions, or joint limits that are too hard.

• An animated object does not affect the environment properly. The animated
object might have incorrect velocity, or the mass or inertia is not infinite.

i i

i i
References

1 - I - Game Physics 101

[Van Verth and Bishop 08] James M. Van Verth and Lars M.
Bishop. Essential Mathematics for Games and Interactive
Applications, Second edition. San Francisco: Morgan
Kaufmann, 2008.

[Witkin and Baraff 01] Andrew Witkin and David Baraff.


“Physically Based Modelling: Principles and Practice.” ACM
SIGGRAPH 2001 Course Notes. Available at
http://www.pixar.com/companyinfo/research/pbm2001/, 2001. 2
Understanding Game Physics Artifacts Dennis Gustafsson 2.1
Introduction Physics engines are known for being
notoriously hard to debug. For most people, physics
artifacts are just a seemingly random stream of weird
behavior that makes no sense. Few components of a game
engine cause much frustration and hair loss. We have all
seen ragdolls doing the funky monkey dance and stacks of
“rigid” bodies acting more like a tower of greasy
mushrooms, eventually falling over or taking off into the
stratosphere. This chapter will help you understand the
underlying causes of this behavior and common mistakes that
lead to it. Some of them can be fixed, some of them can be
worked around, and some of them we will just have to live
with for now. This is mostly written for people writing a
physics engine of their own, but understanding the
underlying mechanisms is helpful even if you are using an
off-the-shelf product. 2.2 Discretization and Linearization
Physics engines advance time in discrete steps, typically
about 17 ms for a 60 Hz update frequency. It is not
uncommon to split up the time step into smaller steps, say
two or three updates per frame (often called substepping)
or even more, but no matter how small of a time step you
use, it will still be a discretization of a continuous
problem. Real-world physics do not move in steps, not even
small steps, but in a continuous motion. This is by far the
number one source for physics artifacts, and any properly
implemented physics engine should behave better with more
substeps. If a physics artifact does not go away with more
substeps, there is most likely something wrong with your
code. The bullet-through-paper problem illustrated in
Figure 2.1 is a typical example of a problem that is caused
by discretization. 29

1 32

Figure 2.1. Discretization can cause fast-moving objects to


travel through walls.
Another big source of artifacts is the linearization that
most physics engines

employ—the assumption that during the time step everything


travels in a linear

motion. For particle physics, this is a pretty good


approximation, but as soon as

you introduce rigid bodies and rotation, it falls flat to


the ground. Consider the ball

joint illustrated in Figure 2.2. The two bodies are


rotating in opposite directions.

At this particular point in time, the two bodies are lined


up as shown. Even if

the solver manages to entirely solve the relative velocity


at the joint-attachment

point to zero, as soon as time is advanced, no matter how


small the amount, the

two attachment points will drift apart. This is the


fundamental of linearization,

which makes it impossible to create an accurate physics


engine by solving just for

relative linear velocities at discrete points in time.

Even though linearization and discretization are two


different approximations,

they are somewhat interconnected. Lowering the step size


(increasing the number

of substeps) will always make linearization less


problematic, since any nonlinear

motion will appear more and more linear the shorter the
time span. The ambitious

reader can make a parallel here to the Heisenberg principle


of uncertainty!

The major takeaway here is that as long as a physics engine


employs dis
cretization and linearization, which all modern physics
engines and all algorithms 1 2

Figure 2.2. Even if relative linear velocity at the joint


attachment is zero, objects can

separate during integration due to rotation. and examples


in this book do, there will always be artifacts. These
artifacts are not results of a problem with the physics
engine itself, but the assumptions and approximations the
engine is built upon. This is important to realize, because
once you accept the artifacts and understand their
underlying causes, it makes them easier to deal with and
work around. 2.3 Time Stepping and the Well of Despair
Since the physics engine is advanced in discrete steps,
what happens if the game drops a frame? This is a common
source of confusion when integrating a physics engine,
since you probably want the motion in your game to be
independent of frame rate. On a slow machine, or in the
occasion of your modern operating system going off to index
the quicksearch database in the middle of a mission, the
graphical update might not keep up with the desired
frequency. There are several different strategies for how
to handle such a scenario from a physics perspective. You
can ignore the fact that a frame was dropped and keep
stepping the normal step length, which will create a
slow-motion effect that is usually highly undesirable.
Another option is to take a larger time step, which will
create a more realistic path of motion but may introduce
jerkiness due to the variation in discretization. The third
option is to take several, equally sized physics steps.
This option is more desirable, as it avoids the slowdown
while still doing fixed-size time steps. 2.3.1 The Well of
Despair Making several physics updates per frame usually
works fine, unless the physics is what is causing the
slowdown to begin with. If physics is the bottleneck, the
update frequency will go into the well of despair, meaning
every subsequent frame needs more physics updates, causing
a slower update frequency, resulting in even more physics
updates the next frame, and so on. There is unfortunately
no way to solve this problem other than to optimize the
physics engine or simplify the problem, so what most people
do is put a cap on the number of physics updates per frame,
above which the simulation will simply run in slow motion.
Actually, it will not only run in slow motion but it will
run in slow motion at a lower-thannecessary frame rate,
since most of what the physics engine computes is never
even shown! A more sophisticated solution is to measure the
time of the physics update, compare it to the overall frame
time, and only make subsequent steps if we can avoid the
well of despair. This problem is not trivial, and there is
no ultimate solution that works for all scenarios, but it
is well worth experimenting with since it can have a very
significant impact on the overall update frequency.

2.4 The Curse of Rotations

Since rotation is the mother of most linearization


problems, it deserves some spe

cial attention. One fun experiment we can try is to make


the inertia tensor for

all objects infinite and see how that affects our


simulation. The inertia tensor can

roughly be described as an object’s willingness to rotate


and is often specified

as its inverse, so setting all values to zero typically


means rotations will be com

pletely disabled. You will be surprised how stable those


stacks become and how

nicely most scenarios just work. Unfortunately, asking the


producer if it is okay

to skip rotations will most likely not be a good idea, but


what we can learn is that

the more inertia we add, the less rotation will occur,


problems with linearization

will decrease, and the simulation will get more stable.

The problem is especially relevant on long, thin rods. So


if you experience

instability with such objects, try increasing the inertia,


especially on the axis along

the rod (compute inertia as if the rod was thicker).


Increasing inertia will make

objects look heavy and add a perceived slow-motion effect,


so you might want to

take it easy, but it can be a lifesaver and is surprisingly


hard to spot.

2.5 Solver

Just to freshen up our memory without getting into


technical detail, the solver

is responsible for computing the next valid state of a


dynamic system, taking

into account various constraints. Now, since games need to


be responsive, this

computation has to be fast, and the most popular way of


doing that is using an

iterative method called sequential impulse. The concept is


really simple: given a

handful of constraints, satisfy each one of them, one at a


time, and when the last

one is done, start over again from the beginning and do


another round until it is

“good enough,” where good enough often means, “Sorry man,


we are out of time,

let’s just leave it here.”

What is really interesting, from a debugging perspective,


is how this early

termination of a sequential impulse solver can affect the


energy of the system.

Stopping before we are done will not add energy to the


system, it will drain en

ergy. This means it is really hard to blame the solver


itself for energy being added

to the system.

When you implement a sequential impulse solver with early


termination,

stacked, resting objects tend to sink into each other.


Let’s investigate why this is
happening: at each frame, gravity causes an acceleration
that increases an object’s

downward velocity. Contact generation creates a set of


points and at each contact,

the solver tries to maintain a zero-relative velocity.


However, since greedy game programmers want CPU cycles for
other things, the solver is terminated before it is
completely done, leaving the objects with a slight downward
velocity instead of zero, which is desired for resting
contact. This slight downward velocity causes objects to
sink in, and the process is repeated. To compensate for
this behavior, most physics engines use a geometric measure
for each contact point: either penetration depth or
separation distance. As the penetration depth increases,
the desired resulting velocity is biased, so that it is not
zero but is actually negative, causing the objects to
separate. This translates to objects being soft instead of
rigid, where the softness is defined by how well the solver
managed to solve the problem. This is why most solvers act
springy or squishy when using fewer iterations. Hence, the
best way to get rid of the mushroom is to increase the
number of iterations in the solver! 2.5.1 Keeping the
Configuration Unchanged A solver that uses this kind of
geometric compensation running at the same step size and
same number of iterations every frame will eventually find
an equilibrium after a certain number of frames.
Understanding that this equilibrium is not a relaxed state
but a very complex ongoing struggle between gravity,
penetrating contacts, and penalty forces is key to
stability. Removing or adding even a single constraint, or
changing the number of iterations, will cause the solver to
redistribute the weight and find a new equilibrium, which
is a process that usually takes several frames and causes
objects to wiggle. The set of constraints for a specific
scenario is sometimes called its configuration; hence
keeping the configuration unchanged from one frame to the
next is very important, and we will revisit this goal
throughout the chapter. 2.5.2 Warm Starting Assuming that
the configuration does not change and objects are at rest,
the impulses at each contact point will be essentially the
same every frame. It seems kind of unnecessary to recompute
the same problem over and over again. This is where warm
starting comes into the picture. Instead of recomputing the
impulses from scratch every time, we can start off with the
impulses from the previous frame and use our solver
iterations to refine them instead. Using warm starting is
almost always a good idea. The downside is that we have to
remember the impulses from the last frame, which requires
some extra bookkeeping. However, since most physics engines
keep track of pairs anyway, this can usually be added
relatively easily. I mentioned before that a sequential
impulse solver does not add energy but rather drains energy
from a system. This unfortunately no longer holds true if

1 32 4

Figure 2.3. A sequential impulse solver can cause an


aligned box falling flat to the ground

to bounce off with rotation.

warm starting is being used. Full warm starting can give a


springy, oscillating

behavior and prevents stacks from ever falling asleep.


Because of this, the cur

rent frame’s impulses are usually initialized with only a


fraction of the previous

frame’s impulses. As we increase this fraction, the solver


becomes more springy,

but it can also handle stacking better. It could be worth


experimenting with this

to find the sweet spot.

2.5.3 Who Is Tilting My Box

A sequential impulse solver, as described above, is called


in mathematical terms

Gauss-Seidel iteration. Another method is Jacobi iteration,


in which all contact

points are solved independently, and then the resulting


impulses are applied all at

once, hence removing the sequential in sequential impulse.


Jacobi solvers have

some nice properties, especially when it comes to


parallelization, but they gener

ally take way more iterations to converge. One effect of


sequential contact solving
is that symmetric problems often have seemingly
unpredictable solutions. Con

sider a perfectly aligned box dropped on a horizontal


plane. All four corners hit

the plane at the same time, even forming four identical


contact points. A sequen

tial impulse solver will start solving one contact point


without considering the

other three, apply the resulting impulse and then consider


the next one. While

solving the second contact, the problem is no longer


symmetric, since the box is

rotating after applying the first impulse. The resulting


motion will behave as if

one corner of the box hit the ground slightly before the
others (see Figure 2.3).

Hence, whenever we see this type of behavior, it is most


likely not an error, just

brother Gauss-Seidel pulling a prank.

2.5.4 Friction

Friction is usually a little trickier than nonpenetration constraints, since the maximum applied force depends on the normal force. The more pressure there is on an object, the better it sticks. This interdependence results in a nonlinear problem that is very tricky to solve accurately.

Coupled or decoupled. There are two main approaches to solving friction—coupled and decoupled. In the coupled approach, the maximum friction force changes while iterating, basically trying to solve a nonlinear problem with a toolbox that is designed for linear problems (Gauss-Seidel), which may sound inappropriate but actually works fairly well in practice. The decoupled approach involves using a fixed maximum friction force that is determined before iterating. In the case of decoupled friction, there are two popular methods: either using the normal force from the last time step, which requires some bookkeeping, or using a fixed value, regardless of normal force. Such a fixed value is often based on the normal force needed to keep the body at rest when affected by gravity. This may sound like a very crude approximation, but it works surprisingly well, requires no bookkeeping, and is perfectly linear. The main drawback is, of course, that friction is unaffected by how much pressure is on the object. An object at the bottom of a stack slides out just as easily as the ones on top!

Friction in stacks. It is worth mentioning the importance of proper friction for handling stable stacking. Even in a scenario that seems largely unaffected by friction, like a pyramid of boxes, friction plays a very important role. Remember that the solver causes objects to rotate as an artifact of Gauss-Seidel iteration. This rotation introduces a tangential motion that causes a stack to tip over if no friction is used.

Friction drift. Remember the description above, about early solver termination causing stacked objects to sink into each other? The exact same thing happens to friction constraints, so if not compensated for, stacked objects might slide around slowly on top of each other, eventually causing the whole thing to fall over. Tracking friction drift is cumbersome because it involves tracking pairs of objects over several frames. For penetration depth it is rather straightforward, since the desired configuration is determined by the shape of the objects. For static friction, it is not quite that easy. Static friction can be seen as a temporary joint holding two objects together in the contact plane. If the maximum joint force is exceeded, the objects should actually slide, but as long as the force is within the maximum friction force, the relative net motion should ideally be zero. Hence, any motion that actually occurs is due to early solver termination, linearization, or any other of our artifact friends. Measuring this drift and compensating for it over time can therefore help maintain stable stacking and natural friction behavior.
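A minimal sketch of the decoupled, fixed-limit variant described above, assuming a friction coefficient mu and using the impulse that gravity would build up over one step as the reference. All names here are illustrative; with coupled friction the limit would instead be recomputed from the current normal impulse each iteration.

#include <algorithm>

// Fixed friction limit for the decoupled approach: impulse (not force) that
// gravity would accumulate over one time step, scaled by the friction coefficient.
// Note that this deliberately ignores the actual normal force.
float fixedFrictionLimit(float mass, float mu, float gravity, float dt)
{
    return mu * mass * gravity * dt;
}

// One friction constraint along a tangent direction. The accumulated tangent
// impulse is clamped to [-limit, +limit].
struct FrictionConstraint {
    float accumulated = 0.0f;
    float limit       = 0.0f;   // set once, before iterating

    // rawImpulse is whatever the solver computed to cancel tangential velocity.
    // Returns the impulse actually applied this iteration.
    float clamp(float rawImpulse)
    {
        float old = accumulated;
        accumulated = std::max(-limit, std::min(limit, accumulated + rawImpulse));
        return accumulated - old;
    }
};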
2.5.5 Shock Propagation

As a way to counteract the squishiness of iterative solvers, a shock-propagation scheme can be used. The idea is to analyze the configuration and set up the problem in such a way that the solver can find a solution more quickly. Some engines maintain an explicit graph of how the objects connect, whereas other engines temporarily tweak mass ratios, inertia, or gravity. There is a lot of creativity in shock propagation, but the artifacts are usually similar.

Large stacks require many iterations because the impulses at the bottom of the stack are many times bigger than they would be for any pair of objects solved in isolation. It takes many iterations to build up these large impulses. With shock propagation, objects at the bottom of a stack will not feel the entire weight of the objects on top. This can show up as the unnatural behavior of stacks tipping over and can also be very obvious when observing friction—an object at the bottom of a stack can be as easily pulled out as one on top.

2.6 Collision Detection

The collision-detection problem is often broken down into two or three phases: first a broad phase, detecting objects in close proximity; then sometimes a mid phase, breaking down structures into smaller parts; and finally the near phase, computing the actual contact points.

2.6.1 Phases

Broad phase. Let us start with the broad phase, which has a relatively well-defined task: report overlaps of bounding volumes, most often axis-aligned bounding boxes. If a bounding box is too small, we might experience weird shootouts, as the broad phase reports nonoverlap until the objects are already in penetration. Having the bounding boxes too big, on the other hand, has a performance implication, so we have to be sure to make them just right. Remember that if we use continuous collision detection or an intentional separation distance, these must be included in the bounding-box computation, so that the bounding box is no longer tight-fitting around the object. These errors can be hard to spot, since everything looks right most of the time.
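As an illustration, here is a small C++ sketch of the kind of bounding-box inflation this implies. The contactOffset margin and the velocity sweep are assumptions about how an engine might account for separation distance and continuous collision detection, not a specific engine's rules.

#include <algorithm>

struct Vec3 { float x, y, z; };

struct AABB {
    Vec3 min;
    Vec3 max;
};

// Inflate a tight object AABB so the broad phase never reports "no overlap"
// while the narrow phase still wants to generate contacts:
//  - contactOffset: intentional separation distance at which contacts are kept
//  - velocity * dt: sweep extent for continuous collision detection this step
AABB broadPhaseBounds(const AABB& tight, float contactOffset,
                      const Vec3& velocity, float dt)
{
    AABB b = tight;

    // Grow uniformly by the contact offset.
    b.min = { b.min.x - contactOffset, b.min.y - contactOffset, b.min.z - contactOffset };
    b.max = { b.max.x + contactOffset, b.max.y + contactOffset, b.max.z + contactOffset };

    // Extend in the direction of motion so a swept test stays inside the box.
    Vec3 d = { velocity.x * dt, velocity.y * dt, velocity.z * dt };
    b.min = { b.min.x + std::min(d.x, 0.0f), b.min.y + std::min(d.y, 0.0f), b.min.z + std::min(d.z, 0.0f) };
    b.max = { b.max.x + std::max(d.x, 0.0f), b.max.y + std::max(d.y, 0.0f), b.max.z + std::max(d.z, 0.0f) };
    return b;
}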

Mid phase. The mid phase often consists of a bounding-volume hierarchy to find convex objects in close proximity. Again, incorrect bounding-box computation can lead to shootouts. Another common problem is that objects can get stuck in between two convex parts of a compound geometry. Consider the object consisting of two spheres in Figure 2.4. Convex geometries are usually treated in isolation, causing two conflicting contact points with opposite normals and penetration depths. Feeding this problem to the solver is a dead end—there is no valid solution! The objects will start shaking violently and act very unstable. There is no good solution to this, but avoid using many small objects to make up compound bodies. In the case above, a capsule or cylinder would have avoided the problem.

Figure 2.4. Compound geometries can cause artifacts when objects get stuck in between parts.

Sliding. A similar problem can occur when an object is sliding over a flat surface that is made up of multiple parts. Imagine the scene in Figure 2.5. The box should ideally slide over the seam without any glitches, but the way the object is constructed, the seam can create invisible "hooks" causing the sliding object to stop. This is a typical frustrating artifact in certain car racing games where the car can get trapped on invisible hooks while sliding along the fence. A simple workaround is to construct the geometry as suggested in Figure 2.6.

Figure 2.5. An object sliding over a compound geometry can catch on invisible hooks due to penetration.

Figure 2.6. Making a ramp on each side and letting them overlap is a simple workaround to avoid objects getting stuck in compound objects.

Near phase. The near phase is by far the most complex part, where the actual contact generation occurs. The poor solver is often blamed for unstable and jittering simulations, but surprisingly often, shaking objects, general instability, and jerkiness can be attributed to inadequate contact generation. A sequential-impulse solver can be blamed for squishy stacks, improper friction, and many other things, but it is actually quite hard to make a solver that causes objects to rattle and shake. Near-phase contact generation often has many special cases and can be prone to numerical floating-point precision issues. Some engines use contact pruning to remove excess contact points. Special care should then be taken to make sure the same contacts are pruned every frame. Remember that keeping the configuration unchanged is key to stability.

2.6.2 Continuous Collision Detection

Ah, continuous collision detection, a technique that prevents objects from slipping through walls—how about that! Just enable it, sit back, and enjoy how everything magically works? Not quite, unfortunately.

Let us start by splitting the problem domain into two categories. First, there are artifacts caused by discretization: typically, a small object passing through a wall, called the bullet-through-paper problem, already mentioned in the beginning of this chapter. The other category is when contact is detected and generated, but the solver fails to find a proper solution, usually because of early termination. This artifact can be very significant when a light object is getting squished in between two heavy objects and is sometimes referred to as the sandwich case (see Figure 2.7).

Figure 2.7. Fast-moving objects are not the only ones taking shortcuts through walls. Early solver termination can cause objects to get squished even if contacts are detected and generated.

Figure 2.8. Fast-moving objects could potentially get rotated through the floor even if a contact is generated.

There is also a fairly common case that is a combination of the two. Imagine a thin rod, slightly inclined, falling onto a flat surface, as illustrated in Figure 2.8. The initial impact on the left side can cause a rotation so severe that by the next time step, more than half of the rod has already passed through the floor, and contact generation pushes it out on the other side. This example is a good illustration of the sometimes complex interaction between linearization and discretization that can bring a seemingly simple case like this to epic failure, even with continuous collision detection switched on. Note that some physics engines actually do have a really sophisticated nonlinear continuous collision detection that considers rotation as well, in which case the example mentioned above would have actually worked.

Sandwich case. The sandwich case can be somewhat worked around by prioritizing contacts. It is always the last constraints in a sequential impulse solver that will be the most powerful and the least prone to being violated upon early termination. Therefore, it is best to rearrange the stream of contacts so that the ones that touch important game-play mechanisms, such as walls, are solved last. A good common practice to avoid having objects pushed through walls or the floor is to solve all contacts involving a static object after any other contact, or to do an extra iteration or two after termination to satisfy only the static contacts.
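One way to do that reordering, assuming the contact stream is a simple array and that each contact knows whether it involves a static body; std::stable_partition keeps the relative order within each group, which also helps keep the configuration similar from frame to frame.

#include <algorithm>
#include <vector>

struct Contact {
    bool involvesStatic = false;   // e.g., wall or floor
    // ... bodies, normal, accumulated impulses, etc.
};

// Move contacts that touch static geometry to the end of the stream, so a
// sequential impulse solver resolves them last and they are the least likely
// to be left violated when iterations terminate early.
void prioritizeStaticContacts(std::vector<Contact>& contacts)
{
    std::stable_partition(contacts.begin(), contacts.end(),
                          [](const Contact& c) { return !c.involvesStatic; });
}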
Bullet-through-paper. An engine that aims to solve only the bullet-through-paper case typically uses a raycast or linear sweep operation to find a time of impact and then either splits up the time step—simulating the first half until the object is touching and then doing the rest—or employs an early-engage method that inserts a contact point before the object has actually reached the surface. The early-engage method can sometimes be noticed as an invisible wall in front of obstacles, especially when using zero restitution, in which case a falling object could come to a full stop some distance above the floor before finally falling the last bit.

2.7 Joints

At the most fundamental level, joints are simpler than contacts. A joint is an equality constraint, keeping the relative velocity between two bodies at zero: no inequalities, no interdependent friction, etc. However, the way we combine constraints and add limits, breakable constraints, joint friction, and damping typically makes them fairly complex.

2.7.1 Drift

The most common artifact with joints is drifting, i.e., an unintended separation between the two jointed objects. It is the joint counterpart to stacked objects sinking into each other. The solver simply fails to find a valid solution within the limited number of iterations. However, as described in the introduction to this chapter, even with an unlimited number of iterations, joints can still drift due to the linearization of velocities. Most engines cope with drifting in the same way they cope with penetration or friction drift: simply add a geometric term, acting as a spring, to compensate for the drift.

2.7.2 Solving Direct

A good way to reduce joint drift is to solve as many constraints as possible at the same time. Since joints are made up of equality constraints, they can be solved as a system of linear equations, sometimes referred to as a direct solver. Solving a system of linear equations is more complicated than applying sequential impulses, but it does pay off in stability. On the upside, these two methods can easily be combined. Some engines solve systems of three orthogonal constraints (this particular assembly is found in many joint types) as a special case, with a three-by-three matrix inversion, and then interweave the rest of the constraints using sequential impulses.
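To make the three-constraint special case a bit more concrete, here is a sketch of solving such a 3×3 block in one go. It assumes the effective-mass matrix K (J M⁻¹ Jᵀ for the block) and the right-hand side have been assembled elsewhere, and it inverts K with plain Cramer's rule; this is an illustration, not the book's implementation.

#include <array>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<std::array<float, 3>, 3>;   // K = J M^-1 J^T for the block

// Solve K * lambda = rhs for the three coupled constraint impulses at once.
// rhs is typically -(J v + bias). No pivoting: K is assumed well conditioned,
// which it is for a valid joint (symmetric positive definite).
Vec3 solve3x3(const Mat3& K, const Vec3& rhs)
{
    float det = K[0][0] * (K[1][1] * K[2][2] - K[1][2] * K[2][1])
              - K[0][1] * (K[1][0] * K[2][2] - K[1][2] * K[2][0])
              + K[0][2] * (K[1][0] * K[2][1] - K[1][1] * K[2][0]);

    Mat3 inv;
    inv[0][0] =  (K[1][1] * K[2][2] - K[1][2] * K[2][1]) / det;
    inv[0][1] = -(K[0][1] * K[2][2] - K[0][2] * K[2][1]) / det;
    inv[0][2] =  (K[0][1] * K[1][2] - K[0][2] * K[1][1]) / det;
    inv[1][0] = -(K[1][0] * K[2][2] - K[1][2] * K[2][0]) / det;
    inv[1][1] =  (K[0][0] * K[2][2] - K[0][2] * K[2][0]) / det;
    inv[1][2] = -(K[0][0] * K[1][2] - K[0][2] * K[1][0]) / det;
    inv[2][0] =  (K[1][0] * K[2][1] - K[1][1] * K[2][0]) / det;
    inv[2][1] = -(K[0][0] * K[2][1] - K[0][1] * K[2][0]) / det;
    inv[2][2] =  (K[0][0] * K[1][1] - K[0][1] * K[1][0]) / det;

    return { inv[0][0] * rhs[0] + inv[0][1] * rhs[1] + inv[0][2] * rhs[2],
             inv[1][0] * rhs[0] + inv[1][1] * rhs[1] + inv[1][2] * rhs[2],
             inv[2][0] * rhs[0] + inv[2][1] * rhs[1] + inv[2][2] * rhs[2] };
}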

The way the constraints are placed also matters when it comes to stability. Consider a ball joint. It might be tempting to use a single constraint in the direction of maximum separation or in the direction of relative velocity. But remember that whatever constraints go into the solver are the only constraints preventing motion, so a single constraint will naturally transfer motion from the constraint axis to the other two. A proper ball joint needs three constraints to be stable, and even the way the three constraints are aligned matters. Keeping the constraints aligned roughly the same way every frame helps stability. World axes are a good starting point, but using the axes of one of the objects can be even better, since the constraints will then be stationary with respect to at least one of the objects, keeping the configuration as similar as possible.

Figure 2.9. Hard joint limits might start oscillating due to discretization.

2.7.3 Joint Limits

Some joints support limits that block either linear or angular motion. This is very similar to a contact constraint. A common artifact with jointed structures with limits is that they tend to shake and never come to rest. Even if a joint limit is supposed to be a hard limit, it is usually a good idea to soften it up a tiny bit. A hard limit that fully engages when the limit is exceeded and is fully disengaged otherwise is very hard to get stable. Consider the limited hinge joint in Figure 2.9. Before it hits the limit, the joint can move freely. Now, since the simulation is carried out in discrete steps, this means that the joint limit will not kick in until the limit is already exceeded. Once it is exceeded, the geometric term that is supposed to correct the joint will kick the joint back, causing the limit to disengage and fall back down again. This is a good example of how rapidly changing the configuration causes instability. Using soft limits, so that the hinge is allowed to rest on a spring for a certain distance, will give the solver a chance to find equilibrium without changing the configuration every frame.
2.7.4 Dealing with the Dead Guy

Ragdolls might qualify as the number one physics frustration worldwide, and numerous games are still shipped with ragdolls doing the monkey dance while "dead." In my experience, ragdoll instability is due to two main factors—hard joint limits and excess inter-bone collisions. Applying soft limits as described above will get you halfway there. A ragdoll is a pretty complex structure, especially since it can end up on the ground in any pose, including one that engages multiple joint limits. Shaking usually appears either when the configuration changes or when there are conflicting constraints. The more constraints there are to solve, the higher the chance of conflicting ones. Therefore, it is usually a good idea to disable as many collisions as possible. Start with a ragdoll with all bone–bone collisions

turned off. You will be surprised how good it still looks. You might want to enable certain collisions, such as hips–lower arms and calf–calf collisions, but in general it is fine to leave most of the other ones off, assuming you have a decent setup of joint limits.

Finally, add a certain amount of damping or friction to all joints. The flesh in the human body naturally dampens any motion, so some amount of friction will look more natural, at the same time helping our ragdoll get some sleep.

2.7.5 Geometric Joint Recovery

Since joint drifting cannot be completely avoided, it is tempting to do a final geometric translation to pull joints back together. This can work well in some situations, but for the most part, it will add instability and energy to the overall system. Consider the scene illustrated in Figure 2.10. Translating the joint back into position introduces a penetration that will, at the next frame, push the body up and add energy to the system, possibly causing a new joint displacement. If we really want to get our hands dirty and implement geometric recovery, we should consider the whole system, also doing it for collisions to resolve penetrations, and modify both position and rotation.

A better way to do this correction is to apply the joint translation as a purely visual effect. In the ragdoll case, many games use only the rotations from the physics representation, while keeping a fixed displacement, efficiently hiding joint drifting. However, if the joint displacement is large, it can cause visual penetration, especially at the outermost limbs of the ragdoll.

Figure 2.10. Compensating for joint drift by moving the objects is usually a really bad idea.

2.8 Direct Animation

Sometimes we might want to simply animate physical objects, having them affect other objects but not be affected themselves. There are several ways to do this, including using joint motors to physically drive the object. However, sometimes we simply want to move an object along an animated path, totally unaffected by collisions.

Animating an object by simply setting its position is never a good idea. It might still affect objects in its environment, but collisions will be soft and squishy. This is partly because the velocity of the object is not updated correctly, so for all the solver knows, there is a collision with penetration, but it is not aware that any of the objects are moving. To avoid this, make sure to update the velocity to match the actual motion. Some engines have convenience functions for this.

Even when the velocity is correct, if the animated object is not considerably heavier than the objects it is colliding with, the collisions will be soft and squishy. For an animated object to fully affect the environment, its mass and inertia tensor should be infinite. Only then will other objects fully obey and move out of the way. Hence, if we animate objects by setting their position, make sure to give them the correct velocity, both linear and angular, and make the mass and inertia tensor temporarily infinite.
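Here is a hedged sketch of the bookkeeping this implies when driving a body from an animation, assuming a quaternion orientation and a toy Body struct; the helper names are invented for the example, and the angular velocity uses a small-angle approximation of the orientation delta.

#include <cmath>

struct Vec3 { float x = 0, y = 0, z = 0; };
struct Quat { float w = 1, x = 0, y = 0, z = 0; };   // unit quaternion

// q * conjugate(p): the rotation taking orientation p to orientation q.
Quat deltaRotation(const Quat& q, const Quat& p)
{
    return { q.w * p.w + q.x * p.x + q.y * p.y + q.z * p.z,
             -q.w * p.x + q.x * p.w - q.y * p.z + q.z * p.y,
             -q.w * p.y + q.y * p.w - q.z * p.x + q.x * p.z,
             -q.w * p.z + q.z * p.w - q.x * p.y + q.y * p.x };
}

struct Body {
    Vec3  position;        Quat orientation;
    Vec3  linearVelocity;  Vec3 angularVelocity;
    float invMass = 1;     // 0 makes the body immovable (infinite mass)
};

// Drive a body along an animated path: set the new pose, but also give it the
// matching velocities and make it immovable for the duration of the animation.
// A real engine would also zero the inverse inertia tensor.
void animateKinematically(Body& body, const Vec3& newPos, const Quat& newRot, float dt)
{
    body.linearVelocity = { (newPos.x - body.position.x) / dt,
                            (newPos.y - body.position.y) / dt,
                            (newPos.z - body.position.z) / dt };

    // Small rotations: dq ~ (1, 0.5 * omega * dt), so omega ~ 2 * vec(dq) / dt.
    Quat dq = deltaRotation(newRot, body.orientation);
    float s = (dq.w >= 0.0f) ? 2.0f / dt : -2.0f / dt;   // pick the short way around
    body.angularVelocity = { s * dq.x, s * dq.y, s * dq.z };

    body.position    = newPos;
    body.orientation = newRot;
    body.invMass     = 0.0f;
}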
2.9 Artifact Reference

Following is a list of artifacts and their causes.

• Frame rate gradually slows down to a grinding halt. You might have hit the well of despair, where the physics engine tries to compensate for its own slowdown. Put a cap on the number of physics steps per frame or implement a more sophisticated time-stepping algorithm.

• Simulation runs in slow motion. Check that the physics step size corresponds to actual time. Keep an eye on simulation scale. A larger scale will result in slow-motion effects.

• Stacked objects are shaking or rattling. Check the contact-generation code and make sure the configuration is not rapidly changing.

• An aligned object dropped on a flat surface bounces off in a weird way. This is natural behavior of Gauss-Seidel iteration.

• Objects at the bottom of a stack do not feel the weight of the ones on top. This is caused by a shock-propagation scheme or decoupled friction with a fixed maximum force.

• Highly asymmetric objects act unstable. The low inertia around one of the axes causes a lot of rotation. Increase inertia tensors, as if the objects were more symmetric.

• Stacked objects act springy and objects get squashed. The solver iteration count might be too low. We can also try adding warm starting or a shock-propagation scheme.

• Stacks are oscillating and tend to never come to rest. Too much warm starting is being used.

• Stacked objects slide around on each other, eventually falling over. There is a lack of friction-drift compensation.

• An object penetrates freely and then suddenly shoots out. This can be an incorrect bounding box or a contact-generation problem.

• Objects are getting pushed through walls by other objects. The contact stream might not favor static contacts. Rearrange the contact stream so that static contacts are at the end of the stream.

• Small, fast objects pass through walls. Enable continuous collision detection or early engage. If the problem still does not go away, it can be due to rotation. Make the object thicker or increase the inertia tensor.

• Falling objects stop before hitting the floor and then fall down the last bit. This is caused by early-engage contact generation. You can add some restitution to hide the problem or implement more sophisticated continuous collision detection.

• Jointed structures drift apart, causing visual separation. This cannot entirely be avoided due to the nature of iterative solvers and linearization. Use a direct solver to minimize the problem. You can also try a visual joint displacement, if applicable.

• Ragdolls are shaking and never come to rest. There can be conflicting joint limits, too many inter-bone collisions, or joint limits that are too hard.

• An animated object does not affect the environment properly. The animated object might have incorrect velocity, or the mass or inertia is not infinite.
3 Broad Phase and Constraint Optimization for PlayStation® 3

III Particles

[Harada et al. 07] T. Harada, S. Koshizuka, and Y. Kawaguchi. "Smoothed Particle Hydrodynamics on GPUs." Paper presented at Computer Graphics International Conference, Petropolis, Brazil, May 30–June 2, 2007.

[Harlow and Welch 65] Francis H. Harlow and Eddie J. Welch. "Numerical Calculation of Time-Dependent Viscous Incompressible Flow of Fluid with Free Surface." Physics of Fluids 8:12 (1965), 2182–2189.

[Harris et al. 07] Mark Harris, Shubhabrata Sengupta, and John D. Owens. "Parallel Prefix Sum (Scan) with CUDA." In GPU Gems 3, edited by Hubert Nguyen, pp. 851–876. Reading, MA: Addison-Wesley, 2007.

[Hjelte 06] N. Hjelte. "Smoothed Particle Hydrodynamics on the Cell Broadband Engine." Preprint, 2006. Available at http://www.2ld.de/gdc2004/.

[Kanamori et al. 08] Yoshihiro Kanamori, Zoltan Szego, and Tomoyuki Nishita. "GPU-Based Fast Ray Casting for a Large Number of Metaballs." Comput. Graph. Forum 27:2 (2008), 351–360.

[Koshizuka and Oka 96] S. Koshizuka and Y. Oka. "Moving-Particle Semi-implicit Method for Fragmentation of Incompressible Flow." Nucl. Sci. Eng. 123 (1996), 421–434.

[Lorensen and Cline 87] William E. Lorensen and Harvey E. Cline. "Marching Cubes: A High Resolution 3D Surface Construction Algorithm." In SIGGRAPH '87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, pp. 163–169. New York: ACM Press, 1987.

[Monaghan 88] J. J. Monaghan. "An Introduction to SPH." Computer Physics Communications 48 (1988), 89–96. Available at http://dx.doi.org/10.1016/0010-4655(88)90026-4.

[Müller et al. 03] Matthias Müller, David Charypar, and Markus Gross. "Particle-Based Fluid Simulation for Interactive Applications." In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 154–159. Aire-la-Ville, Switzerland: Eurographics Association, 2003.

[Müller et al. 07] Matthias Müller, Simon Schirm, and Stephan Duthaler. "Screen Space Meshes." In SCA '07: Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 9–15. Aire-la-Ville, Switzerland: Eurographics Association, 2007.

[Ruth 83] Ronald D. Ruth. "A Canonical Integration Technique." IEEE Transactions on Nuclear Science 30 (1983), 2669–2671.

[Stam 99] Jos Stam. "Stable Fluids." In SIGGRAPH '99: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 121–128. New York: ACM Press/Addison-Wesley, 1999.

[Teschner et al. 03] M. Teschner, B. Heidelberger, M. Mueller, D. Pomeranets, and M. Gross. "Optimized Spatial Hashing for Collision Detection of Deformable Objects." In Proceedings of Vision, Modeling, Visualization VMV'03, pp. 47–54. Heidelberg: Aka GmbH, 2003. Available at http://graphics.ethz.ch/~brunoh/download/CollisionDetectionHashing_VMV03.pdf.

[van der Laan et al. 09] Wladimir J. van der Laan, Simon Green, and Miguel Sainz. "Screen Space Fluid Rendering with Curvature Flow." In Proceedings of the 2009 Symposium on Interactive 3D Graphics and Games, pp. 91–98. New York: ACM Press, 2009.

[van Kooten et al. 07] Kees van Kooten, Gino van den Bergen, and Alex Telea. "Point-Based Visualization of Metaballs on a GPU." In GPU Gems 3, edited by Hubert Nguyen, pp. 123–156. Reading, MA: Addison-Wesley, 2007.

[Vesely 01] Franz J. Vesely. Computational Physics: An Introduction, Second edition. New York: Springer, 2001.

[Zhang et al. 08] Yanci Zhang, Barbara Solenthaler, and Renato Pajarola. "Adaptive Sampling and Rendering of Fluids on the GPU." In Proceedings of the IEEE/EG International Symposium on Volume and Point-Based Graphics, pp. 137–146. Aire-la-Ville, Switzerland: Eurographics Association, 2008.
7 Parallelizing Particle-Based Simulation on Multiple Processors

Takahiro Harada

7.1 Introduction

Particle-based simulation is a method that can simulate liquid without having to use any numerical techniques to track the fluid surfaces. Simulating particle motion gives us not only the information about the fluid surface but also about splashes. Moreover, a particle-based method can be used for a simplified rigid-body simulation as well [Harada 07], and since they can be solved in the same framework, the rigid-body simulation can be coupled with the fluid simulation easily.

Figure 7.1. Rendered image from a simulation using multiple GPUs (see Color Plate IV).

However, the drawback of particle-based simulation is its computational cost. If the resolution of the simulation is the same as for a grid-based simulation, i.e., the number of particles is the same as the number of grid points in a grid-based simulation, particle-based simulations of fluids are much more expensive, because the neighboring particles have to be searched for in every time step. In order to get good visual quality, a large number of particles have to be simulated. It depends on the situation, but a simulation with only thousands of particles does not usually give us a satisfactory result.

In this chapter, a method to parallelize particle-based simulation on multiple processors with distributed memory is presented. The method simulates the motion of particles by splitting a simulation into smaller simulations. Using this method, a high-resolution simulation, as shown in Figure 7.1, can be simulated in a few milliseconds per step. GPUs are generally used for parallelizing simulations, but the present method is not limited to GPUs, as it is also applicable to multiple CPUs.

7.2 Dividing Computation

To utilize multiple processors for a simulation, the computation has to be divided into several computations. For a grid-based fluid simulation, in which connectivity among fixed simulation entities is parallelized on multiple processors, the approach we should take is obvious. The simulation domain is divided into subdomains, and a subdomain is assigned to a processor. Because of the fixed connectivity, the decomposition of the simulation domain has to be done once before the simulation starts. To calculate each subdomain, the simulation requires some data from an adjacent subdomain. The elements whose data have to be transferred to an adjacent processor are fixed. Therefore, it is relatively easy to use multiple processors for a grid-based fluid simulation. The overhead of the parallelization is not so large because of the fixed connectivity.

For particle-based simulation, the analogy of the domain decomposition used for grid-based simulation would be to divide the particles into as many sets as there are processors. We quickly realize that this is not a good choice, because particles mix up soon after a simulation starts, so the communication among processors would almost halt the simulation. Thus, it is not obvious how to divide a particle-based simulation, in which the simulation entities, the particles, move freely in the computation domain, over multiple processors. The overhead of parallelization can easily kill the benefits of using multiple processors without a carefully designed method, because the simulation data have to be managed at each simulation step.

We chose to use domain decomposition, which is often used in grid-based simulation, for particle-based simulation instead of splitting the particles by their indices. A processor assigned to a subdomain simulates the particles in that subdomain. At first, particle motions are ignored for simplicity. Their motion will be taken into account in the next section.

We first have to consider how to store the particle data. The simplest way would be to employ server–client-type management, in which a server processor containing all the data distributes jobs with data to client processors and retrieves the results in each step. Although this is easy to implement, it requires a large data transfer. This is not efficient when the data transfer between processors is expensive, as with GPUs. Moreover, the clients have to wait while the server is preparing the data to be sent. Therefore, we used another strategy to manage the data that is better suited for parallelizing on multiple processors, and in which each processor manages its own data: peer-to-peer-type management.

To calculate the physical values of a particle, the values of neighboring particles are used: positions of neighbors are used to calculate forces in a distinct element method (DEM) simulation [Mishra 03]; physical values of neighbors are integrated in a smoothed particle hydrodynamics (SPH) simulation (see Chapter 6). Neighbors can be in an adjacent subdomain computed by another processor. In this case, the processor has to ask the adjacent processor for the data. Accessing the memory of another processor whenever it is necessary is inefficient, because it lowers the granularity of the memory transfer, making transfers smaller and more frequent. Therefore, we introduce ghost regions to the simulation. Say the entire computation domain is C = { x | s < x ≤ e }, and two processors p_0 and p_1 are used for the simulation. The domain is decomposed at x = m by a plane perpendicular to the x-axis, so the subdomains for p_0 and p_1 are

C_0 = { x | s < x ≤ m },   C_1 = { x | m < x ≤ e },

where m = (s + e)/2 is the midpoint of the computation domain in the x-direction. Then the ghost region for p_0 is the area in C_1 adjacent to C_0,

G_{1→0} = { x | m < x ≤ m + g },

and the ghost region of p_1 is the area in C_0 adjacent to C_1,

G_{0→1} = { x | m − g < x ≤ m },

where g is the size of the ghost region, as illustrated in Figure 7.2.

Figure 7.2. Division of a simulation using two processors.
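As a rough illustration of these definitions (one-dimensional split, two processors, names invented for the example), a particle stored by p_0 can be classified like this:

// Subdomain split at m with ghost size g, as in the text: p0 owns (s, m],
// p1 owns (m, e]. From p0's point of view, a particle is either its own,
// one of its ghosts (a read-only copy of p1's data), or not stored at all.
enum class Ownership { OwnedByP0, GhostFromP1, NotStored };

Ownership classifyForP0(float x, float m, float g)
{
    if (x <= m)     return Ownership::OwnedByP0;    // inside C0
    if (x <= m + g) return Ownership::GhostFromP1;  // inside G(1->0)
    return Ownership::NotStored;
}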

When n processors are used, the simulation domain is divided into n subdomains, and each processor (except for the ones at either end) has two ghost regions, one on each side. Let the effective radius (the particle diameter in the case of DEM) be r_e = g; then the particles that can be neighbors of the particles in C_0 are all found in the area C_0 ∪ G_{1→0}. Thus, a processor does not have to query adjacent processors for particle values during the computation if the particle data in the ghost region are transferred before the time step (to be precise, this is true for explicit computation but not for implicit computation, like the moving particle semi-implicit (MPS) method, which solves Poisson's equation for the pressure on particles [Koshizuka and Oka 96]). We refer to these particles in a ghost region as ghost particles. Processor p_0 updates the particles in C_0 but only reads the values of ghost particles. All the particles are updated, because all particles exist in C_0 ∪ C_1 without any duplications (G_{1→0} ⊂ C_1 and G_{0→1} ⊂ C_0). If particles were static, this would be sufficient—but particles move. In the next section, data management for moving particles is discussed.

7.3 Data Management without Duplication

The motion of particles causes a flow of particles between subdomains; some particles go to, and some particles come from, an adjacent subdomain. The set of ghost particles can change dynamically from one time step to the next, so efficient management of particles is necessary. As discussed above, we have employed peer-to-peer-type management of particle data. Although we chose it, there are still several other choices for how to manage the data. The easiest way is as follows: each processor has the data of all the particles (using the same index for each particle) and updates the data of the particles belonging to its particular subdomain. However, this is not memory efficient, because all the processors have to hold all the particle data. In the following subsections, we describe a method in which a processor only keeps the data of the particles in its own subdomain. Therefore, there is no processor that holds the data of all the particles.

7.3.1 Sending Data

As discussed above, data from a neighboring processor are necessary for the computation of particles at the boundary of a subdomain. Also, particles that move out of a subdomain have to be passed to an adjacent processor. Therefore, the particles that have to be sent to an adjacent processor are the particles that move from their subdomain to the adjacent subdomain and also the ghost particles in the subdomain. Let x_i^t be the x-coordinate of particle i at time t calculated by processor p_0, which calculates subdomain C_0. Particle i is in the subdomain of p_0 if x_i^t ≤ m. The particles that move out from C_0 to C_1 are

EP_{0→1}^{t+Δt} = { i | m < x_i^{t+Δt}, x_i^t ≤ m }.   (7.1)

The ghost particles of p_1 that lie in the subdomain C_0 and come from p_0 are

GP_{0→1}^{t+Δt} = { i | m − g < x_i^{t+Δt} ≤ m, x_i^t ≤ m }.   (7.2)

Figure 7.3. Particles sent from p_0 to p_1.

Note that this does not include the ghost particles of p_1 that come from p_1 itself. From Equations (7.1) and (7.2), the particles that have to be sent to p_1 are

SP_{0→1}^{t+Δt} = EP_{0→1}^{t+Δt} ∪ GP_{0→1}^{t+Δt} = { i | x_i^{t+Δt} > m − g, x_i^t ≤ m },

as shown in Figure 7.3.
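A minimal CPU-side sketch of that selection, using the old and new x-coordinates exactly as in the set definition above; the particle storage here is just an illustrative array of structs.

#include <vector>

struct Particle {
    float xOld;   // x-coordinate at time t
    float xNew;   // x-coordinate at time t + dt
    // ... remaining state (y, z, velocity, ...) omitted
};

// SP(0->1): everything p0 has to hand to p1, i.e., particles that left C0
// plus p1's ghost particles, which is exactly { xNew > m - g, xOld <= m }.
std::vector<int> selectSendSet(const std::vector<Particle>& particles, float m, float g)
{
    std::vector<int> indices;
    for (int i = 0; i < static_cast<int>(particles.size()); ++i)
        if (particles[i].xNew > m - g && particles[i].xOld <= m)
            indices.push_back(i);
    return indices;
}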

To send the data, SP_{0→1}^{t+Δt} has to be selected from all the particles in the memory of a processor. Flagging the particles in the region and using a prefix sum, as is often done in GPU algorithms [Harris et al. 07], to compact them into dense memory adds some computation cost, which may seem negligible, but not for high-frequency applications like our problem. Most of the processors have to select two sets of particles, one on each side, for their two neighbors if more than two processors are used. This means we would have to run these kernels twice.

Instead, in our implementation the grid constructed for the efficient neighbor search is reused to select the particles. The data can be used directly to select the particles, so that we can avoid increasing the cost. The particles that have to be sent from C_0 are the particles in voxels with x_v > m − g. However, the grid constructed in this simulation step cannot be used directly, because particles have changed their positions during the time step. To avoid a full rebuild of the grid, we used a simulation condition to restrict the particles we want to find. We used the distinct element method (DEM) to calculate the force on a particle by placing springs and dampers. DEM is an explicit method but is not unconditionally stable. It has to restrict the size of the time step according to the velocity to maintain stability. Thus, we need vΔt/l_0 < c, where v, Δt, l_0, and c are the particle velocity, the time-step size, the particle diameter, and the Courant number, respectively, and the Courant number has to be less than one. This condition guarantees that the motion of any particle during a step is below its diameter. Since we set the side length of a voxel equal to the particle diameter, particles do not move more than l_0, which is the side length of a voxel. From these conditions, we find that the particles of SP_{0→1}^{t+Δt} at simulation time t are the particles (letting x be the x-coordinate of such a particle) in

S′_{0→1} = { x | m − d − l_0 < x ≤ m },

and especially when d = r_e,

S′_{0→1} = { x | m − 2d < x ≤ m }.

A buffer has to be prepared to store these selected particles. When a uniform grid is used and g = r_e, two voxel widths in the direction of the space split have to be sent. The buffer size can be calculated from the configuration of the grid.

Figure 7.4. Particles to be sent at t + Δt (left) and their configuration at t (right).
Actually, we are not using a uniform grid but rather a sliced grid, which has a much tighter fit to the particle distribution, as will be described in Section 7.4.

7.3.2 Receiving Data

If all the processors were using the same indices for particles, all we would have to do is update the values of these particles after receiving the data from other processors. However, in our approach, each processor manages its own data and does not have a unique index for a particle among all of the particles of the simulation. Thus, the index of a particle at a boundary of a subdomain does not necessarily agree between the two processors sharing the boundary. When one processor receives particles from another, it adds them to its own particle list. We have to be careful about the duplication of particles. If we cannot guarantee that a particle sent from a neighbor does not already exist in the list, all the particles have to be scanned to find the entry—something we do not want to do. However, what we do have to do is delete the particles in the ghost region that were received in the previous time step. For example, say p_0 received a set of particles from p_1 at time t. That set consists of particles with x^{t+Δt} > m and particles with x^{t+Δt} ≤ m. So after deleting the particles with x^{t+Δt} > m, only the particles with x^{t+Δt} ≤ m remain. Note that this is not the same as the set of particles in the ghost region after updating the particle positions. This can be proved by the following two propositions:

1. The set of particles GP_{1→0}^t that is in G_{1→0} at time t is included in the set of particles SP_{1→0}^{t+Δt}, which will be sent from the adjacent subdomain at time t + Δt (see Figure 7.4).

2. The set of particles EP_{0→1}^{t+Δt} that is in C_0 at time t and will be in G_{1→0} at time t + Δt is not included in SP_{1→0}^{t+Δt}.

For the first proposition, because SP_{1→0}^{t+Δt} is created by reading the grid at time t,

SP_{1→0}^{t+Δt} = { i | m < x_i^t ≤ m + d + l_0 },
GP_{1→0}^t = { i | m < x_i^t ≤ m + d, x_i^{t−Δt} > m }.

These equations lead to GP_{1→0}^t ⊂ SP_{1→0}^{t+Δt}, which proves that the ghost particles at time t will be sent from the neighbor again at time t + Δt. Therefore, the ghost particles from time t have to be deleted before the processor receives the particles coming from the adjacent subdomain.

For the second proposition,

EP_{0→1}^{t+Δt} = { i | m − d < x_i^t ≤ m, m < x_i^{t+Δt} },

and

SP_{1→0}^{t+Δt} = { i | m < x_i^t ≤ m + d + l_0 }.

These equations lead to EP_{0→1}^{t+Δt} ∩ SP_{1→0}^{t+Δt} = ∅. Thus, particles that are in a ghost region at time t + Δt should not be deleted. We can also see that the particles coming from a neighbor have no duplicates among the particles already in the subdomain, so the received data can simply be appended at the end of the processor's particle list.

If a grid is used to select the particles to be sent, there will be several voxels that are not filled to the maximum capacity of a voxel. If the sent data just kept being appended, invalid entries would accumulate. To prevent this, the array is compacted by using a prefix sum after receiving the data from the neighbors.
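Putting the receive side together as a CPU-style sketch (delete last step's ghosts, then append what the neighbor sent); in the GPU version described in the text, the compaction would be done with a parallel prefix sum rather than this erase/append.

#include <algorithm>
#include <vector>

struct Particle { float x; /* other state omitted */ };

// Receive step for p0 at the m-boundary: ghost particles from the previous
// step live at x > m and belong to p1, so drop them first, then append the
// freshly received particles. No duplicate check is needed (see the two
// propositions above).
void receiveFromRightNeighbor(std::vector<Particle>& owned,
                              const std::vector<Particle>& received, float m)
{
    owned.erase(std::remove_if(owned.begin(), owned.end(),
                               [m](const Particle& p) { return p.x > m; }),
                owned.end());
    owned.insert(owned.end(), received.begin(), received.end());
}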

7.4 Choosing an Acceleration Structure

So far, we have discussed how to manage the data on multiple processors. As neighboring-particle search is expensive, acceleration data structures have to be introduced. In this section, we first discuss the requirements for a particle-based simulation and then present the sliced grid, which we used for our simulation. It not only has several advantages as an acceleration structure, but it is also well suited for parallelized particle-based simulation using domain decomposition.

7.4.1 Requirements for Particle-Based Simulation

The data structures introduced to make neighboring-particle search efficient can be classified into three categories: uniform grids, hash grids, and hierarchical grids, all illustrated in Figure 7.5. There are two major requirements for a grid used in particle-based simulations. The first is that the construction cost be low enough for the grid to be rebuilt at every time step. The other requirement is that it should be easy to access the memory of the voxel to which a particle belongs, because the data stored in that memory are frequently referred to during a simulation. There is actually another condition that, although not strictly required, is preferable: a small memory footprint. In the following, we discuss these points for the three grids: uniform grid, hash grid, and hierarchical grid.

The uniform grid allocates memory for all the voxels in the computation domain, whether they are occupied by particles or not. This simple nature keeps the construction and access costs low. Although the uniform grid satisfies the two requirements, it needs a large amount of memory to hold the data for all the voxels in the computation domain. There can be a large number of empty voxels, storing no particles, which is nothing but a waste of memory.

The hash grid improves on the uniform grid by not allocating all the voxels. Instead, it maps the voxels to a fixed-size array by using a hash function. It looks like a good candidate, but it suffers from hash collisions, in which several voxels are mapped to the same location, because the hash grid cannot guarantee a perfect hash. When the grid is accessed, the stored values have to be checked to see whether they belong to the same voxel or not, so the access cost is higher than for the uniform grid.

The hierarchical grid improves the memory efficiency a lot. Figure 7.5 (right) shows a quadtree (the three-dimensional counterpart is an octree), which subdivides cells with a valid entry. The top level of the tree is the bounding box of the input data. A node with an entry is divided into four nodes; this is done recursively as long as the subdivision criteria are met. It avoids allocating memory for empty space by using a hierarchical representation. The drawback of the hierarchy is the access cost of a leaf node. Unlike the uniform grid, it cannot calculate the memory address directly from the position of the query. Instead, it has to traverse the tree structure from the root of the tree.

We now want to parallelize particle-based simulations on multiple processors, which adds some requirements. One requirement is that all the computations are parallelized. Especially when using a GPU, the entire algorithm should be performed on the GPU; otherwise, data have to be transferred between the GPU and the CPU. Another consideration is that a uniform computation burden is preferred, to keep the load balance even. To summarize this discussion, a uniform grid is memory inefficient, and a hash grid is not suited for implementation on the GPU because of hash collisions. Construction of the grid and accessing a voxel are computationally expensive in a hierarchical grid.

Figure 7.5. Uniform grid (left), hash grid (middle), and hierarchical grid (right). The uniform grid allocates memory for the entire domain. The hash grid maps a voxel to a memory array using a hash function. The hierarchical grid only subdivides voxels containing particles.

The sliced grid, developed by [Harada et al. 07], is another option. This is a grid whose construction cost is low, whose voxels are easy to access, and which requires a small memory footprint. So we chose the sliced grid as the acceleration structure for our neighboring-particle search. In the following, a short introduction to the sliced grid is presented, followed by an implementation on the GPU using CUDA.

7.4.2 Sliced Grid


When a uniform grid is used, a bounding box is defined to enclose the computational domain, and memory for the voxels inside the bounding box is allocated whether a voxel is occupied by a particle or not, as shown in Figure 7.6 (left). We can see that a large amount of memory is wasted, because memory is allocated for unused voxels. The sliced grid, however, allocates memory as shown in Figure 7.6 (right). The procedure to build the grid starts by scanning the space for the grid cells filled with particles. Of course, it would be possible to identify voxels containing particles by scanning the whole space, but there is a cost to that. The sliced grid increases the memory efficiency by adding only a little computation.

First of all, orthogonal basis vectors (e_x, e_y, e_z) and a uniform grid along these bases in the computational domain are prepared. Note that the grid is not allocated in memory at this time. The first step is counting the number of voxels required to store the data.

An axis is chosen from the bases, and the grid in the domain is divided into slices perpendicular to that axis. Each slice has a one-voxel thickness in the direction of the axis. Thus, the slices have one dimension less than the spatial dimension of the computation domain. A sliced grid allocates memory for a two-dimensional bounding box per slice. By not excluding the empty voxels completely, it keeps the computation cost low. When e_x is chosen as the axis, each slice is spread over the space of the bases e_y and e_z. The following explanation assumes that the coordinate in grid space of a point x = (x, y, z) is b = (b_x, b_y, b_z) = (x · e_x, x · e_y, x · e_z). After dividing the computational space into slices, the (here two-dimensional) bounding box of each slice is calculated by scanning the grid coordinates of all the particles. The maximum and minimum of y and z for slice i are

B^y_{i,max} = max_{j∈P_i} { b^y_j },   B^y_{i,min} = min_{j∈P_i} { b^y_j },
B^z_{i,max} = max_{j∈P_i} { b^z_j },   B^z_{i,min} = min_{j∈P_i} { b^z_j },

where P_i = { j | b^x_j = i }. With these values, the numbers of voxels in the y- and z-directions are computed as

n^y_i = (B^y_{i,max} − B^y_{i,min})/d + 1,   n^z_i = (B^z_{i,max} − B^z_{i,min})/d + 1,

where d is the side length of the voxels. This bounding box, with n_i = n^y_i n^z_i voxels in the slice, is allocated in memory. This is much more efficient than using the uniform grid, although it still contains some empty voxels. The index of a voxel at (x, y, z) in slice i can be calculated by

v_i(x, y, z) = ⌊(y − B^y_{i,min})/d⌋ + ⌊(z − B^z_{i,min})/d⌋ n^y_i.   (7.3)

Figure 7.6. Uniform grid (left) and sliced grid, sliced in the x-direction (right).

Placing the memory for the slices in one contiguous block requires the offsets, or indices of the first voxels, of the slices. Let the index of the first voxel of slice i be p_i. It is calculated as the sum of the numbers of voxels from the first slice S_0 up to slice S_{i−1}; thus, p_i = Σ_{j<i} n_j. Taking the prefix sum of the numbers of voxels in the slices gives us the indices of the first voxels.

We are now ready to store the data in the grid. The index of the voxel to which a point (x, y, z) belongs is calculated in two steps. The first step is the computation of the slice the point is on, which is i = ⌊b_x − B^x_{min}⌋, where B^x_{min} is the minimum coordinate of the slices in the x-direction. Using the index of the slice, the index of the first voxel of the slice, stored in the table calculated in the preprocessing step, is read. From that index and Equation (7.3), the index of the voxel is calculated as

v(x, y, z) = p_i + ⌊(y − B^y_{i,min})/d⌋ + ⌊(z − B^z_{i,min})/d⌋ n^y_i.
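A compact C++ sketch of this lookup path (Equation (7.3) plus the per-slice offset table). The struct layout is an assumption made for the example, and the division by d in the slice index assumes b_x is a world-space projection rather than an integer voxel coordinate; the point is assumed to lie inside the grid.

#include <cmath>
#include <vector>

struct SlicedGrid {
    float d;                        // voxel side length
    float bxMin;                    // minimum grid x-coordinate over all slices
    std::vector<float> byMin;       // per-slice minimum of b_y
    std::vector<float> bzMin;       // per-slice minimum of b_z
    std::vector<int>   ny;          // per-slice voxel count in y
    std::vector<int>   firstVoxel;  // p_i: prefix sum of per-slice voxel counts
};

// Voxel index of a point given in grid coordinates (b_x, b_y, b_z):
// find the slice first, then apply Equation (7.3) inside the slice,
// offset by the slice's first-voxel index p_i.
int voxelIndex(const SlicedGrid& grid, float bx, float by, float bz)
{
    int i  = static_cast<int>(std::floor((bx - grid.bxMin) / grid.d));
    int vy = static_cast<int>(std::floor((by - grid.byMin[i]) / grid.d));
    int vz = static_cast<int>(std::floor((bz - grid.bzMin[i]) / grid.d));
    return grid.firstVoxel[i] + vy + vz * grid.ny[i];
}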

Of course, we could push this slicing concept to another dimension to remove even more empty voxels, i.e., by slicing in the x-direction before slicing in the y-direction. However, this is a tradeoff between memory savings and computation; it adds much more overhead for real-time applications.

Implementation on the GPU. Before storing particle indices in memory, the bounding box and the first voxel index of each slice have to be calculated. Although these computations are trivial on a sequential processor, it requires some effort to perform them on multiple processors. The GPU implementation is explained in the following paragraphs.

Calculating the bounding box and the first voxel index of every slice is performed in several steps. The first step is the computation of the bounding boxes in which memory will be allocated. The grid coordinate calculated from a particle's position is inserted into the bounding box of the slice on which the particle is located. Although the flexibility of current GPUs makes a serial version of this computation possible on the GPU, it cannot exploit the power of the GPU. For efficiency reasons, the computation is divided into two stages. The particles are divided into several sets, and bounding boxes for these sets are computed in parallel. If there are m slices and the particles are divided into n sets, n sets of m bounding boxes are calculated in parallel. (Of course, this is also effective on other multiple processors.) Then the results are merged into a single set of bounding boxes. This reduction step can also be parallelized on the GPU.

Here we assume that the x-axis is taken as the slicing axis. Then, what we have to compute are B^y_{i,max}, B^y_{i,min}, B^z_{i,max}, and B^z_{i,min}, which are the maximum and minimum values in the y- and z-directions on the ith slice along the x-direction. Let n and m be the total number of particles and the number of small computations (we will call them jobs from now on). The ith job is responsible for the particles whose indices satisfy in/m ≤ a < (i + 1)n/m. Then the bounding box of the jth slice in the ith job is

B^y_{ij,max} = max_{a∈P_ij} { b^y_a },   B^y_{ij,min} = min_{a∈P_ij} { b^y_a },
B^z_{ij,max} = max_{a∈P_ij} { b^z_a },   B^z_{ij,min} = min_{a∈P_ij} { b^z_a },

where P_ij = { a | b^x_a = j, in/m ≤ a < (i + 1)n/m }.

One job is processed by a block of threads on CUDA. Since the bounding values are frequently read and updated, they can be stored in fast on-chip memory if it is available; on CUDA, shared memory is used for their storage. However, the updating of the bounding values has to be serialized to avoid write conflicts among threads. Therefore, whenever a thread updates a bounding volume, it has to be locked. This kills the performance when a large number of threads are running at the same time. To increase the efficiency of the computation, one job is split into smaller jobs, and the threads in a block are likewise divided into smaller thread groups. The computation can be much more efficient because these smaller thread groups calculate their own bounding-volume data, synchronizing only a small number of threads. Figure 7.7 illustrates the three-step computation of the bounding volumes.

Figure 7.7. Computation of bounding volumes.

Let us look more closely at the implementation of this computation on a current GPU. Reducing the size of the data is a good idea in most cases, because the latency of memory access is much higher than that of arithmetic instructions. Also, the chip resources that can be used in the computation are limited. To maximize the efficiency of the GPU, the register and shared-memory usage should be kept to a minimum. The size of the shared memory on an NVIDIA G80 is 16 KB per

multiprocessor. If eight bits are used for the bounding values, and assuming there are at most 256 cells in each direction, a set of bounding boxes requires 1 KB of storage. (Of course, we could use 32 bits per bounding value, but that strains the local resources and results in lower usage of the hardware threads.) Therefore, we can calculate at most 16 sets of bounding boxes, by the same number of small thread groups, in a block at the same time. The computation of the bounding volumes is done by reading particle values from main memory with aligned memory accesses and updating the values using synchronization within the thread group. This corresponds to the first step in Figure 7.7. The next step is the reduction of these sets of values. These outputs are still placed in shared memory, and one set of bounding boxes is calculated from them in the same kernel by assigning one thread per bounding box; that thread reads all the bounding values from the smaller groups. In short, if we have 256 slices, 256 threads run at the merge step, and thread i, assigned to the ith slice, compares all the values for that slice from all the small groups. Then the threads write the values to global memory for the last merge step.

The last merge step runs in another kernel. This step is almost the same as the previous merge, except that the values are read from off-chip memory this time. Instead of using one thread to reduce a slice, a tree-shaped reduction is used, in which n/2 threads are assigned to a slice (n is the number of bounding boxes) and reduce two values to one in each step; this has a performance advantage. This is the third step in Figure 7.7. In this way, a set of bounding boxes is calculated from all the particles.

When using CUDA for this computation, the number of threads physically running is not equal to the width of a kernel; in most cases, it is smaller than the kernel width. Although we could make the kernel block size the same as the number of physical threads, increasing the size of the blocks makes the computation much more efficient, because it lets the hardware switch between work groups (like multithreading on the CPU) when a work group is stalled.

Now that we have the bounding boxes for all the slices, the number of voxels in each slice is calculated. This computation is tricky when using shaders, but with the ability to synchronize among the threads in a block, it has become easier. The prefix sum of the array is calculated in parallel to get the indices of the first voxels. For this parallel reduction, the method presented in [Harris et al. 07] is used.

Figure 7.8 shows how much the sliced grid improves the memory efficiency in a test simulation. It compares the memory consumption of the uniform grid, octree, and sliced grid in a DEM simulation. We can see that the sliced grid reduces the memory consumption greatly compared to the uniform grid, and the efficiency is close to that of the octree. Moreover, the cost of accessing the voxel data is at least as cheap as for the uniform grid, and can be much better, as will be shown later.

Figure 7.8. Memory usage in bytes per time step when using the uniform grid, sliced grid, and octree (see Color Plate V).

7.4.3 Introducing Sort
memory efficiency but also improves the performance thanks
to the dense voxel data,which will be shown later. This
section discusses how to improve the performance from the
perspective of cache efficiency. A simple implementation of
a particle-based simulation is accompanied by random access
of the data. The random-access pattern of the memory
reduces the performance. If the particle data are also
arranged in the order of the spatial distribution of
particles, the cache hit-rate of accessing the particle
data increases as well. However, because particles not
having any fixed connectivity move freely, the memory
location of spatially close particles becomes random as the
simulation proceeds. This reduces the memory locality and
results in the slowdown of a simulation. An idea to improve
the simulation performance is to sort the particle data by
the spatial order of particles. We have to be careful in
the selection of the sort algorithm used, especially for
real-time applications, because the speedup from the
ordering of the particle data has to be greater than the
cost of the sorting. Otherwise, it just slows the
simulation down. Researchers have been studying sorting on
the GPU. However, the best algorithm for sequential
processors is not always the best for parallel processors.
For example, quick sort, which is one of the most efficient
sorts on the CPU, does

not perform well on the GPU. Instead, sorting networks,


such as bitonic merge

sort, are preferred because of their parallel nature


[Kipfer and Westermann 05].

However, the drawback of sorting networks is that they


require lots of passes to

complete the sorting. Recently, the functionality of the


GPU has made it possible

to implement radix sort, which requires fewer passes [Grand


07].

Although radix sort runs quickly on the GPU, its cost is prohibitively expensive for the sole purpose of improving cache efficiency. In fact, in our experiment it took more time than the computation of one step of the DEM simulation, so it does not meet our goal. There are several sorting algorithms suited to nearly sorted lists, such as insertion sort. They are good for situations with temporal coherency between frames, like our simulation. But the problem here is that insertion sort is a completely sequential algorithm, which is not good for multiple processors such as GPUs. However, what we want is not the completion of a sort within a frame, because the sort is used only to increase the spatial coherency of the data. Even if a sort in a time step merely improves the order of the list somewhat, it will improve the cache efficiency.

7.4.4 Block Transition Sort

Among the sorting networks, we have chosen an odd-even transition sort, a sorting network that completes a sort by repeating two simple operations: comparing adjacent odd-even index pairs and flipping them if they are in the wrong order, then comparing adjacent even-odd index pairs. If the blocks with an arrow in Figure 7.9 are thought of as two adjacent elements, the figure shows how the sorting works. Odd-even transition sort is good for a nearly sorted list but is pretty poor when applied to a random list. If only two adjacent elements are flipped, it can complete the sort in one or two steps, but if the elements are arranged in the reverse order, it takes n steps to move them to the correct positions.

Figure 7.9. Block transition sort. An array is divided into blocks, and two adjacent blocks are sorted in a pass.
Figure 7.10. Bitonic merge sort. Thread A compares A0 and A1, and so on.

We generalize the idea of the odd-even transition sort to develop our block transition sort, which is suited for architectures like the GPU that have a fast local memory shared by a set of threads. Instead of comparing two adjacent index pairs, it compares two adjacent blocks consisting of several elements; precisely, it sorts two adjacent blocks in a step. Figure 7.9 illustrates how the block transition sort works. Block transition sort is good for a GPU, which has fast local memory for each processor, because partitioning the computation into small problems lets threads sort two adjacent blocks entirely in the fast local memory without writing back to the slower global memory. The memory-access pattern is also preferable, because all the random access can be done in the on-chip memory, so all reads and writes to global memory can be aligned memory accesses. In our implementation, we used bitonic merge sort for sorting two adjacent blocks. As shown in Figure 7.10, bitonic merge sort always compares n/2 sets of entries in a pass, where n is the total number of elements. So n/2 threads are executed, and each of them reads two elements into shared memory; it then repeats comparison and synchronization until the sort is done. It is important to set the size of a block such that two blocks fit in shared memory. If we have more budget for the sorting, the two adjacent sorted chunks of data could be merged by using merge sort to make it much more efficient.
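A minimal serial sketch of the block transition sort is given below; it only illustrates the pass structure described above. On the GPU each pair of adjacent blocks would be sorted in shared memory with a bitonic merge sort, whereas here std::sort stands in for that per-pair sort, and the function and parameter names are illustrative rather than taken from the engine.

// One pass of a block transition sort over an array of keys.
// Even passes sort block pairs (0,1),(2,3),...; odd passes sort (1,2),(3,4),...
#include <algorithm>
#include <vector>

void BlockTransitionSortPass(std::vector<float>& keys, int blockSize, bool oddPass)
{
    const int n = static_cast<int>(keys.size());
    for (int first = (oddPass ? blockSize : 0); first + blockSize < n; first += 2 * blockSize)
    {
        int last = std::min(first + 2 * blockSize, n);
        // Sort the two adjacent blocks [first, last) locally.
        std::sort(keys.begin() + first, keys.begin() + last);
    }
}

Repeating even and odd passes a few times per frame gradually restores spatial ordering; as discussed above, a complete sort per frame is not required.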
7.4.5 Performance

Figure 7.11 shows a simulation that sorts particle values. A box half-filled with particles is rotated. To make the effect of sorting illustrative, particles are colored by their indices. We can see that these colors do not mix up, although the particles themselves are mixed up; this is because of the renumbering of the particles.

Figure 7.11. Simulation result with sorting (see Color Plate VI).

Figure 7.12. Comparison of computation times of simulations using the uniform grid (UG), sliced grid (S), and sliced grid with sorting (SS) (see Color Plate VII).

The simulation times on a GPU are shown in Figure 7.12. The figure shows the total computation time of a time step and the update of the particle values. When sorting is used for a sliced grid, the particles are sorted at the update. Concretely, the particle indices are sorted using the grid coordinates as keys, and the updated velocities and positions are then written to the new memory locations. This timing also includes the time for sorting. We can see that the total time does not spike when sorting is introduced, although the update with sorting does take some time.
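A serial sketch of this reordering step might look as follows. The key array and the idea of writing the updated state to new, contiguous locations follow the description above, while the concrete names and the use of std::sort are illustrative only.

// Sketch: renumber particles by their grid (cell) key so that spatially
// close particles become close in memory.
#include <algorithm>
#include <numeric>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

void ReorderParticlesByCell(std::vector<Particle>& particles,
                            const std::vector<int>& cellKey) // one key per particle
{
    const int n = static_cast<int>(particles.size());
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    // Sort particle indices by their cell key (the grid coordinate).
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return cellKey[a] < cellKey[b]; });
    // Write each particle's state to its new, contiguous location.
    std::vector<Particle> sorted(n);
    for (int i = 0; i < n; ++i)
        sorted[i] = particles[order[i]];
    particles.swap(sorted);
}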
7.5 Data Transfer Using Grids

The sliced grid in which the computation domain
is sliced by the x-axis is used by the acceleration
structure to search for neighboring particles. The data
that have to be sent to an adjacent processor are two
contiguous slices when the side length of voxels equals the
particle diameter. Generally, the data to be sent is
smaller than using a uniform grid although efficiency
depends on the distribution of particles. Sending the data
between GPUs cannot be done directly at the moment.
Therefore, the data have to be sent via main memory in two
steps: first, send the data from a GPU to main memory, then
the second GPU reads the data from main memory. Because the
neighbors of GPUs do not change, the destination of the
memory to which a GPU writes the data and the memory a GPU
reads from is defined at spatial decomposition. Figure 7.13
illustrates how this works when using four GPUs for a
simulation. Each GPU computes a subdomain, and they each
have one or two ghost regions. After the computation of a
time step, all the GPUs send the data to the predefined
location in main memory. GPUs at both ends write particle
data to one buffer and other GPUs write to two buffers. To
make sure that all the data are ready, all of the threads
are synchronized after the send. Then the reading from the
defined memory location finishes the transfer. As you can
see, these threads run completely in parallel except for
one synchronization per time step.

Figure 7.13. Overview of a simulation using four GPUs.
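A rough sketch of the per-GPU exchange described above is shown below (one CPU thread drives one GPU). The buffer layout, sizes, and barrier object are only assumptions for the sketch; in the real engine they are fixed at spatial decomposition. Only the basic CUDA runtime copies are used here.

// Sketch: exchange ghost slices via main memory, with one synchronization
// per time step. All names and the barrier callback are illustrative.
#include <cuda_runtime.h>

struct GhostBuffers
{
    float* stagingToLeft;    // host buffer this GPU writes for its left neighbor (may be null)
    float* stagingToRight;   // host buffer this GPU writes for its right neighbor (may be null)
    float* stagingFromLeft;  // host buffer written by the left neighbor (may be null)
    float* stagingFromRight; // host buffer written by the right neighbor (may be null)
    size_t bytesPerSlice;
};

void ExchangeGhostSlices(float* d_leftGhost, float* d_rightGhost,
                         float* d_leftBoundary, float* d_rightBoundary,
                         GhostBuffers& buf, void (*barrier)())
{
    // 1. Send: copy this GPU's boundary slices to the predefined host buffers.
    if (buf.stagingToLeft)
        cudaMemcpy(buf.stagingToLeft, d_leftBoundary, buf.bytesPerSlice, cudaMemcpyDeviceToHost);
    if (buf.stagingToRight)
        cudaMemcpy(buf.stagingToRight, d_rightBoundary, buf.bytesPerSlice, cudaMemcpyDeviceToHost);

    // 2. The one synchronization per time step: wait until every GPU has written.
    barrier();

    // 3. Receive: read the neighbors' slices into this GPU's ghost regions.
    if (buf.stagingFromLeft)
        cudaMemcpy(d_leftGhost, buf.stagingFromLeft, buf.bytesPerSlice, cudaMemcpyHostToDevice);
    if (buf.stagingFromRight)
        cudaMemcpy(d_rightGhost, buf.stagingFromRight, buf.bytesPerSlice, cudaMemcpyHostToDevice);
}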

7.6 Results

Our method was implemented using C++ and NVIDIA CUDA 1.1 on a PC equipped with an Intel Core2 Q6600 CPU, a GeForce 8800 GT GPU, and Tesla S870 GPUs. The program executes five CPU threads: it uses one GPU for rendering and the other four GPUs for the computation of the particle-based simulation. A CPU thread managing a GPU executes the kernels for that GPU.

Figure 7.14 compares the computation times of the simulation shown in Figure 7.15 as the number of GPUs is changed. A simulation using one million particles takes about 95 ms per simulation step on one GPU, while the same simulation takes about 40 ms and 25 ms on two GPUs and four GPUs, respectively. Although these timings include the management of particles and the data transfer time between GPUs, they scale nearly linearly with the number of processors. The efficiency of the parallelization decreases for the simulation on four GPUs compared to the simulation on two GPUs. This is because only one other GPU has to be communicated with when using two GPUs, whereas communication with two adjacent GPUs is necessary when using four GPUs. The timings excluding the time for data transfer are also shown in Figure 7.14. These times exclude only the actual data transfer between processors but include the time to manage the data. From this figure, we can see that the overhead of data management is small enough that the performance scales well with the number of GPUs.

Figure 7.14. Computation times (ms) versus number of particles on one, two, and four GPUs, with and without the transfer time.

[Kipfer and Westermann 05] Peter Kipfer and Rüdiger Westermann. “Improved GPU Sorting.” In GPU Gems 2, edited by Matt Pharr, pp. 733–746. Reading, MA: Addison-Wesley, 2005.

[Koshizuka and Oka 96] S. Koshizuka and Y. Oka. “Moving-Particle Semi-Implicit Method for Fragmentation of Incompressible Flow.” Nucl. Sci. Eng. 123 (1996), 421–434.

[Mishra 03] B. K. Mishra. “A Review of Computer Simulation of Tumbling Mills by the Discrete Element Method: Part I—Contact Mechanics.” International Journal of Mineral Processing 71:1 (2003), 73–93.
Part IV: Constraint Solving

transposed. The matrix vector multiplication is Ax.


Quaternions q ∈ H behave

as vectors with respect to addition and scalar


multiplication. What makes them a

useful algebra is the product operation, which we write as


qp. We will use the

right-hand convention for this, as is defined in Chapter 1.


A three-dimensional

vector x can be promoted to a quaternion p = q(x) by


writing p s = 0 and

p v = x. That is a purely imaginary quaternion since p † =


−p. If I forget to

tell you in a particular set of equations, I always use u,


v, and n to denote right

the exact correspondence.

Credit where credit is due. My initial inspiration came


after reading [Ser

ban and Haug 98] and [Haug 89]. I then found results
similar to what is below

in [Tasora and Righettini 99]. The matrix formulation of


quaternion algebra is

already in the graphics literature [Shoemake 91, Shoemake


10] but is not widely

used. There is a whole chapter about details of this matrix


representation in my

PhD thesis [Lacoursie`re 07a] for those who may be


interested.

And now, let’s begin.

9.3 The Problem

Rotational constraints between rigid bodies are problematic when they are defined using dot product indicators. This makes them bistable since obviously x · y = 0 implies that −x · y = 0 as well. Take, for instance, the rotational part of a hinge joint between bodies 1 and 2 that have right-handed orthonormal frames defined with u(1), v(1), n(1) and u(2), v(2), n(2), respectively. Taking n(1) as the normal axis of the hinge attached on body 1, the hinge indicator is defined as the set of the two conditions

\[ n^{(1)} \cdot u^{(2)} = 0 \quad\text{and}\quad n^{(1)} \cdot v^{(2)} = 0. \tag{9.1} \]

Figure 9.1. The hinge definition.

When these are satisfied, the vector n(1) has no projection in the u(2)–v(2) plane, as shown in Figure 9.1. The content of the constraint is that n(1) and n(2) are both normal to the u(2)–v(2) plane, which means they are parallel, and thus, by transitivity, n(2) is a normal to the u(1)–v(1) plane as well. But the indicator function in Equation (9.1) is satisfied simultaneously for both n(1) = n(2) and n(1) = −n(2), i.e., the antiparallel case. But we usually want the first of the two options. This is shown below in Figure 9.2. It is possible to flip between one and the other by wrenching the two bodies hard enough, irrespective of our numerical method of choice. In addition, the constraint weakens as it gets further and further away from the desired configuration. It is, in fact, metastable when vector n(1) lies in the u(2)–v(2) plane since the Jacobian vanishes there, and so it might stabilize either the right way or the wrong way. That makes these constraints easy to flip since the constraint force starts to weaken at π/4, and it starts to point the wrong way after π/2. We could avoid such headaches using reduced coordinate formulations, as is common in robotics, but that will cause other types of pain.

As an aside, we might think that the single-equation indicator n(1) · n(2) − 1 = cos θ − 1 = 0 is equally good as the two equations in Equation (9.1). The problem is that this single equation is, in fact, quadratic, i.e., it behaves as θ² near θ = 0, which means that the Jacobian vanishes. The remedy to that is to construct indicators with a unique zero, and this can be done using quaternions. These indicators have extreme values ±1 precisely when one of the normal vectors used in the dot product definition is flipped by 180°. One problem remains though: the Jacobians still vanish at the maximum constraint violation, and that means they weaken on the way there. It is possible to add nonlinear terms to the indicator functions to fix this problem, but that's beyond our scope here, and I think we can manage better with good logic code to catch the problem cases.

Figure 9.2. Axis flip.

The theory below is an overkill, but the results are easy


to implement and not

much more expensive computationally than the standard dot


product versions.

Three constraints are analyzed in detail, namely, the lock


joint, the hinge joint,

and the homokinetic joint. This last one is also known as


the constant velocity

joint, CV for short. It is much like the Hooke or universal


joint but without the

problems. The Hooke joint is easy to define as a bistable


constraint in dot prod

uct form. It seems that it is not possible to define a


monostable version without

introducing a third body that is hinged to the other two.


If we look at a good

diagram and animations of the Hooke joint [Wikipedia 10b],


we will see clearly

why a third body is needed. But more to the point, the CV


joint is the one we see

in our front traction cars, since otherwise, the wheels


would not move at constant

rotational velocity. Curiously, though it is an engineering


puzzle to construct a

CV joint [Wikipedia 10a] that is not fragile, it is dead


easy to define the geome

try using quaternions. A homokinetic joint can be


constructed using two hinges,

and this makes the analysis much more complicated [Masarati


and Morandini 08]

than the quaternion definition given below.

These three rotational joints are used in combination with


positional con

straints to produce all other joints, namely, the “real”


hinge, the prismatic or

sliding joint that requires the full lock constraint, the


cylindrical joint that requires

the hinge constraint, etc. A robust Hooke joint can also be


built out of three bodies

using two hinges.

In what follows, I will first explain the indicators


themselves by looking at

special quaternions and the geometry of the resulting


kinematics. Then, I will

explain how to construct the Jacobians for these.

9.4 Constraint Definitions

It is enough to consider just one quaternion q describing the orientation of one rigid body with respect to the inertial frame to start with. This is because, in the end, the quaternion used in the constraint will be the relative rotation going from body 1 to body 2. That will simplify things and save our time. Also note that in this first stage, I assume that both our hinge and CV axes are aligned along z in each body. Generalizations are provided below.

The quaternion that corresponds to no rotation at all is just the unit quaternion, i.e., q_s = 1, q_v = [0, 0, 0]^T. The indicator is easy to define here:

\[ c_{\mathrm{lock}} = q_v = Pq = P_{\mathrm{lock}}\,q = 0, \tag{9.2} \]

where P = P_lock is the projection operator

\[ P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \]

so that Pq = q_v. There is still an ambiguity since the constraint is satisfied by both ±q. But that is of no consequence since both cases correspond to a unit rotation. Remember that quaternions cover the rotation group twice. The lock constraint is thus a simple linear projection of the relative quaternion. That will hold for all the other constraints.

The hinge constraint requires that the original and transformed frame share a common axis. This is set to the axis z arbitrarily, and thus the allowed rotations have the form

\[ q_s = \cos(\phi/2) \quad\text{and}\quad q_v = [0, 0, \sin(\phi/2)]^T, \tag{9.3} \]

which gives the two equations we want:

\[ c_{\mathrm{hinge}} = \begin{bmatrix} x \cdot q_v \\ y \cdot q_v \end{bmatrix} = \begin{bmatrix} x^T \\ y^T \end{bmatrix} Pq = P_{\mathrm{hinge}}\,q = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \tag{9.4} \]

where

\[ P_{\mathrm{hinge}} = \begin{bmatrix} x^T \\ y^T \end{bmatrix} P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \]

is the hinge projection operator. We'll see in Section 9.8 how to define this for axes other than z.

And now comes the CV joint. The kinematic constraint we want to create here is such that the rotational motion along the axis n(1) of an object produces an identical rotation about the axis n(2) of another. That is precisely the relationship between the plate of a turntable and the disc sitting on it, although these two objects share the same longitudinal plane. But the idea is the same: we want a driver that produces a constant rotational velocity in a secondary body about some axis fixed in that body.

Figure 9.3. An illustration of the CV coupling.

Let's now visualize a perfect CV joint using two pens with longitudinal axes n(1) and n(2), respectively, each with a longitudinal reference line drawn on the circumference. Hold the pens 1 and 2 in your left and right


hands, respectively,

and align the axes and the reference lines so that they
face up. Now, rotate pen 2

by some angle θ about the vertical axis z away from you.


Choosing θ ≈ 45 ◦ will

make things obvious. The two pens lie in the horizontal


plane, with an angle θ

between n (1) and n (2) . Now, realign the two pens and
rotate them about their

common longitudinal axes by 90 ◦ . Keep the reference lines


aligned but make

them face you. Then rotate pen 2 by the same angle θ as


before about the axis

z. Clearly, the axis of rotation r is still perpendicular


to n (1) but is not the same

as before. If you had done this in small increments, you


would have seen the CV

joint at work. You would probably scratch your head


wondering how you would

actually construct something that worked like that. You can


even change the angle

θ as you move along, keeping perfect alignment between the


reference lines. One

thing is constant though: relative rotation between pens 1


and 2, as seen from

pen 1, is about an axis r that is perpendicular to n (1) .


This axis r is not fixed,

however. This is what I’ve sketched in Figure 9.3.

Let's get rid of all the indices now. The conclusion from the experiment above is that a rotation by any angle θ about any axis r such that r · z = 0 always will not rotate the transformed x′–y′ plane about the transformed axis z′. Mathematically, this implies that the relative quaternion q satisfies

\[ q_s = \cos(\theta/2) \quad\text{and}\quad q_v = \sin(\theta/2)\,r, \quad\text{where } r \cdot z = 0. \tag{9.5} \]

Therefore,

\[ c_{\mathrm{CV}} = z \cdot q_v = z^T P q = P_{\mathrm{CV}}\,q = 0, \tag{9.6} \]

where P_CV = z^T P = [0, 0, 0, 1]. Now that we have constraint definitions, we need Jacobians. But to get that right, I need to tell you a bit more about how I manipulate quaternion expressions.
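As a concrete illustration of Equations (9.2), (9.4), and (9.6), here is a small sketch in code. It assumes a quaternion stored as (s, x, y, z) and simply picks out the components selected by the projection operators; the struct and function names are mine, not part of any engine.

// Quaternion stored as q = (s, x, y, z); q_v = (x, y, z). Names illustrative.
struct Quat { double s, x, y, z; };

// Lock joint: c_lock = P_lock q = q_v, three equations (Equation (9.2)).
void LockIndicator(const Quat& q, double c[3])
{
    c[0] = q.x;  c[1] = q.y;  c[2] = q.z;
}

// Hinge joint about the z axis: c_hinge = P_hinge q, two equations (Equation (9.4)).
void HingeIndicator(const Quat& q, double c[2])
{
    c[0] = q.x;  c[1] = q.y;
}

// CV joint about the z axis: c_CV = P_CV q, one equation (Equation (9.6)).
double CVIndicator(const Quat& q)
{
    return q.z;
}

Here q is the relative quaternion from body 1 to body 2; for axes other than z, see the generalization in Section 9.8.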
9.5 Matrix-Based Quaternion Algebra

The quaternion algebra is covered in Chapter 1, so this section is just a simple translation into language I find useful. The format I use here should help you implement what is described in this chapter.

First, note that any quaternion q, p ∈ H can be represented as a simple four-dimensional vector. That works for addition and subtraction, obviously. The only thing needed to make the correspondence complete is to define the quaternion product in terms of matrix-vector operations, as I do now. Since the quaternion product of q, p ∈ H is linear in both q, p ∈ R^4, we can write it as the matrix-vector product qp = Q(q)p = P(p)q, corresponding to the right and left products, respectively, with the definitions

\[
Q(q) = \begin{bmatrix} q_s & -q_1 & -q_2 & -q_3 \\ q_1 & q_s & -q_3 & q_2 \\ q_2 & q_3 & q_s & -q_1 \\ q_3 & -q_2 & q_1 & q_s \end{bmatrix}
= \begin{bmatrix} q_s & -q_v^T \\ q_v & q_s I_3 + [q_v]_\times \end{bmatrix}
= \begin{bmatrix} q & G^T(q) \end{bmatrix},
\]
\[
P(q) = \begin{bmatrix} q_s & -q_1 & -q_2 & -q_3 \\ q_1 & q_s & q_3 & -q_2 \\ q_2 & -q_3 & q_s & q_1 \\ q_3 & q_2 & -q_1 & q_s \end{bmatrix}
= \begin{bmatrix} q_s & -q_v^T \\ q_v & q_s I_3 - [q_v]_\times \end{bmatrix}
= \begin{bmatrix} q & E^T(q) \end{bmatrix},
\]
where
\[
G(q) = \begin{bmatrix} -q_v & q_s I_3 - [q_v]_\times \end{bmatrix}, \quad
E(q) = \begin{bmatrix} -q_v & q_s I_3 + [q_v]_\times \end{bmatrix}, \quad
[x]_\times = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}.
\]
For the last definition, this means that [x]_× y = x × y. For completeness, the complex conjugation matrix is
\[
C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & -I \end{bmatrix},
\]
so q† = Cq.
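For readers who want this in code, the following sketch (my own naming, quaternion stored as (s, x, y, z)) builds the 4 × 4 matrices Q(q) and P(q) as defined above; multiplying Q(q) by p then reproduces the quaternion product qp, and P(p)q gives the same result.

// Build Q(q) and P(q) as row-major 4x4 matrices. Names illustrative.
struct Quat { double s, x, y, z; };

void BuildQ(const Quat& q, double m[4][4])
{
    double Q[4][4] = {
        { q.s, -q.x, -q.y, -q.z },
        { q.x,  q.s, -q.z,  q.y },
        { q.y,  q.z,  q.s, -q.x },
        { q.z, -q.y,  q.x,  q.s } };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            m[i][j] = Q[i][j];
}

void BuildP(const Quat& q, double m[4][4])
{
    double P[4][4] = {
        { q.s, -q.x, -q.y, -q.z },
        { q.x,  q.s,  q.z, -q.y },
        { q.y, -q.z,  q.s,  q.x },
        { q.z,  q.y, -q.x,  q.s } };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            m[i][j] = P[i][j];
}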

The correspondence to the quaternion algebra is then

\[ Q(p)Q(q) = Q(pq) \quad\text{and}\quad P(p)P(q) = P(qp), \]

as well as Q(q†) = Q(q)^T and P(q†) = P(q)^T. The two matrices Q(q) and P(q) also commute so that Q(p)P(q) = P(q)Q(p), as we can easily verify. This representation makes it easy to compute the Jacobian matrices related to quaternion constraints.

We need an expression for q̇q† for unit quaternions ⟨⟨q⟩⟩ = 1, which are the ones corresponding to orthonormal transforms, i.e., rotations. That will connect the changes in the relative quaternions to the angular velocities of the connected rigid bodies. Since q†q = 1 always, we have

\[ \frac{d}{dt}\bigl(q^\dagger q\bigr) = 0 = \dot{q}q^\dagger + q\dot{q}^\dagger = \dot{q}q^\dagger + \bigl(\dot{q}q^\dagger\bigr)^\dagger, \]

and therefore

\[ w = \tfrac{1}{2}\dot{q}q^\dagger = -w^\dagger \]

is purely imaginary, and so w = [0; ω] = P^T ω, where ω ∈ R³ is the angular velocity expressed in the inertial frame. So now, we have

\[ q^\dagger\dot{q} = Q(q)^T\dot{q} = \tfrac{1}{2}P^T\omega = \tfrac{1}{2}G^T(q)\,\omega \quad\text{and}\quad \dot{q}^\dagger q = P(q)\dot{q}^\dagger = -\tfrac{1}{2}P^T\omega = -\tfrac{1}{2}E^T(q)\,\omega. \tag{9.7} \]

These identities are usually summarized as

\[ \dot{q} = \tfrac{1}{2}qw. \tag{9.8} \]

Note that the definition in Equation (9.8) is sometimes written the other way round, as is the case when defining the angular velocity vector in the body frame or when using left-handed multiplication, which is often used in three-dimensional graphics. Beware.


9.6 A New Take on Quaternion-Based Constraints

The Jacobians of any quaternion-based constraint can be computed using just one master Jacobian matrix and various projections. This is done using the matrix representation described in Section 9.5. Consider two rigid bodies with quaternions r, s ∈ H and angular velocities ω(1) and ω(2), respectively. The relative quaternion is then q = r†s ∈ H. The first task is to relate the rate of change of q to the angular velocities ω(1) and ω(2). The time derivative of q is

\[ \dot{q} = \dot{r}^\dagger s + r^\dagger\dot{s} = (\dot{r}^\dagger r)\,r^\dagger s + r^\dagger s\,(s^\dagger\dot{s}) = (\dot{r}^\dagger r)q + q(s^\dagger\dot{s}). \tag{9.9} \]

So now, using the matrix representation of the quaternion product, taking the left product using P(q) on the first term and the right product using Q(q) on the second, and substituting the identities in Equation (9.7), we have

\[ \dot{q} = -\tfrac{1}{2}P(q)P^T\omega^{(1)} + \tfrac{1}{2}Q(q)P^T\omega^{(2)} = -\tfrac{1}{2}E(q)^T\omega^{(1)} + \tfrac{1}{2}G(q)^T\omega^{(2)}. \tag{9.10} \]

The only Jacobians you need for all three quaternion constraints defined here are these. It might seem that we took a very long detour to arrive at Equation (9.10), which is very simple since we just need the matrices E(q) and G(q) in the end. Looking at the indicators defined above in Equations (9.2), (9.4), and (9.6), the different Jacobians are simply different projections of the same proto-Jacobian, namely,

\[
\begin{aligned}
G^{(1)}_{\mathrm{lock}} &= -\tfrac{1}{2}P E(q)^T, & G^{(2)}_{\mathrm{lock}} &= \tfrac{1}{2}P G(q)^T, \\
G^{(1)}_{\mathrm{hinge}} &= -\tfrac{1}{2}P_{\mathrm{hinge}} E(q)^T, & G^{(2)}_{\mathrm{hinge}} &= \tfrac{1}{2}P_{\mathrm{hinge}} G(q)^T, \\
G^{(1)}_{\mathrm{CV}} &= -\tfrac{1}{2}P_{\mathrm{CV}} E(q)^T, & G^{(2)}_{\mathrm{CV}} &= \tfrac{1}{2}P_{\mathrm{CV}} G(q)^T.
\end{aligned} \tag{9.11}
\]
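A short sketch of Equation (9.10) in code is given below; it expands E(q)^T ω and G(q)^T ω directly from the definitions in Section 9.5 (quaternion stored as (s, x, y, z)). The function and type names are mine, and the simple component projections at the end apply to the axis-aligned case of Section 9.4.

// qdot = -1/2 E(q)^T w1 + 1/2 G(q)^T w2  (Equation (9.10)).
struct Quat { double s, x, y, z; };
struct Vec3 { double x, y, z; };

void RelativeQuaternionRate(const Quat& q, const Vec3& w1, const Vec3& w2, double qdot[4])
{
    // From the definitions: E(q)^T w = ( -q_v.w,  q_s w - q_v x w ),
    //                        G(q)^T w = ( -q_v.w,  q_s w + q_v x w ).
    double dot1 = q.x*w1.x + q.y*w1.y + q.z*w1.z;
    double dot2 = q.x*w2.x + q.y*w2.y + q.z*w2.z;
    Vec3 c1 = { q.y*w1.z - q.z*w1.y, q.z*w1.x - q.x*w1.z, q.x*w1.y - q.y*w1.x }; // q_v x w1
    Vec3 c2 = { q.y*w2.z - q.z*w2.y, q.z*w2.x - q.x*w2.z, q.x*w2.y - q.y*w2.x }; // q_v x w2

    qdot[0] = -0.5 * (-dot1)           + 0.5 * (-dot2);
    qdot[1] = -0.5 * (q.s*w1.x - c1.x) + 0.5 * (q.s*w2.x + c2.x);
    qdot[2] = -0.5 * (q.s*w1.y - c1.y) + 0.5 * (q.s*w2.y + c2.y);
    qdot[3] = -0.5 * (q.s*w1.z - c1.z) + 0.5 * (q.s*w2.z + c2.z);
}

// Constraint velocities are then projections of qdot (Equation (9.11)):
// lock: (qdot[1], qdot[2], qdot[3]); hinge: (qdot[1], qdot[2]); CV: qdot[3].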
9.7 Why It Works

The dot product representation of the indicators for rotational constraints is as follows:

\[
c_{\mathrm{dlock}} = \begin{bmatrix} n^{(1)} \cdot u^{(2)} \\ n^{(1)} \cdot v^{(2)} \\ u^{(1)} \cdot n^{(2)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad
c_{\mathrm{dhinge}} = \begin{bmatrix} u^{(1)} \cdot n^{(2)} \\ v^{(1)} \cdot n^{(2)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
c_{\mathrm{dhooke}} = u^{(1)} \cdot v^{(2)} = 0.
\]

We use the Hooke joint here for rough comparison since it is not practical to define the CV joint with dot products. Now, choose body 2 to be the universe and rotate body 1 about u(2) by π so both the new v(2) and n(2) axes have reversed signs. Clearly, all three constraints are now violated geometrically, despite the fact that the indicator functions are still 0.

This is not the case with the quaternion-based constraints defined in Equations (9.2) and (9.4) since, for a rotation that flips the axis z by 180° (q = [0, 1, 0, 0]^T, say), the indicators are then c_lock = [1, 0, 0]^T and c_hinge = [1, 0]^T, respectively. For the CV joint, the rotation that flips the axis x corresponds to q = [cos(π/2), 0, 0, sin(π/2)]^T = [0, 0, 0, 1]^T, giving c_CV = 1. These are all maximum violations, given that all constraints correspond to components of unit quaternions. Thus, the Jacobians at these points are

\[
G^{(2)}_{\mathrm{dlock}} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
G^{(2)}_{\mathrm{dhinge}} = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix}, \quad
G^{(2)}_{\mathrm{dhk}} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix},
\]

respectively, and so the restoration force vanishes at maximum violation. Since the Jacobians have full row rank when the constraints are satisfied, some of the rows must decrease gradually on the path to maximal constraint violation, and so the constraint weakens. This problem can be addressed by adding nonlinear terms in the constraint definitions. That's beyond the present scope, however.

9.8 More General Frames

Of course, we may not always have hinge joints that align


the axis z of body 1

with the axis z of body 2. Changing that is quite easy to


do in the dot product

version, but there are a few additional tricks for the


quaternion counterpart, as I

now show.

Assume now that the body-fixed reference frames in which the joints are defined have quaternions e, f ∈ H, respectively. Figure 9.4 demonstrates the situation for body 1 and transform e.

Figure 9.4. Attachment frames.

The quaternions that map vectors defined in these frames to the global frame are then re and sf, respectively. This changes the definition of the relative quaternion in Equation (9.9) to p = e†r†sf. Following the steps in Equations (9.9) and (9.10), we get

\[ \dot{p} = e^\dagger\dot{q}f = P(f)Q(e)^T\dot{q}. \]

Everything else follows.

To define a hinge joint, for instance, we can either specify the quaternion transforms e and f directly or provide a hinge frame containing at least the axis of rotation in world coordinates. If we have a full frame of reference for the hinge definition, it is possible to define the reference joint angle also. Otherwise, the orthogonal complement of the axis must be computed and the quaternions e, f extracted from the frame. Once we have a full frame defining the hinge geometry in world coordinates with three orthogonal axes, u, v, n, forming an orthonormal basis in which n is the axis of rotation, we build the matrix R = [u v n] and extract the quaternion t from it using well-known techniques [Shoemake 10]. Once you have that, you compute

\[ e = r^\dagger t \quad\text{and}\quad f = s^\dagger t, \tag{9.12} \]

where r and s are the orientation quaternions of bodies 1 and 2, respectively.

For the CV joint, the axis of rotation may be different in each body. For that case, we need two axes or two frames, as before. A full frame helps to define the zero reference, as for the hinge case. The computations are the same as in Equation (9.12).

Putting everything together, we can now define the general constraints and constraint Jacobians in a unified way using three different projection operators P acting on the relative quaternion q. The meta definition is this:

\[ c(x) = Pq, \quad G^{(1)} = -\tfrac{1}{2}P E^T(q), \quad G^{(2)} = \tfrac{1}{2}P G^T(q). \]

In turn, the different constraints have the following projection operators:

\[ P_{\mathrm{lock}} = P, \quad P_{\mathrm{hinge}} = \begin{bmatrix} x^T \\ y^T \end{bmatrix} P\,P(f)Q(e), \quad P_{\mathrm{CV}} = z^T P\,P(f)Q(e). \tag{9.13} \]

These projection matrices need to be computed only once, unless we have limits and drivers, as I explain in the next section.

9.9 Limits and Drivers

The hinge joint leaves one degree of freedom. Good or bad, even this freedom is sometimes taken away with joint limits, locks, or drivers. Going back to the definitions in Equations (9.4) and (9.3), we can compute the angle from

\[ \theta = 2\,\mathrm{atan}(q_3/q_s). \]

This is now a scalar function of the vector argument q, θ = 2f(g(q)), and we can follow the chain rule to get θ̇ = f′∇g q̇ and then expand q̇. First, observe that f′ = q_s²/(q_s² + q_3²) ≈ q_s² near constraint satisfaction, so that one is easy. For the rest, we have

\[ \nabla(q_3/q_s) = \frac{1}{q_s^2}\begin{bmatrix} -q_3 & 0 & 0 & q_s \end{bmatrix}. \]

When all is said and done, we have to add an additional row to the projection operator in Equation (9.13):

\[ P_{\mathrm{hingec}} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -q_3 & 0 & 0 & q_s \end{bmatrix} P(f)Q(e). \]

The subscript "hingec" now stands for controlled hinge.

The case for the CV joint is similar. Start from the definition of the polar angle θ = 2 atan(⟨⟨q_v⟩⟩/q_s) using Equation (9.5). The chain rule essentially provides the same results as before, namely,

\[ p^T = \nabla\bigl(\langle\langle q_v\rangle\rangle/q_s\bigr) = \begin{bmatrix} -\langle\langle q_v\rangle\rangle & \dfrac{q_s}{\langle\langle q_v\rangle\rangle}\,q_v^T \end{bmatrix}, \]

and so, as in the case of the hinge constraint, the control part augments the projection defined in Equation (9.13) to

\[ P_{\mathrm{CVc}} = \begin{bmatrix} P_{\mathrm{CV}} \\ p^T \end{bmatrix}, \quad\text{where } P_{\mathrm{CV}} = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}, \tag{9.14} \]

as before in Equation (9.11). And now we are all set to control anything we like, or almost anything.

9.10 Examples

What follows are
simple illustrations of the constraints in action. One single rigid body is attached to the inertial frame following the logic explained in the main text, i.e., only the relative quaternion is of relevance.

Figure 9.5. The hinge joint defined using either quaternions (top) or dot constraints (bottom).

Figure 9.6. The CV joint is used here for the quaternion formulation (top), and the Hooke joint is used for the dot product one (bottom).

Figure 9.7. A lock joint simulated using either the quaternion (top) or the dot product (bottom) formulation. Starting at nearly 90° from the vertical, both constraints relax to the correct position, in which q_s = 1. When the initial angle is slightly over 90°, the quaternion formulation finds its way back to the correct configuration, but the dot product version goes the wrong way, stabilizing at the wrong zero of the indicator.

Figure 9.8. Constraint violation and phase difference between input driver and driven body. This is done for a moderate joint angle of 5°. Both constraint definitions introduce only a small phase difference.

Figure 9.9. Constraint violation and phase difference between input driver and driven body. Here, the angle is more pronounced at 20°. The result is that the CV joint does still follow the driver with a small phase difference. The Hooke joint deviates significantly from the input driver.

Figure 9.10. These two graphs illustrate more precisely the ratio of the output angular velocity to the driver for the quaternion (top) and the dot product (bottom) formulations.

[Lacoursière 07a] Claude Lacoursière. “Ghosts and Machines: Regularized Variational Methods for Interactive Simulations of Multibodies with Dry Frictional Contacts.” PhD thesis, Department of Computing Science, Umeå University, 2007.

[Lacoursière 07b] Claude Lacoursière. “Regularized, Stabilized, Variational Methods for Multibodies.” In The 48th Scandinavian Conference on Simulation and Modeling (SIMS 2007), 30–31 October, 2007, Göteborg (Särö), Sweden, edited by Peter Bunus, Dag Fritzson, and Claus Führer, pp. 40–48. Linköping: Linköping University Electronic Press, 2007.

[Masarati and Morandini 08] Pierangelo Masarati and Marco Morandini. “An Ideal Homokinetic Joint Formulation for General-Purpose Multibody Real-Time Simulation.” Multibody System Dynamics 20 (2008), 251–270.

[Serban and Haug 98] R. Serban and E. J. Haug. “Kinematic and Kinetic Derivatives in Multibody System Analysis.” Mechanics of Structures and Machines 26:2 (1998), 145–173.

[Shoemake 91] Ken Shoemake. “Quaternions and 4 × 4 Matrices.” In Graphics Gems 2, edited by Jim Arvo, pp. 352–354. San Francisco: Morgan Kaufmann, 1991.

[Shoemake 10] Ken Shoemake. “Quaternions.” Unknown. Available at ftp://ftp.cis.upenn.edu/pub/graphics/shoemake/quatut.ps.Z, accessed June 12, 2010.

[Tasora and Righettini 99] Alessandro Tasora and Paolo Righettini. “Application of the Quaternion Algebra to the Efficient Computation of Jacobians for Holonomic-Rheonomic Constraints.” In Proc. of the EUROMECH Colloquium: Advances in Computational Multibody Dynamics, edited by Jorge A. C. Ambrósio and Werner O. Schielen, IDMEC/IST Euromech Colloquium 404, pp. 75–92. Lisbon: European Mechanics Society, 1999.

[Wikipedia 10a] Wikipedia. “Constant-Velocity Joint.” 2010. Available at http://en.wikipedia.org/w/index.php?title=Constant-velocity_joint&oldid=351128343, accessed April 27, 2010.

[Wikipedia 10b] Wikipedia. “Universal Joint.” 2010. Available at http://en.wikipedia.org/w/index.php?title=Universal_joint&oldid=356595941, accessed April 27, 2010.
Part V: Soft Body

11 Particle-Based Simulation Using Verlet Integration

Thomas Jakobsen

11.1 Introduction

This pearl
explains a technique that I developed in 1999 for the
Hitman game series to simulate falling (and usually very
dead) people, a method of animation now colloquially known
as the ragdoll effect. The algorithm is also useful for
simulating cloth, hair, rigid objects, soft bodies, etc. At
the heart of the algorithm lies the so-called Verlet 1
technique for numerical integration coupled with a
particle-based body representation and the use of
relaxation to solve systems of equations. Together with a
nice square root optimization, the combined method has
several advantages, most notably speed of execution,
stability, and simplicity. While today’s much-faster
hardware platforms allow for more advanced and more
realistic approaches to physics simulation, there are still
situations where a particle-based Verlet approach, like the
one presented here, is preferable, either due to speed of
execution or because of its simplicity. Verlet-based
schemes are especially useful for real-time cloth
simulation, for use on low-spec hardware, for
two-dimensional games, and as an easy introduction to the
world of physically based animation. The mathematics behind
the technique is fairly easy to understand, and once you
reach the limits of the technique, the underlying ideas of
semi-implicit integration and relaxation carry over to more
advanced state representations, constraints, and
interactions. As such, Verlet integration is not only a
good starting point for the beginner, but it also forms the
basis for physics simulation in many existing commercial
games, and it is a good stepping-stone to more advanced
approaches.

1 French, pronounced with a silent t: [veʁˈle].

11.1.1 Background

Hitman: Codename 47 was one of the very first games to


feature articulate rag

dolls, and as such, the physics simulation ran on much


slower hardware than

what is common today. I was assigned the task of developing


the physics sys

tem for Hitman, and I threw myself at the various methods


for physically based

animation that were popular at that time. Most of these,


however, either suffered

from elastical-looking behavior originating from the use of


penalty-based

schemes or they had very bad real-time performance for


various different

reasons.
At some point I remembered the old “demo scene” effect for
simulating rip

ples in water that always had me fascinated. It had all the


nice features I was

looking for, including stability and speed of execution.


Except it simulated water,

neither cloth nor hard nor soft bodies. It relied on a


velocity-less representation

of the system state by using the previous position of the


water surface to update

the current one. What I came up with for Hitman was a


technique that also fea

tures a velocity-less representation of the system state,


yielding a high amount

of stability. As it turned out, almost the same technique


had been used for years

to simulate molecular dynamics (under names such as SHAKE


and RATTLE,

see [Forester and Smith 98]).

I will now continue with a short review of existing methods


for numerical

integration, explaining their differences and drawbacks,


with a focus on semi

implicit methods and Verlet integration. The remainder of


the chapter explains

how to apply the Verlet method to interactive physics


simulation and goes through

some of the related subtleties.

11.2 Techniques for Numerical Integration

For our purposes, the subject of numerical integration deals with how to advance a simulation from one time step to the next, updating the system state by solving an underlying ordinary differential equation (ODE). An introduction to numerical integration has already been given in Chapter 1; please refer to this for additional details.

11.2.1 Forward Euler Integration

When experimenting with cloth simulation for the first


time, many developers

choose a basic Euler integration as their initial method


for time stepping a mass

spring system. But we realized pretty quickly that the


technique is far from suf

ficient: cloth tends to vibrate and even “explode” when


moved around too much. The thing is, basic (forward) Euler
integration has a hard time dealing with stiff springs.
This is the major drawback of forward Euler integration,
and often a showstopper. The problem is that particle
positions and velocities come out of sync when time steps
are too large. This in turn leads to instabilities, which
lead to pain and suffering.

11.2.2 Backward Euler Integration

A way to make up for this is to use implicit
integration. The method of backward Euler integration
belongs to the family of implicit-integration methods. The
members of this family all provide more stability in
situations with stiff equations and generally let us use
larger time steps without the risk of the system blowing
up. With backward Euler integration, we update the current
position, not with the current velocity and acceleration
vectors (as was the case with basic Euler integration) but
with the resulting velocity vector v(t + Δt) and the
resulting acceleration vector a(t+Δt). The problem with
this approach, however, is that the acceleration and
velocity at time t + Δt are unknown, and therein lies the
problem with backward Euler integration: as we cannot
directly evaluate the update velocity and acceleration, we
need to solve for the unknowns. The resulting set of
equations can be rather large if there are many particles.
This calls for (usually slow) numerical methods for solving
equations. This means that in their basic forms, neither
backward nor forward Euler integration is immediately
useful for our purpose.

11.2.3 Other Approaches
Experimenting with other approaches, such as adaptive
integration or higher-order integration methods such as
Runge-Kutta, may bring you closer to the desired result,
but these methods, too, are not ideal choices for
real-time, interactive use for the same reasons: they are
basically either slow or unstable. So what does an
intelligent game physics programmer do? It seems we cannot
escape either having to deal with instability or too much
elasticity in the case of explicit integration methods or
being forced to solve unwieldy systems of equations in the
case of implicit integration methods—or, alternatively,
waiting an eternity for adaptive methods to finish.
Luckily, as it turns out, we can have the best of two
worlds. The so-called semi-implicit methods (also known as
semi-explicit methods) are both simple and stable. And
while we may lose some accuracy in some cases, it doesn’t
really matter in the case of game simulation. Who cares if
the dead body flies ten percent too far or too short? We’re
not sending a (real) rocket to the moon. On the other

hand, we do care about visual quality and stability, and we


do care whether our

software runs fast or slow—and semi-implicit methods are


usually fast.

The semi-implicit version of Euler integration goes like this:

\[ v(t + \Delta t) = v(t) + a\,\Delta t, \]
\[ x(t + \Delta t) = x(t) + v(t + \Delta t)\,\Delta t. \]

By substitution, the second equation is equivalent to x(t) = x(t − Δt) + v(t)Δt, a fact that together with the above leads us to the Verlet formulation.

11.2.4 Verlet Integration

As mentioned, Verlet integration is an example of a semi-implicit integration method. The Verlet integration update step is just a reformulation of the expressions given in the previous subsection:

\[ x(t + \Delta t) = 2x(t) - x(t - \Delta t) + a(t)(\Delta t)^2. \]

Instead of storing the particles' positions x and velocities v as before, it suffices to store the current position x(t) and the previous position x(t − Δt). Velocities can then be calculated on the fly from x(t) and x(t − Δt) (if needed at all):

\[ v(t + \Delta t) = \bigl(x(t + \Delta t) - x(t)\bigr)/\Delta t. \]

This relation means that positions and velocities are always in sync.

Since x(t) − x(t − Δt) is just the change in position from the last time frame, the Verlet formula can be interpreted as follows:

1. Add to the current position the distance we just moved in the previous time step.

2. Adjust the position to account for gravity and other forces.

3. The new velocity is directly proportional to the total step we just moved.

Be aware that because velocity is now given only implicitly by using the previous positions of the particles, the time step needs to be kept constant between each call to the numerical integrator. While it is possible to develop formulas that take changing time steps into account, in my experience, the best way to handle larger time steps is to simply call the integrator multiple times.

Velocity Verlet integration. A variant of Verlet integration that is sometimes used is the velocity Verlet algorithm (also called Leapfrog integration):

\[ x(t + \Delta t) = x(t) + v(t)\Delta t + a(t)(\Delta t)^2/2, \]
\[ v(t + \Delta t/2) = v(t) + a(t)\Delta t/2, \]
\[ a(t + \Delta t) = f\bigl(x(t + \Delta t), v(t + \Delta t/2)\bigr), \]
\[ v(t + \Delta t) = v(t + \Delta t/2) + a(t + \Delta t)\Delta t/2, \]

where f is a function of position and velocity that yields the acceleration given by the current context. The physics engine in Hitman relies on basic Verlet integration only, but in situations that call for higher accuracy or additional robustness, velocity Verlet integration may sometimes be more suitable.

Using Verlet integration in a physics simulation. It is easy to implement the results of the above section in a function that updates a set of unconstrained particles.

// Use Verlet integration to advance an array of particles
// t:      Size of time step
// x:      Array of current positions of particles
// x_prev: Array of previous positions of particles
// a:      Current acceleration of each particle
// n:      Total number of particle coordinates
//
void VerletTimeStep(double t, double* x, double* x_prev, double* a, int n)
{
    double x_old;
    for(int i=0; i<n; i++)
    {
        x_old = *x;
        *x = 1.99 * *x - 0.99 * *x_prev + *a * t * t;
        *x_prev = x_old;
        x++; x_prev++; a++;
    }
}

The above code has been written for clarity, not speed. Note that it is possible to save memory transfers with a double-buffering approach by alternating between two arrays. Note also that the Verlet formula has been changed slightly to include the two factors 1.99 and 0.99 in order to introduce a small amount of drag in the system for further stabilization.

11.3 Using Relaxation to Solve Systems of Equations

Verlet integration in itself, as described above, provides


a good foundation for,

say, an unconstrained particle system. But how do we go


about handling more

complex restrictions or constraints on the movements of the


particles? How

should interconnected particles be handled, for example?


And how do we keep

particles from penetrating a surface? As for the latter, we


choose to simply project

offending particles out of obstacles. By projection,


loosely speaking, we mean

moving the point as little as possible until it is free of


the obstacle. Normally, this

means moving the point perpendicularly out towards the


collision surface.

11.3.1 Handling Collisions and Penetrations

Let's look at a simple example. Assume that our world is the inside of the cube (0, 0, 0)–(1000, 1000, 1000) and assume furthermore that the particles' restitution coefficient is zero (that is, particles do not bounce off surfaces when colliding). To keep all particle positions inside the valid interval, the corresponding projection code would be as follows:

// Keeps particles in a box
void SatisfyBoxConstraints(double* x, int n)
{
    for(int i=0; i<n; i++)    // For all particle coordinates
    {
        *x = min(max(*x, 0.0), 1000.0);
        x++;
    }
}

This keeps all particle positions inside the cube and handles both collisions and resting contact. The beauty of the Verlet integration scheme is that the corresponding changes in velocity are handled automatically. Thus, after calling SatisfyBoxConstraints() and VerletTimeStep() a number of times, the velocity vector will contain no components in the normal direction of the surface (corresponding to a restitution coefficient of zero). The update loop is then:

void UpdateLoop()
{
    VerletTimeStep();
    SatisfyBoxConstraints();
}

Try it out; there is no need to directly cancel the velocity in the normal direction. While the above might seem somewhat trivial when looking at particles, the strength of the Verlet integration scheme is now beginning to shine through and should really become apparent when introducing constraints and coupled rigid bodies in a moment.

11.3.2 Handling Constraints

We now describe by
example how more complex constraints can be implemented.
Assume that we have two particles that we wish to keep at a
fixed distance from each other, in effect simulating a
stick. Just as in the above case, where collisions were
handled by projecting the particles in question out of the
offending obstacles, we carry out a similar procedure here:
if a particle invalidates a constraint after the Verlet
time step routine has been called, we simply move the
particle by as little as possible in order to satisfy the
constraint once again. In the case of the stick this means
pulling or pushing the particles directly towards or away
from each other (depending on whether their distance is too
large or too small; see Figure 11.1). For each pair of
constrained particle positions x ∗ i and x ∗ j , the
following calculations must be carried out: d = x j − x i ,
(11.1) u = ( r ||d|| − 1.0 ) d, (11.2) x ∗ i = x i − 1 2 u,
(11.3) x ∗ j = x j + 1 2 u, (11.4) where r is the rest
length of the stick and u is the missing displacement
between the two particles. Assume now that we also want the
particles to satisfy the cube constraints discussed in the
previous subsection. By running the above code to fix the
stick Distance too large Correct distance Distance too
small Figure 11.1. Moving the particles to fix an invalid
distance.
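Before the square-root optimization discussed later in this section, a direct implementation of Equations (11.1)–(11.4) might look like the following sketch; the Vec3 type and the function name are mine, chosen to match the style of the later listings.

// Sketch: satisfy a single stick constraint by direct projection
// (Equations (11.1)-(11.4)).
#include <math.h>

struct Vec3 { double x, y, z; };

void SatisfyStickConstraint(Vec3& x1, Vec3& x2, double r)
{
    // d = x2 - x1
    Vec3 d = { x2.x - x1.x, x2.y - x1.y, x2.z - x1.z };
    double len = sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
    // u = (r/||d|| - 1) d
    double s = r / len - 1.0;
    Vec3 u = { s * d.x, s * d.y, s * d.z };
    // x1* = x1 - u/2,  x2* = x2 + u/2
    x1.x -= 0.5 * u.x;  x1.y -= 0.5 * u.y;  x1.z -= 0.5 * u.z;
    x2.x += 0.5 * u.x;  x2.y += 0.5 * u.y;  x2.z += 0.5 * u.z;
}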

Assume now that we also want the particles to satisfy the cube constraints discussed in the previous subsection. By running the above code to fix the stick constraint, however, we may have invalidated one or more of


the cube constraints

by pushing a particle out of the cube. This situation can


be remedied by immedi

ately projecting the offending particle’s position back


onto the cube surface once

more—but then we end up invalidating the stick constraint


once again.

Really, what we should do is solve for all constraints at


once, both the box and

the stick constraints. This would be a matter of solving a


system of equations. But

instead of explicitly forming the system and solving it


with a separate algorithm
for solving systems of equations, we choose to do it
indirectly by local iteration.

We simply repeat the two pieces of code a number of times


after each other in the

hope that the result is useful. This yields the following


code:

void TimeStep_StickInBox()
{
    VerletTimeStep();

    while(notConverged)
    {
        SatisfyBoxConstraints();
        SatisfyStickConstraints();
    }
}

While this approach of pure repetition might appear


somewhat naive, it turns out

that it actually converges to the solution that we are


looking for! The method is

called relaxation (or Jacobi or Gauss-Seidel iteration


depending on how you do it

exactly, see [Press et al. 92]). It works by consecutively


satisfying various local

constraints and then repeating; if the conditions are


right, this will converge to

a global configuration that satisfies all constraints at


the same time. It is useful

in many other situations where several interdependent


constraints must hold si

multaneously. As a general algorithm for solving equations,


the method doesn’t

converge as fast as other approaches do, but for


interactive physics simulation it

is often an excellent choice.


We get the following overall simulation algorithm (in
pseudocode):

void TimeStep()
{
    VerletTimeStep();

    // Relaxation step
    iterate until convergence
    {
        for each constraint (incl. collisions)
        {
            satisfy constraint
        }
    }
}

The number of necessary iterations varies depending on the physical system simulated and the amount of motion. The relaxation can be made adaptive by measuring the change from the last iteration. If we stop the iterations early, the result might not end up being quite valid, but because of the Verlet scheme, in the next frame it will probably be better, the next frame even more so, and so on. This means that stopping early will not ruin everything, although the resulting animation might appear somewhat sloppier.

11.3.3 Cloth Simulation

The fact that a stick constraint can be
thought of as a really hard spring should underline its
usefulness for cloth simulation. Assume, for example, that
a hexagonal mesh of triangles describing the cloth has been
constructed. For each vertex a particle is created, and for
each edge a stick constraint between the two corresponding
particles is initialized (with the constraint’s “rest
length” simply being the initial distance between the two
vertices). To solve for these constraints, we use
relaxation as described above. The relaxation loop could be
iterated several times. However, to obtain nice-looking
animations for most pieces of cloth, only one iteration is
necessary! This means that the time usage in the cloth
simulation depends mostly on the N square root operations
and the N divisions performed (where N denotes the number
of edges in the cloth mesh). As we shall see, a clever
trick makes it possible to reduce this to just N divisions
per frame update—this is really fast, and some might argue
that it probably can’t get much faster. Optimizing away the
square root. We now discuss how to get rid of the square
root operation. If the constraints are all satisfied (which
they should be, at least almost), we already know what the
result of the square root operation in a particular
constraint expression ought to be, namely, the rest length
r of the corresponding stick. We can use this fact to
approximate the square root function. Mathematically, what
we do is approximate the square root function by its
first-order Taylor expansion at a neighborhood of the
squared rest length r² (this is equivalent to one Newton-Raphson iteration with initial guess r). A real-valued function f may be approximated around a neighborhood a by using its Taylor series:

\[ f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots. \]

In the case of the square root function f(x) = √x around a = r², we get the following:

\[ \sqrt{x} \approx f(r^2) + f'(r^2)(x - r^2) = \sqrt{r^2} + \frac{1}{2\sqrt{r^2}}(x - r^2) = \frac{r^2 + x}{2r}. \]

As expected, for x = r², we get √x ≈ r. Using the above approximation to rewrite Equations (11.1)–(11.4), we end up with the following pseudocode:

// Pseudo-code for satisfying a stick constraint
// using sqrt approximation
d = x2 - x1;     // Note: vector operation; d, x1, and x2 are vectors
d *= 0.5 - r*r / (dotprod(d, d) + r*r);
x1 += d;
x2 -= d;

Notice that if the distance is already correct (that is, if


||x 2 − x 1 || = r), then we

get d = (0, 0, 0), and no change is going to happen.

Per constraint we now use zero square roots, one division


only, and the squared

value r 2 can even be precalculated! The usage of


time-consuming operations is

now down to N divisions per frame (and the corresponding


memory accesses)—it

can’t be done much faster than that, and the result even
looks quite nice. The con

straints are not guaranteed to be satisfied after one


iteration only, but because of

the Verlet integration scheme, the system will quickly


converge to the correct state

over some frames. In fact, using only one iteration and


approximating the square

root removes the stiffness that appears otherwise when the


sticks are perfectly

stiff.

By placing support sticks between strategically chosen


couples of vertices

sharing a neighbor, the cloth algorithm can be extended to


simulate bending ob

jects, such as plants. Again, in Hitman, only one pass


through the relaxation loop

was enough (in fact, the low number gave the plants exactly
the right amount of

bending behavior).

The code and the equations covered in this section assume


that all particles

have identical mass. Of course, it is possible to model


particles with different

masses; the equations only get a little more complex. To


satisfy constraints while

respecting particle masses, use the following code:

// Pseudo-code to satisfy a stick constraint with particle masses
d = x2 - x1;
dlen = sqrt(dotprod(d, d));
f = (dlen - r) / (dlen * (invmass1 + invmass2));
x1 += invmass1 * d * f;
x2 -= invmass2 * d * f;

Here, invmass1 and invmass2 are the numerical inverses of the two masses. If we want a particle to be immovable, simply set invmass = 0 for that particle (corresponding to an infinite mass). Of course, in the above case, the square root can also be approximated for a speed-up.

11.4 Rigid Bodies

The equations governing motion
speed-up. 11.4 Rigid Bodies The equations governing motion
of rigid bodies were discovered long before the invention
of modern computers. To be able to say anything useful at
that time, mathematicians needed the ability to manipulate
expressions symbolically. In the theory of rigid bodies,
this led to useful notions and tools such as inertia
tensors, angular momentum, torque, quaternions for
representing orientations, etc. However, with the current
ability to process huge amounts of data numerically, it has
become feasible and in some cases even advantageous to
break down calculations to simpler elements when running a
simulation. In the case of three-dimensional rigid bodies,
this could mean modeling a rigid body by four particles and
six constraints (giving the correct amount of degrees of
freedom, 4 × 3 − 6 = 6). This simplifies many things.
Consider a tetrahedron and place a particle at each of its
four vertices. In addition, for each of the tetrahedron’s
six edges, create a distance constraint like the stick
constraint discussed in the previous section. This
configuration suffices to simulate a rigid body. The
tetrahedron can be let loose inside the cube world from
earlier, and the Verlet integrator will then move it
correctly. The function SatisfyConstraints() should take
care of two things: (1) that particles are kept inside the
cube (like previously) and (2) that the six distance
constraints are satisfied. Again, this can be done using
the relaxation approach; three or four iterations should be
enough with optional square root approximation. Inside the
cube world, collisions are handled simply by moving
offending particles (those placed at the tetrahedron
vertices) such that they do not intersect with obstacles.
In a more complex setting than the cube world, however, the
sides of the tetrahedron may also intersect with obstacles
without the particles at the vertices themselves being in
invalid positions (see Figure 11.2). In this case, the
vertex particles of the tetrahedron, which describe the
position of the rigid body, must be moved proportionally to
how near they are to the actual point of collision. If, for
example, a collision occurs exactly halfway between
particles x 1 and x 2 , then both these particles should
both be moved by the same amount along the collision
surface normal until the collision point (which is halfway
between the two particles) has been moved out of the
obstacle (see Figures 11.3 and 11.4).

Figure 11.2. Tetrahedron (triangle) intersecting the world geometry.

Figure 11.3. Stick intersecting the world geometry in two different ways.

In an analogous way, collisions that take place on a face of the tetrahedron, or even inside the tetrahedron, will require moving three or all four particles to fix the penetration. Let p be the penetration point on the tetrahedron and q be the one on the obstacle. To handle any type of collision, follow the procedure described below.

First, express p as a linear combination of the four particles that make up the tetrahedron: p = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4, such that the weights sum to one: c_1 + c_2 + c_3 + c_4 = 1 (this calls for solving a small system of linear equations). After finding d = q - p, compute the value

λ = 1 / (c_1² + c_2² + c_3² + c_4²)

(λ is a so-called Lagrange multiplier). The new particle positions are then given by

x_1* = x_1 + c_1 λ d,  x_2* = x_2 + c_2 λ d,  x_3* = x_3 + c_3 λ d,  x_4* = x_4 + c_4 λ d.

The new position of the tetrahedron's penetration point, p* = c_1 x_1* + c_2 x_2* + c_3 x_3* + c_4 x_4*, will coincide with q. For details on the derivation of the above equations, see [Jakobsen 01].

Figure 11.4. Resolved stick collisions.

The above equations can also be used to embed the tetrahedron inside another shape, which is then used for collision purposes. In this case, p will be a point on the surface of this shape (see Figure 11.5).

Figure 11.5. Tetrahedron (triangle) embedded in arbitrary object geometry touching the world geometry.

In the above case, the rigid body collided with an immovable world, but the method generalizes to handle collisions of several (movable) rigid bodies. The collisions are processed for one pair of bodies at a time. Instead of moving only p, in this case, both p and q should be moved towards one another.
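Spelled out in the same pseudocode style used above, the correction might look like the following sketch (an illustration of the equations, not code from the chapter; c1..c4 are the precomputed barycentric weights):

// Sketch: move the tetrahedron particles so the penetration point
// p = c1*x1 + c2*x2 + c3*x3 + c4*x4 ends up at the obstacle point q.
d = q - p;
lambda = 1 / (c1*c1 + c2*c2 + c3*c3 + c4*c4);
x1 += c1 * lambda * d;
x2 += c2 * lambda * d;
x3 += c3 * lambda * d;
x4 += c4 * lambda * d;
// afterwards, c1*x1 + c2*x2 + c3*x3 + c4*x4 coincides with q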

In the relaxation loop, just like earlier, after adjusting the particle positions such that nonpenetration constraints are satisfied, the six distance constraints that make up the rigid body should be taken care of (since they may have been invalidated by the process), and the whole procedure is then iterated. Three to four relaxation iterations are usually enough. The bodies will not behave as if they were completely rigid, since the relaxation iterations are stopped prematurely, but this is mostly a nice feature, actually, as there is no such thing as perfectly rigid bodies, especially not human bodies. It also makes the system more stable.

By rearranging the positions and masses of the particles that make up the tetrahedron, the physical properties can be changed accordingly (mathematically, the inertia tensor changes as the positions and masses of the particles are altered).

11.5 Articulated Bodies

It is possible to connect multiple rigid bodies by hinges, pin joints, and so on. Simply let two rigid bodies share a particle, and they will be connected by a pin joint. Share two particles, and they are connected by a hinge (see Figure 11.6).

It is also possible to connect two rigid bodies by a stick constraint or any other kind of constraint; to do so, one simply adds the corresponding constraint-handling code to the relaxation loop.

This approach makes it possible to construct a complete model of an articulated human body. For additional realism, various angular constraints will have to be implemented as well. There are different ways to accomplish this. A simple way is to use stick constraints that are enforced only if the distance between two particles falls below some threshold (mathematically, we have a unilateral [inequality] distance constraint, ||x_2 - x_1|| > 100). As a direct result, the two particles will never come too close to each other (see Figure 11.7).
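A rough sketch of such a unilateral stick constraint (an illustration, not code from the chapter; minDist stands for the chosen threshold distance) only pushes the particles apart when they get too close and otherwise leaves them alone:

// Sketch: enforce ||x2 - x1|| >= minDist
d = x2 - x1;
dlen = sqrt(dotprod(d, d));
if (dlen < minDist) {
    f = (dlen - minDist) / dlen;   // negative, so the particles are pushed apart
    x1 += d * 0.5 * f;
    x2 -= d * 0.5 * f;
}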

Particles can also be restricted to move, for example, in certain planes only. Once again, particles with positions not satisfying the above-mentioned constraints should be moved; deciding exactly how is slightly more complicated than with the stick constraints.

Figure 11.6. Pin joint and hinge joint using particles and sticks.

Figure 11.7. Two stick constraints and an inequality constraint (dotted) modeling, e.g., an arm.

Actually, in Hitman, corpses aren't composed of rigid bodies modeled by tetrahedrons. They are simpler yet, as they consist of particles connected by stick constraints, in effect forming stick figures (see Figure 11.8). The position and orientation of each limb (a vector and a matrix) are then derived for rendering purposes from the particle positions using various cross products and vector normalizations (making certain that knees and elbows bend naturally).

Figure 11.8. Ragdoll model using particles and sticks (used in Hitman: Codename 47).

In other words, seen in isolation, each limb is not a rigid body with the usual six degrees of freedom. This means that the physics of rotation around the length axis of a limb is not simulated. Instead, the skeletal animation system used to set up the polygonal mesh of the character is forced to orient the leg, for instance, such that the knee appears to bend naturally. Since rotation of legs and arms around the length axis does not comprise the essential motion of a falling human body, this works out okay and actually optimizes speed by a great deal.

Angular constraints are implemented to enforce limitations of the human anatomy. Simple self-collision is taken care of by strategically introducing inequality distance constraints as discussed above, for example, between the two knees, making sure that the legs never cross.

For collision with the environment, which consists of triangles, each stick is modeled as a capped cylinder. Somewhere in the collision system, a subroutine handles collisions between capped cylinders and triangles. When a collision is found, the penetration depth and points are extracted, and the collision is then handled for the offending stick in question exactly as described earlier. Naturally, a lot of additional tweaking was necessary to get the result just right.

11.6 Miscellaneous

11.6.1 Motion Control

To influence the motion of a simulated object, we simply move the particles correspondingly. If a person is hit in the shoulder, move the shoulder particle backwards over a distance proportional to the strength of the blow. The Verlet integrator will then automatically set the shoulder in motion.

This also makes it easy for the simulation to "inherit" velocities from an underlying traditional animation system. Simply record the positions of the particles for two frames and then give them to the Verlet integrator, which then automatically continues the motion. Bombs can be implemented by pushing each particle in the system away from the explosion over a distance inversely proportional to the squared distance between the particle and the bomb center.
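A minimal sketch of handing such animation-driven state over to the integrator (not from the original text; animatedPosition, frameN, and the x/oldx field names are assumptions) could be:

// Sketch: seed the Verlet state from two animation frames so the
// simulation inherits the animated velocity (x - oldx acts as the velocity).
for each particle p {
    p.oldx = animatedPosition(p, frameN - 1);   // position one frame back
    p.x    = animatedPosition(p, frameN);       // current position
}
// from here on, the regular Verlet update continues the motion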

It is possible to constrain a specific limb, say the hand, to a fixed position in space. In this way, we can implement inverse kinematics (IK): inside the relaxation loop, keep setting the position of a specific particle (or several particles) to the position(s) wanted. Giving the particle infinite mass (invmass = 0) helps make it immovable to the physics system. In Hitman, this strategy is used when dragging corpses; the hand (or neck or foot) of the corpse is constrained to follow the hand of the player.

11.6.2 Friction

Friction has not been taken care of yet. This means that unless we do something more, particles will slide along the floor as if it were made of ice. According to the Coulomb friction model, friction force depends on the size of the normal force between the objects in contact. To implement this, we measure the penetration depth d_p when a penetration has occurred (before projecting the penetration point out of the obstacle). After projecting the particle onto the surface, the tangential velocity v_t is then reduced by an amount proportional to d_p (the proportionality factor being the friction constant). This is done by appropriately modifying x(t - Δt) (see Figure 11.9). Care should be taken that the tangential velocity does not reverse its direction; in this case, it should simply be set to zero, since this indicates that the penetration point has ceased to move tangentially.

Figure 11.9. Collision handling with friction.
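As a small illustration (not from the original text; depth, normal, kFriction, and the x/oldx field names are assumptions), the previous position can be dragged toward the current one along the tangential direction:

// Sketch: Coulomb-like friction for a particle that just collided.
v  = p.x - p.oldx;                          // implied velocity over the last step
vt = v - normal * dotprod(v, normal);       // tangential part of the velocity
p.oldx += vt * min(1.0, kFriction * depth); // reduce tangential velocity,
                                            // clamped so it never reverses direction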
11.6.3 Collision Response

To prevent objects that are moving really fast from passing through other obstacles (because of too-large time steps), a simple test is performed. Imagine the line (or a capped cylinder of proper radius) beginning at the position of the object's midpoint last frame and ending at the position of the object's midpoint at the current frame. If this line hits anything, then the object position is set to the point of collision. Though this can theoretically give problems, in practice it works fine.

Another collision "cheat" was used for dead bodies. If the unusual thing happens that a fast-moving limb ends up being placed with the ends of the capped cylinder on each side of a wall, the cylinder is projected to the side of the wall where the cylinder is connected to the torso.

11.6.4 Relaxation

The number of relaxation iterations used in Hitman varies between one and ten with the kind of object simulated. Although this is not enough to accurately solve
the global system of constraints, it is sufficient to make motion seem natural. The nice thing about this scheme is that inaccuracies do not accumulate or persist visually in the system, causing object drift or the like; in some sense, the combination of projection and the Verlet scheme manages to distribute complex calculations over several frames. Fortunately, the inaccuracies are smallest or even nonexistent when there is little motion and greatest when there is heavy motion. This is nice, since fast or complex motion somewhat masks small inaccuracies to the human eye.

A kind of soft body can also be implemented by using "soft" constraints, i.e., constraints that are allowed to have only a certain percentage of the deviation "repaired" each frame (i.e., if the rest length of a stick between two particles is 100 but the actual distance is 60, the relaxation code could first set the distance to 80 instead of 100, next frame to 90, then 95, 97.5, etc.). Varying this relaxation coefficient may in fact be necessary in certain situations to enable convergence. Similarly, over-relaxation (using a coefficient larger than one) may also successfully speed up convergence, but take care not to overdo this, especially if the number of iterations is low, as it may cause instabilities.
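A minimal sketch of such a soft stick constraint (an illustration, not code from the chapter; the stiffness coefficient name is an assumption) scales the correction by a per-constraint relaxation coefficient:

// Sketch: "soft" stick constraint; stiffness in (0, 1] repairs only part of
// the deviation each pass, values slightly above 1 over-relax.
d = x2 - x1;
dlen = sqrt(dotprod(d, d));
f = stiffness * (dlen - r) / dlen;
x1 += d * 0.5 * f;
x2 -= d * 0.5 * f;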

Singularities (divisions by zero, usually brought about by coinciding particles) can be handled by slightly dislocating particles at random.

11.6.5 Extending the Verlet Approach

There are several ways to extend the Verlet approach to allow for more advanced representations and features. For one thing, it is possible to represent rigid bodies by quaternions and use inertia tensors to better model the properties of objects. The main idea of Verlet integration, using the previous positions instead of velocities, carries over; only the equations get a bit more complex.

Constraints that are more general than the stick constraint may be implemented by computing appropriate constraint Jacobians, finding Lagrange multipliers, etc.

Instead of using relaxation to solve for constraints, it is possible to use more precise algorithms for solving systems of equations, such as conjugate gradient methods or Newton methods, but this is outside the scope of this chapter.

11.7 Conclusion

This pearl has described how a physics system was implemented in Hitman: Codename 47 running on a low-spec platform. The underlying philosophy of com

12.2.2 Constraint Solver

The threads of the cloth are modeled as distance constraints. An individual constraint strives to maintain a constant distance between two particles. A particle typically has more than one constraint attached to it. This network of constraints is solved with a relaxation solver.

A relaxation solver simply solves each individual constraint independently of the other constraints in the system. Solving one constraint will potentially violate the other connected constraints. However, each time we iterate over all the constraints in the system, the overall global error is reduced. Given enough time, the system converges to a solution.

To solve an individual constraint, we directly update the positions of the attached particles [Provot 95].

Vector3 pa = constraint.m_particleA.currentPosition;
Vector3 pb = constraint.m_particleB.currentPosition;
float targetDistance = constraint.m_restingDistance;
Vector3 dp = pa - pb;
float distance = dp.length();
float derr = (distance - targetDistance) / distance;
// pull the particles together when the constraint is stretched,
// push them apart when it is compressed
pa -= dp * 0.5 * derr;
pb += dp * 0.5 * derr;

Often, the rate of convergence for a relaxation solver can be improved slightly by using a technique called over-relaxation. With over-relaxation, we simply overshoot our target by a percentage of the existing error. This technique can cause unwelcome artifacts, so use with caution. In the context of character cloth, I have found that a value of 1.15 allows us to perform 10% fewer iterations while remaining artifact free. This makes some intuitive sense. Since the cloth tends to have more stretching along the longer noncyclical paths during the course of a simulation, overshooting helps accelerate the global shrinking in those directions, i.e., hanging capes or shirts have their bottoms pulled up quicker.

float relaxationFactor = 1.15;
pa -= dp * 0.5 * derr * relaxationFactor;
pb += dp * 0.5 * derr * relaxationFactor;

12.3 Modeling Real Fabrics

Unmodified, the simulation technique outlined so far produces clothing that looks like a light rubbery silk. Fashionistas typically turn up their noses at such attire, while gamers dream of the comfort such clothing would bestow upon the wearer. Gamers desire to play neither a comfortable gamer nor a fashionista during their gaming sessions. Therefore, this fabric is irrelevant, and we must try to improve the visual appeal.

The application of internal damping helps make the cloth look like it is made of a more natural material. This is done by projecting the particle velocities onto the distance constraints. For the best effect, it can be applied every iteration.

Vector3 paPrev = constraint.m_particleA.previousPosition;
Vector3 pbPrev = constraint.m_particleB.previousPosition;
float dampingFactor = 0.3f;
Vector3 va = pa - paPrev;
Vector3 vb = pb - pbPrev;
Vector3 vab = va - vb;
float v = vab.dot(dp);        // relative velocity along the constraint
float damping = v * dampingFactor;
pa -= dp * 0.5 * damping;
pb += dp * 0.5 * damping;

There is a performance cost here, but the improvement to the visual quality of the material is significant.

Real fabrics buckle much more easily in comparison to their resistance to stretching. Ideally, this would be modeled by using a very high-resolution set of particles. Even then a stiff buckling resistance will be present, although at a higher frequency and less noticeable scale. An alternative is to weaken the constraints' resistance to compression up to a certain limit. This also helps alleviate the jagged bunching and jittering of cloth that can occur at character joints. Visually we lose some creasing and folding, but the motion looks more convincing. As an example, around the shoulder joint of a character, we will most likely see popping and jagged cloth mesh artifacts. To fix this problem, we can tune the constraints in this area to not respond to compression:

float derr = (distance - targetDistance) / distance;
derr = (derr < 0) ? 0.0f : derr;   // ignore compression, only resist stretching

This technique of modeling cloth, and indeed most known cloth simulators, tends to smooth out smaller wrinkles [Bridson et al. 03]. The wrinkles are the most noticeable feature of cloth, since they form dark shadowed valleys against peaks that catch much of the light. We can add wrinkles back in as a rendering effect by using wrinkle maps driven by compression values.

Friction forces are needed to model the contact between cloth and skin in a believable manner. The most basic and performant friction model is to modify the effective velocity of a particle when it experiences a collision. We do so by moving the previous particle position towards the current one by the velocity scaled by the friction coefficient:

Vector3 v = p - pPrevious;
v = v - normal * dot(v, normal);
pPrevious += v * mu;

Friction between cloth and skin is a fairly complicated interaction. We could make the friction strength depend on the depth of the collision. This is only a rough approximation of the contact force, and given the complicated nature of the situation, we can choose to leave it out. Another choice is when to apply friction. Applying friction with every collision is an option, or only applying it once, either at the start or the end of the solver loop. It is best to experiment to find the right look for each simulation.


12.3.1 Character Cloth Constraint

Attaching simulated cloth to an animated character requires a special type of constraint. A character bone may rotate and translate very large distances in a single frame. Keeping the cloth on the correct side of a bone's collision geometry is a challenge.

The simplest constraint that will keep a particle from passing through collision geometry is to skin it rigidly to the bone. This isn't a very interesting way of doing things. We'll call this the pinning constraint, or just pinning. If we have pinning that makes sure a given particle can never move more than halfway through the collision geometry, then, providing the geometry is convex, the collision response will push the particle out to the side it came from. This can be done with a unilateral distance constraint between the particle and an anchor position. The anchor positions used are the skinned positions for the particle on the character rig. These data should easily be made available, since most game engines will already have a skinning system in place. As a bonus, bone weightings should be authored so the anchor points are in natural locations. This is what would be used for the cloth verts if there were no simulation.

It is useful to have a hard, immovable constraint where it is not possible to move the cloth particle. Essentially, we don't simulate this particle at all, so it doesn't belong with the list of simulated particles, but it will exist as a member of a constraint. A nice way to implement this is to move all those hard-pinned particles to the end of the particle list and then terminate any particle update loops early. During constraint updates, we don't want to update the position of any hard-pinned vert.

We can vary the pinning strength. The pinning strength is a value we use to apply only a portion of the constraint correction. With a value of 1.0, we would move the particle all the way back to its anchor position. Applying a pinning strength that is proportional to the distance from the skinned position helps make it less apparent that there is a hard distance constraint being applied. Such a distance-proportional pinning strength can be applied before and after a set pinning radius. This gives a good deal of control. The pinning function now appears as in Figure 12.1. As long as the pinning strength hits the maximum value of 1.0 before the particle moves over half the radius of the collision geometry, we can be confident it is doing its job. Since the pinning strength reduces the constraint error by a proportional amount each iteration, the effective strength is much more pronounced than a linear effect. So, if we want a subtle effect, we need to use quite a small value for the pinning strength.

Figure 12.1. Pinning function.
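One possible shape for such a pinning function (a sketch of the idea, not the author's exact curve; pinRadius, minStrength, and rampPerUnit are assumed parameters) ramps the strength with distance from the anchor once a radius is exceeded:

// Sketch: distance-proportional pinning toward the skinned anchor position.
Vector3 toAnchor = anchorPos - particlePos;
float dist = toAnchor.length();
float strength = minStrength;                       // inside the radius: weak pinning
if (dist > pinRadius)
    strength += (dist - pinRadius) * rampPerUnit;   // ramp up with distance
strength = min(strength, 1.0f);                     // 1.0 snaps fully back to the anchor
particlePos += toAnchor * strength;                 // apply a portion of the correction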

It is important to have a flexible pinning function because different sections of a piece of clothing require different pinning values. The bottom of a shirt can move large distances, while the areas under an armpit need tighter control. Arms of a shirt are especially tricky to tune because we want both dramatic simulation and control. The radius for the collision geometry representing arms is relatively small. What works well in practice for maximum visual effect is to have a pinning strength of 0 and a radius of under half of the bone's collision radius. Then apply a distance-proportional pinning strength after the radius has been exceeded. This softens the constraint, while providing good control. An easy-to-use interface that allows the character team to paint pinning values on a cloth mesh is a very useful thing.

12.3.2 Collisions

Spheres and capsules are easy-to-use collision geometries and are a fair representation of character limbs. To respond to collisions, we simply push the position of any interpenetrating particle via the shortest path to the surface. This path points along the vector formed from the position of the particle and the center of the collision object.
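For the sphere case, the response might look like this minimal sketch (an illustration, not code from the chapter; sphereCenter and sphereRadius are assumed names):

// Sketch: push a particle out of a sphere collider.
Vector3 d = particlePos - sphereCenter;
float dist = d.length();
if (dist < sphereRadius)
    particlePos = sphereCenter + d * (sphereRadius / dist);   // project to the surface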

For a high-resolution cloth mesh, the torso of a character's body is too complicated to model with spheres and capsules. Unless we are using a very large number of capsules and spheres, the way the cloth rests on the character will betray the underlying geometric approximations we used. A triangle mesh can yield good performance by utilizing a caching optimization. Each particle should keep track of which triangle it collided with in the last frame. Check to see whether a particle is within the edge boundaries of its cached triangle (the triangle's extruded wedge). If so, collide with that triangle. If not, use an edge-walking algorithm to find the new triangle whose extruded wedge contains the particle. Typically, the particle will be in the bounds of its cached triangle or have moved to a directly neighboring triangle. Performance is actually quite good. For best results, the mesh should be closed and convex. Responding to the collision is simply a matter of pushing the particle's position out to the surface of the triangle along its normal.
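For the final push-out step, a minimal sketch (not from the original text; the triangle's unit normal and one of its vertices are assumed to be available) is:

// Sketch: push a particle out of its cached triangle along the triangle normal.
float d = dot(particlePos - triangleVertex0, triangleNormal);   // signed distance to the plane
if (d < 0.0f)
    particlePos -= triangleNormal * d;   // move back onto the triangle's surface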

12.4 Performance

By far, the most expensive part of the simulation is the collision detection. Since the constraints directly and immediately update the positions of the particles, we need to perform collision checks, if not for every iteration of the relaxation solver then for every two or three iterations. A final collision pass should be done after all other constraints. This is required to avoid having any geometry lying under the cloth show through.

Determining which collision objects need to collide against which particles can be expensive. To ameliorate this problem, we can group particles with specific collision objects, e.g., the left sleeve only needs to collide with the left arm.

Modern consoles are very sensitive to memory access and cache performance. Avoiding the load-hit-store [Heineman 08] is important. This can be done by ordering the list of constraints so that those that update a common particle are spaced apart. By spacing them apart, a particle's write will hopefully be completed before its data are needed by the next constraint that uses it. At least we will reduce the time the next constraint must wait.

Figure 12.2. Cloth update: wrong ordering.

12.5 Order of Cloth Update Stages

Ordering what parts of the simulation happen when is critically important for minimizing simulation artifacts. When coding, good engineering practices say that we should not have direct coupling between the character animation system and the cloth simulator. The skinning data represent a significant amount of data to hold on to for any length of time. We will want to use them immediately after we have calculated them. Since the skinning data are updated by the character animation system and we have efficiency in mind, the natural thinking is to apply the pinning constraint as the very first thing we do in our update. This is the wrong order.

Looking at Figure 12.2 shows why. This figure shows a particle anchored between two collision spheres. The spheres translate a large distance in the first frame. This is not a configuration we would see in practice, but it serves our instructional purposes here. During the render phase (right after the frame boundaries), we are able to see the cloth particle on the wrong side of a collision body. A better

[Heineman 08] Becky Heineman. "Sponsored Feature: Common Performance Issues in Game Programming." Gamasutra, 2008. Available at http://www.gamasutra.com/view/feature/3687/sponsored_feature_common_.php.

[Jakobsen 03] Thomas Jakobsen. "Advanced Character Physics." Gamasutra, 2003. Available at http://www.gamasutra.com/resource_guide/20030121/jacobson_01.shtml.

[Provot 95] Xavier Provot. "Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior." In Graphics Interface '95, pp. 147–155. Quebec: Graphics Interface, 1995.
VI Skinning

the animation of a bone model (for an overview of this field, see, e.g., [Jacka et al. 07]). These techniques work very well in practice, even for challenging regions such as shoulders or heels. They are of a purely kinematic nature and there is no time dependence, so it does not matter if a limb is moved slowly or quickly; the calculated surface vertices are the same.

On the other hand, physics-based simulation has entered computer games, for example, in the form of the simulation of ragdolls (a collection of multiple rigid bodies, where each of the bodies is linked to a bone of the skeletal animation system). A famous example is the game Hitman: Codename 47 by IO Interactive [Jakobsen 01]. Such simulations can be used to model cloth, plants, waving flags, or dying characters.

More advanced physics simulations quickly become computationally intensive and thus not suitable for real-time processing. This is a pity, because there are a lot of physical effects that get completely lost even in ragdoll physics; effects that would be stunning if achieved in a real-time simulation. It would be great to realistically simulate the properties of solid materials, to watch how they react and deform when applying pressure to the surface, or when under the influence of gravity. Or, concerning character animation: animating a character in its low-frequency motion using its bone model, defining some material properties, and letting the physics system take care of the small and high-frequency motion; think of the jiggling of fat tissue when an ogre starts to move. It is this tiny motion that adds most to the realism in a simulation.

Of course, this animation system would have to take care of maintaining surface details, such as the layout of veins on an arm or the wrinkles on an old man's face.

In computer games, performance is very important, and only a small percentage of computation time can be spent on the physics subsystem, but more and more realistic simulations can enter our homes as processors get faster and graphics hardware more programmable.

In this chapter, a physics simulation is developed that can add secondary deformation to a mesh, while the primary deformation can still be driven by a skeleton; the comfort of animating a character by some simple bones will be preserved.

One thing that we have to bear in mind is that for simulation in computer games, we are not ultimately striving for accuracy, as we would in a scientific simulation, but rather for believability; the programmer is in the position to trick the player into thinking that what the player sees is real.

Such a simulation can dramatically improve the realism of an animation and still be economic in computational effort. In fact, the techniques presented in this chapter take much less computation time than the collision-handling routines that are needed to have different models and geometry applying forces on each other. The collision handling of deformable bodies has to be more sophisticated than that of rigid bodies, since there is always a certain penetration depth when two deformable objects collide and deform each other, and there is always the possibility of self-penetration (see [Teschner et al. 05] for detailed information).

For simulating the effect of surface connectivity, a technique called "shape matching" [Müller et al. 05] is used, which takes care of maintaining surface details during the simulation. Several approaches to addressing the volumetric effects of a solid material and their applicability are discussed, and the best-fitting technique is used. If the lack of realism of these techniques is not acceptable, the method presented in Chapter 10 is a much better approach to simulating deformable objects, since it is completely based on a physically correct description.

Throughout this chapter, we are seeking drop-in solutions that can easily be integrated into an existing simulation. Objects in current computer games are surface meshes, so our deformable simulation should be surface based while still being able to naturally simulate volumetric behavior. This way, the deformation model can efficiently be integrated into the rendering pipeline, and the computations can even be done on the graphics hardware. Because of the simplifications made, the simulation will rely on material properties that need to be tuned by a designer during content creation to become realistic.

In this chapter, all the background necessary to understand what is going on in principle is covered, while always focusing on practicability. We will work on an implementation of a deformable mesh simulation that will gradually be extended and can be modified to suit any special purpose in a game. Section 14.2 will introduce the force model used for the simulation and point out potential pitfalls. Section 14.3 incorporates the effect of surface connectivity in a polygonal mesh in an economical way. The shape-matching algorithm is described. The following section, Section 14.4, accounts for the influences of volumetric effects on a solid material without an accurate physical simulation of the interior of the mesh.

14.2 The Interaction Model

The starting point of the simulation is a triangle mesh of vertices with positions x0_i. It can be animated in time (e.g., using keyframes), but it does not have to,
for now. We want to enrich this static model with physically motivated motion. There are quite a lot of forces that can be taken into account, so the focus should lie on forces that add most to the felt realism of a simulation. The question then is how to construct them in a computationally economic way. In the end, the sum of all acting forces is the change in velocity for each vertex at a given time step.

The first extension we make to this basic animated mesh model is to call the vertex positions of the animated mesh model the "rest" positions (x0_i) and give our actual positions (x_i) the freedom to vary from those. They will get a mass to define how they will react to a given force (remember f = ma?). We also have to keep track of the accumulated forces acting on each vertex. The structure storing the per-vertex information for now could be something like this:

struct Vertex {
    Vector3 pos;       // current position
    Vector3 vel;       // current velocity
    Vector3 restPos;   // position given by data
    Vector3 force;     // the total force on a vertex
    real mass;
};

The Vector3 data type is a structure holding the three components of a vector; "real" can be either a single- or double-precision floating-point representation.

14.2.1 Numerical Integration

Time is a continuous quantity. When writing down equations for the positions and the velocities of the vertices, they should hold for every time t. In computer simulation, however, we always have to deal with discrete time steps of length h. The introductory chapter of this book (Chapter 1) gives an overview of the most important integration schemes. We update the velocities and positions by the following scheme:

v_i(t + h) = v_i(t) + h f_i^total(t),
x_i(t + h) = x_i(t) + h v_i(t + h).

This is the semi-implicit Euler scheme. In contrast to the standard Euler integration, this scheme uses v(t + h) (implicit) in the equation for x(t + h), while the standard Euler integration uses v(t) (explicit). This scheme still commits a global error of the order of h every time step (see [Donelly and Rogers 05] on this matter). If more accuracy is needed, we could consider using higher-order integration schemes, such as the fourth-order Runge-Kutta method [Acton 70].

Numerical solutions of differential equations may be unstable because the problem being solved is unstable or because the numerical method fails. Care has to be taken to construct forces that prevent these instabilities. Additionally, damping can help to make an integration scheme stable. The next logical step now is to model realistic forces. We will start with a simple force that forms the basis for all other forces we will discuss.
14.2.2 The Spring Force

To get a force that pulls a vertex towards a desired goal position (like its rest position), think of a spring that links the vertex to its goal position. Each spring has a certain constant, which gives us the force driving it to its goal position. When the spring force is too strong compared to the time step, the system will overshoot, which means that the vertex will be driven to the other side of the spring's goal position, ending up even farther away than it was before. This way, the vertex will never reach its goal position, but it will steadily increase its energy. The system "explodes." The maximum force that will drive an object towards a rest position without overshooting is given by

f_i^rest = (x0_i - x_i) / h².

That this force does not overshoot can be seen by starting off at some time 0 and calculating the succeeding positions and velocities for the next two time steps. The system will "convert" the displacement from the rest position (which means potential energy) into speed (which means kinetic energy), and the speed back into displacement, but the displacement will not get bigger over time, so the total energy will not rise over time. (For example, starting at rest with displacement d, the first step yields v = -d/h, which moves the vertex exactly onto its rest position; the following steps carry it on to a displacement of -d and back again, so the amplitude never grows.) This force can be scaled by a factor smaller than 1 to make the force smaller; this is a first example of a material property that can be tuned by the designer on a per-vertex basis.

When we calculate the force as presented above, it is absolutely necessary that the time step h be fixed to a certain value throughout the simulation and that the system be integrated in constant intervals. The physics integration should be run on a dedicated thread, where it updates the positions and velocities at a constant rate, like 30 frames per second (FPS). With the knowledge of this force, a simple form of secondary motion can be constructed. With the calculated force, we need to update our velocities and the actual positions (that are drawn on the screen) according to the presented integration scheme.

Here is the pseudocode notation:

for each vertex v {
    v.force = (v.restPos - v.pos) / (timeStep * timeStep);
    v.vel  += v.force * timeStep;
    v.pos  += v.vel * timeStep;
}

With such a simulation, we would see the vertex positions oscillate about the rest position on and on, for infinity (apart from numerical errors that are introduced in every time step). The contribution of this force can be reduced as more realistic forces are added to the system, but it should still be integrated into the simulation, since it helps to keep the system controllable, as there is always a trend back to the completely undeformed shape.

14.2.3 Safety Belts

This discussion would be completely out of place in a scientific simulation, but here we are speaking of computer games; we have to deal with a lot of user interaction, collisions, and rapid changes of motion. Safety comes first.

Although the algorithms introduced in this chapter provide excellent robustness that should be suitable for computer games, it can always happen that, for some unforeseeable reason, the system is suddenly pushed far away from its rest position or gets a boost in velocity that will blow up the whole system.

We deal with this in the most straightforward way: we just have to follow the simple principle of "If X hurts, don't do X."

So whenever a vertex is too far away from its rest position, we just have to make sure that it isn't. Define a radius in which the vertex is allowed to be, and whenever it leaves this sphere, put it back on the surface of the sphere. The same should be done for the velocities. This is called position and velocity clamping; it is a quick way to get rid of all possible accidents that can happen to the simulation.
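A minimal clamping sketch (not from the original text; maxDist, maxSpeed, and the vector helpers are assumptions) could look like this:

// Sketch: position and velocity clamping ("safety belts").
Vector3 offset = v.pos - v.restPos;
if (offset.length() > maxDist)
    v.pos = v.restPos + offset * (maxDist / offset.length());   // back onto the sphere
if (v.vel.length() > maxSpeed)
    v.vel *= maxSpeed / v.vel.length();                          // cap the speed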

14.2.4 Global Damping

This falls into the same category as position and velocity clamping, but it can be motivated on a physical basis. We always want our objects to come to rest at some point in time, so we make sure they do. Damping can always be used to enforce stability on spring systems, even if the forces are not constructed to be stable with the used integration scheme [Bhasin and Liu 06]. Every system loses energy over time. In a physical sense, the energy is not lost but goes into motion that is not visible to perception, such as heating the material or the surrounding system. Here, a simple damping model is used that will cause the system to come to rest by just scaling the velocity by a certain factor at every time step:

scaleVector(v.vel, v.factorDamping);

Damping forces can be constructed to drain energy from the system in a more sophisticated way so that global damping can be reduced. But in the end, a form of global damping should still be implemented.

14.3 Neighborhood Interaction
For the following forces, the neighborhood nbr(i) of vertex x_i needs to be defined. The neighborhood can be quite a general set of vertices; we just need an applicable definition of it. If we do not have any connectivity information, we can define it to be every vertex that is within a certain radius of another. For vertices that form a lattice, the neighborhood can be the nearest-neighbor lattice sites. A triangle mesh has connectivity information supplied by definition, for example, in the form of a stream of vertices and a stream of triangles that group three vertices into one surface fragment and store additional information that is needed on a per-triangle level (see Figure 14.2). Here, we define the vertex's ring-0 neighbors as its neighborhood. (These equal the vertices that are grouped into one triangle with the vertex!) We also define each vertex as a neighbor of itself, which makes the formalism simpler later on.

Figure 14.2. A vertex x_i and its local neighborhood.

The representation of the mesh as triangles is optimal for the graphics hardware and the rendering process, but it is unsuited for our algorithms because the neighborhood of a vertex cannot be determined efficiently. If the overhead can be afforded, neighbor lists for all vertices can be created at the beginning:

for each triangle t {
    for each pair of vertices v_i, v_j in t {
        v_i.neighborsAdd(v_j);
        v_j.neighborsAdd(v_i);
    }
}

This provides very fast access to the neighborhood of a vertex, but on the downside, it takes a lot of extra memory, which can become unacceptable. Since neighborhood access is unlikely to become the bottleneck here, it is advisable to trade some of the access speed for memory; there are far more efficient data structures for this purpose [Campagna et al. 98]. Here, the DirectedEdge data structure is used:

struct DirectedEdge {
    int vertex;
    int neighbor;
    int next;
    int prev;
};

This data structure represents every triangle as three directed edges (see Figure 14.3), where each edge has a reference to the vertex it is directed to, as well as to its neighbor, next, and previous edges. If we give each vertex a reference to just one of the edges that head away from itself, we can restore its whole neighborhood just with the prev and the neighbor references.

Figure 14.3. A directed edge (dashed) and its previous, its next, and its neighbor edges. To retrieve the whole neighborhood information, we just need a pointer to the previous edge and its neighbor.
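To make the traversal concrete, here is a rough sketch of walking the ring-0 neighborhood with this data structure (not from the original text; it assumes the edges are stored in an array, that each vertex's edge field holds the index of one outgoing edge, and that the mesh is closed):

int start = vertices[i].edge;           // an edge heading away from vertex i
int e = start;
do {
    int n = edges[e].vertex;            // the vertex this edge is directed to: a neighbor of i
    // ... process neighbor n ...
    e = edges[edges[e].prev].neighbor;  // prev points back into i; its neighbor heads away again
} while (e != start);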

Figure 14.4. Initial (x0_i, left) and deformed (x_i, right) positions of a vertex i and its neighbors.

14.3.2 Maintaining Surface Details and Shape Matching

Simulating the effect of surface connectivity based on a physical model is complex. Using the physically correct material laws would not allow for real-time simulation without cutting the geometrical complexity by too much. Fortunately, we are in a lucky position, since our simulation does not have to be realistic, it just has to look realistic. And even most physical models are just approximations of what is really going on. That is the way it works. There are also no rigid bodies in nature, but there are some bodies that look and behave as if they were rigid.

We will use a technique called shape matching [Müller et al. 05] that approximates the influence of the neighboring surface vertices for every vertex surprisingly well.

The technique is absolutely nonphysical, but the result looks very realistic, plus it has some important physical properties: it preserves the center of mass and the angular momentum of the matched vertices. This way, it will not introduce any net torque into the system. The basic idea is this: for each vertex, we calculate the least-squares rigid body transformation of its neighbors' rest positions and use them as new goal positions. For those not familiar with the topic, this should be explained in a little more detail.

When the mesh gets deformed, the vertex positions are no longer equal to the rest positions of the mesh (see Figure 14.4).

Since the vertices are connected, they should be driven back into their rigid shape by the influence of their nearest neighbors (see Figure 14.5). The rigid shape of the neighborhood does not have to be defined by the rest positions x0_i, because it is possible to translate and rotate the vertex cloud as a whole without changing its relative shape.

Think of a mesh where each vertex has been moved by the same translation; we could just move the rest positions by the same translation as the vertices, and there would be no forces acting.

Figure 14.5. A vertex that has been displaced relative to its neighborhood should feel a back-driving force that maintains surface details.

What if the vertices have been displaced by different distances? The best translation of the rest positions is the one that matches the centers of mass of the initial (rest) shape and the deformed shape (Figure 14.6 (left)). This results in the following goal positions:

c_i = x0_i - (x0_cm - x_cm).

For this quantity, we need to calculate the centers of mass of the original and the deformed shapes:

for each vertex v in vi.neighbors {
    cm  += v.pos * v.mass;       // deformed shape
    cm0 += v.goalPos * v.mass;   // original (rest) shape
    masses += v.mass;
}
cm  /= masses;
cm0 /= masses;

This is still not the optimal solution, because the rotational degree of freedom has not yet been used. It is introduced in the form of the matrix R, which represents the optimal rotation of the point cloud around the matched centers of mass (Figure 14.6 (right)). The optimal rigid transformation

c_i = R(x0_i - x0_cm) + x_cm

has the property that it minimizes the quantity Σ_i m_i (c_i - x_i)². This matches the goal positions and the actual coordinates in the "least-squares sense." Additionally, it takes care of the fact that heavy particles are harder to move than lighter particles; a displacement of a heavy particle should cost more than that of a light particle.

Figure 14.6. The original shape of a vertex i and its neighbors is matched to the deformed shape (x_i) by an optimal rigid transformation. (a) This results in a goal position c_i for vertex i. (b) Then the vertex x_i is pulled towards the goal position c_i.

Since the calculation of this rotation is not directly obvious, its derivation is put into the appendix for the interested reader.

Now the spring force can be used again to construct a force that pulls the vertex towards the goal position c_i:

f_i^detail = (c_i - x_i) / h².
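A possible structure for that computation (a sketch only, not the chapter's code; Matrix3, outerProduct, and extractRotation, which stands for the polar-decomposition step derived in the appendix, are placeholders) could be:

// Sketch: shape matching of a vertex's neighborhood.
// cm and cm0 are the mass-weighted centers of mass computed above.
Matrix3 A = Matrix3::zero();
for each vertex v in vi.neighbors {
    A += v.mass * outerProduct(v.pos - cm, v.goalPos - cm0);
}
Matrix3 R = extractRotation(A);                 // rotational part of A
Vector3 goal = R * (vi.goalPos - cm0) + cm;     // goal position c_i
vi.force += vi.factorDetail * (goal - vi.pos) / (timeStep * timeStep);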

Since the least-squares goal positions were calculated, it should be remarked that these can also be used to build a rigid-body simulator. The goal positions are, of course, the positions of the rigid shape. If we let the actual positions of the vertices snap directly to the goal positions after each integration step, the behavior of a rigid body is mimicked.

The goal positions can also be used to introduce another form of damping: one approach [Müller et al. 08] uses the least-squares algorithm to fit an instantaneous rigid motion to the particles. Then, at every time step, nonrigid motion is bled off until only the rigid body motion remains.

14.3.3 Deformable Surface Mesh

We have accumulated several forces that can act on every vertex in the mesh. The relative strength of the forces must be defined per material or per vertex. They can be tuned by the designer via an editing interface in the model editor so they end up with a realistic simulation of the material in question. Another method would be to acquire the parameters from example animations that already exist for the model in question. A suitable parameter-fitting algorithm is presented in [Shi et al. 08]. We have to keep in mind that there are limits within which the parameters have to be set for a stable simulation. A complete vertex structure that accumulates the per-vertex information about everything discussed until now could look like Listing 14.1.

struct Vertex {
    Vector3 pos;
    Vector3 vel;
    Vector3 goalPos;
    Vector3 force;
    // force coefficients
    real factorRest;
    real factorDetail;
    real factorNeighbors;
    real factorDamping;
    // one of the edges to retrieve the neighborhood
    DirectedEdge edge;
    real mass;
};

Listing 14.1. The complete vertex class.

We accumulate the forces discussed so far with

f_i^total = α_i^rest · f_i^rest + α_i^neigh · f_i^neigh + α_i^detail · f_i^detail

and check whether the calculations work as intended. A block of 16 vertices connected in a simple geometry is defined to test the implementation (see Figure 14.7). The red spheres are the rest positions of the mesh, while the white spheres are the goal positions of the shape-matching algorithm. The actual positions are the yellow spheres. First, we can displace every vertex just a bit and watch it go back into its original shape. If we apply a driving force to our block on the right, even the vertices on the other end of the body start to wiggle about. In the demo, we can also switch on a gravitational force (and set factorRest to zero) and watch the body hit the ground. The body stays in shape just by means of the shape-matching algorithm (see Figure 14.8). A demo is included with the supplemental materials.

Figure 14.7. Driven deformation of a simple mesh geometry (see Color Plate XIV).

Figure 14.8. Under the influence of gravity, the geometry stays in shape just by means of shape matching of the local neighborhoods (see Color Plate XV).

14.4 Volumetric Effects

While the shape matching of the surface vertices has a huge impact on realism, it still is not a complete solution to our problem, since what we are missing completely is the influence of the interior of the body on its surface.

A more practical problem is that if only shape matching is used, the surface mesh will not follow the bone motion very well, and too much contribution of f^rest is needed, which renders the simulation unrealistic. The model we are dealing with is a surface mesh. In a realistic material simulation, the surface vertices should not only experience forces from their neighbors but also forces acting on the surface from the inside. Here we run into a problem: we do not have any information about the inside of the mesh.

Meshless shape matching [Müller et al. 05] discards neighborhood information entirely and performs shape matching on the whole point cloud. This way, each vertex feels the influence of every other vertex, as it would in a realistic soft material. The problem with this is that the larger the shape-matching clusters are, the faster deformations are smoothed out, and the shape will return to the rigid shape much sooner. If the algorithm is unaltered, the range of motion is cut drastically. In the limit of all vertices in one cluster, it will always try to match all particles to the undeformed mesh. Thus, it will only allow small deviations from the rigid shape. This comes in handy for simulating rigid-body dynamics with this algorithm, but that is not the focus of this chapter.
14.4.1 Extensions to Meshless Shape Matching

Müller [Müller et al. 05] proposes some extensions to the meshless shape-matching algorithm to allow for bigger deviations from the rigid shape. We should look at them for completeness; however, they are not that well suited for character animation. The idea is to allow the transformation that transforms x0_i into c_i,

c_i = R x0_i + t,

to be more general. Shear and stretch modes can be accounted for by mixing a bit of the previously calculated linear transformation A into the transformation

βA + (1 - β)R.

Here the mixing is controlled by the additional parameter β. The transformation R still ensures that there is a tendency towards the undeformed shape. Volume conservation has to be taken care of by ensuring that det(A) = 1, which is not automatically the case. This can be extended to include quadratic deformations. We will not use this approach, because we would still lose too much realism by discarding the neighborhood information, especially the small, high-frequency modes we want to achieve. Extending the range of motion for the shape matching of the neighborhood clusters is not necessary.

14.4.2 Lattice-Based Shape Matching
Another way of simulating volumetric effects is to turn to discrete approximations of the inside of the mesh. The general idea is to fill the inside with a lattice of evenly spaced vertices, let them take care of the physics, and reconstruct the deformed surface mesh from the deformed lattice afterwards. Unfortunately, these discrete approximations can be very expensive to simulate. Simple lattice deformers have been around for a while, like ChainMail (see [Gibson and Mittrich 97] again), which, although providing speed and robustness suitable for interactive processing, suffers from limited realism.

Here again, shape matching can come to our help. In [Rivers and James 07], an algorithm is presented to efficiently calculate the shape matching of a cubic lattice. The idea is to voxelize the mesh and flood the inside of the mesh with solid objects in a cubic lattice. Steinemann [Steinemann et al. 08] uses an octree-based hierarchical sampling instead of an evenly spaced lattice. The original mesh is then deformed using trilinear interpolation of the vertex positions in the lattice. Although this approach results (depending on the resolution of the lattice) in interactive rates, we will use a much simpler approach to account for volumetric effects that is better suited to character animation. It is presented in the next section.

14.4.3 A Link to the Bone

When there is a bone model that drives the mesh, another simplified model can be used that mimics the real situation quite well [Shi et al. 08]. We apply yet another spring to each surface vertex for each bone and link it to the bones the vertex is assigned to. But we do not fix the end at a certain position along the bone; instead, we allow it to slide freely along the bone. This way, the force tries to maintain the original distance from the bone. Before constructing this force, a bone model is defined that assigns each vertex just one bone. This will be extended to a model that assigns more than one bone to a vertex for smooth skinning. There, the calculated force will be the (weighted) sum of the contributions of each bone.

A basic bone model. We start with a simple bone model to discuss the basic structure. The bone model will be built up from the joints, where each joint has a position, an orientation, and a parent. A joint without a parent is called a root joint.

struct Joint {
    Quat4 orient;
    Vector3 pos;
    int parent;
};

The link between a joint and its parent is called the bone. Each vertex is assigned a joint, and its rest position x0_i is calculated by

x0_i = q̂ x_rel,i q̂⁻¹ + j_i,

where q̂ is the rotation quaternion and j_i is the position of the joint. Quaternions are explained in the introductory chapter (Chapter 1), since they are quite useful for the representation of rotations in computer graphics, especially character animation. The vertex positions are relative to their supporting joint and have to be transformed into global space (see Figure 14.9). The vertex structure is extended by a vector called relPos, which is the position of a vertex relative to its supporting joint. This is the only coordinate the designer has to supply; the restPos is calculated from this coordinate using the above formula.

Figure 14.9. Transforming a relative coordinate x_rel,i (left) to global coordinates x0_i (right) using the positions of its joint j_i and the joint's parent j_p.

for each joint j {
    v = j.pos - j.parent.pos;
    normalize(v);
    j.orient = rotationQuaternion(u, v);
}

for each vertex v {
    j = v.joint;
    tempPos = rotateVector(v.relPos, j.orient);
    v.goalPos = tempPos + j.pos;
}

In the first loop, we calculate the rotation quaternion of the joint as described above. In the second loop, we use the calculated quaternion to transform our relPos coordinates into global space (restPos). These are again the positions that are used by all the force calculations we discussed before.

Calculating the force. We need a force that maintains the distance to the bone for each vertex. For this, we compare the actual distance to the desired distance. We have to calculate the distance from the bone to the actual positions x_i and the distance to the rest positions x0_i. First, the unit vector in the direction of the joint's parent is obtained by

axis = joint.pos - parentJoint.pos;
normalize(axis);

From this, the part of the vertex position that points in the direction of the joint's parent can be calculated by taking the dot product, and the projection x_proj of x_i on the bone can be calculated by multiplying the unit vector in the direction of the bone with this quantity.

projection = dotProduct(axis, v.pos);
projVector = scaledVector(axis, projection);

Figure 14.10. The projection x_proj of the vertex x_i on the bone is used to construct the distance vector x_ib from the bone for vertex x_i. The distance is compared to the distance of the rest position x0_i to construct a force f_i^bone that maintains the distance to the bone.

The vector that points from the nearest point on the bone to the vertex $x_i$ is now just the difference between $x_i$ and $x_{\mathrm{proj}}$. We call it $x_{ib}$ for the actual positions and $x_{ib}^0$ for the goal positions. This is shown graphically in Figure 14.10. With these two quantities, a force that pulls the vertex to the desired distance from the bone can be constructed:

$$f_i^{\mathrm{bone}} = \left( \frac{|x_{ib}^0|}{|x_{ib}|} - 1 \right) \frac{x_{ib}}{h^2},$$

where $|x_{ib}|$ is the length of $x_{ib}$. Whenever $x_{ib}^0$ is longer than $x_{ib}$, the force is directed away from the bone (in the direction of $x_{ib}$), and if $x_{ib}^0$ is shorter than $x_{ib}$, the force is directed towards the bone, as is needed.

Figure 14.11. A tube, supported by three joints.

In our variable names, the calculation looks like this:

preFactor = (distanceVectorAbs0 / distanceVectorAbs - 1) / (timeStep * timeStep);
scale(distanceVector, preFactor);
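For reference, the individual steps above can be collected into one function. The following is only a sketch written in the chapter's variable-naming conventions; the function signature itself (boneForce, passing the joint and its parent explicitly) is an assumption.

/* Sketch: the complete bone-distance force for the single-bone model,
   assembled from the steps above. The function name and signature are
   assumptions; the helpers follow the chapter's listings. */
Vector3 boneForce(Vertex *v, Joint *j, Joint *jp, float timeStep)
{
    /* unit vector along the bone */
    Vector3 axis = j->pos - jp->pos;
    vecNormalize(axis);

    /* distance vectors from the bone for the actual and the rest position */
    Vector3 diffPos  = v->pos - j->pos;
    Vector3 diffPos0 = v->restPos - j->pos;
    Vector3 xib  = diffPos  - vecScaledVector(axis, vecDot(axis, &diffPos));
    Vector3 xib0 = diffPos0 - vecScaledVector(axis, vecDot(axis, &diffPos0));

    real len = vecLength(xib);
    if (len == 0.0f) return vecCreate(0.0f, 0.0f, 0.0f);

    /* f_bone = (|x0_ib| / |x_ib| - 1) * x_ib / h^2 */
    real preFactor = (vecLength(xib0) / len - 1.0f) / (timeStep * timeStep);
    Vector3 result = xib;
    vecScale(&result, preFactor);
    return result;
}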
14.4.4 Skeleton-Driven Mesh

We now have a detailed force model consisting of several forces that can be added:

$$f_i^{\mathrm{total}} = \alpha_i^{\mathrm{rest}} \cdot f_i^{\mathrm{rest}} + \alpha_i^{\mathrm{neigh}} \cdot f_i^{\mathrm{neigh}} + \alpha_i^{\mathrm{detail}} \cdot f_i^{\mathrm{detail}} + \alpha_i^{\mathrm{bone}} \cdot f_i^{\mathrm{bone}}.$$

We can apply this model to a geometry with two bones connected by a joint, with a cylindrical mesh around each bone (see Figure 14.11). The joints can be moved freely by selecting them with the mouse and moving them around; this causes kinematic deformation of the goal positions. The surface geometry follows the positions of the joints while experiencing secondary deformation. (See Figure 14.1 (middle and right).)
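In code, the combination is just a weighted sum per vertex. The following sketch reuses helper functions that appear in the listings later in the chapter; the alpha weights and the restForce/neighborForce names are placeholders for the chapter's corresponding forces, while detailForce and volumetricForce refer to Listings 14.2 and 14.3.

/* Sketch: evaluating the combined per-vertex force. alphaRest, alphaNeigh,
   alphaDetail, alphaBone, restForce, and neighborForce are placeholders
   for the chapter's corresponding quantities. */
Vector3 totalVertexForce(int mesh, int vertex)
{
    Vector3 f = vecCreate(0.0f, 0.0f, 0.0f);
    Vector3 t;

    t = restForce(mesh, vertex);       vecScale(&t, alphaRest);   vecAdd(&f, &t);
    t = neighborForce(mesh, vertex);   vecScale(&t, alphaNeigh);  vecAdd(&f, &t);
    t = detailForce(mesh, vertex);     vecScale(&t, alphaDetail); vecAdd(&f, &t);
    t = volumetricForce(mesh, vertex); vecScale(&t, alphaBone);   vecAdd(&f, &t);

    return f;
}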
14.4.5 Application to Smooth Skinning

This basic bone model works very badly, especially in joint regions where each vertex should feel the influence of more than one bone. This is addressed by smooth-skinning techniques (as opposed to the rigid skinning used before) such as skeleton-subspace deformation (SSD), which has been around in computer graphics for quite a while [Magnenat-Thalmann et al. 88]. This is used, for example, in the MD5 model format that comes from id Software's Doom 3 first-person shooter. Vertex positions are not given explicitly but must be calculated from the contributions of multiple weights that are assigned to joints. Here, the weights have positions relative to the bones, not to the vertices, so these weight positions

get transformed according to their assigned bone. The position of a vertex is a weighted sum of these transformed weight positions (a sketch of this reconstruction follows the structure definitions below). The Internet provides a lot of detailed documentation on this format.

Geometry produced from this specification works well as a kinematic basis for the secondary deformations presented here.

The vertices in the MD5 format are given implicitly as a sum of weights:

struct ModelVertex { int start; int count; };

Here, start defines the first weight, and count the number of weights, beginning at start, that belong to this vertex:

struct ModelWeight { int joint; float bias; Vector3 pos; };

The weight contains the information needed to construct the final vertex positions: pos defines the position of the weight, and bias states how much the weight contributes to the vertex. Using the weight, we can access the bone model information, since joint assigns each weight a joint:

struct ModelJoint {
    char name[64];
    int parent;
    Vector3 pos;
    Quat4 orient;
};

This is basically the same definition of a joint that was used before.
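Putting the three structures together, the kinematic position of a vertex is reconstructed as the bias-weighted sum of its weight positions, each transformed by its joint. The following is a minimal sketch of that reconstruction; the function name skinnedPosition and its parameters are assumptions, while rotateVector and the struct members follow the definitions above.

/* Sketch: reconstructing the kinematic position of one vertex from its
   MD5 weights. skinnedPosition is an assumed name; the biases of a vertex
   are expected to sum to one. */
Vector3 skinnedPosition(ModelMesh *m, ModelJoint *skeleton, int vertex)
{
    ModelVertex *mv = &m->vertices[vertex];
    Vector3 pos = vecCreate(0.0f, 0.0f, 0.0f);

    for (int i = mv->start; i < mv->start + mv->count; i++) {
        ModelWeight *w = &m->weights[i];
        ModelJoint  *j = &skeleton[w->joint];

        /* transform the weight position from joint space into global space */
        Vector3 p = rotateVector(w->pos, j->orient);
        vecAdd(&p, &j->pos);

        /* accumulate, weighted by the bias */
        vecScale(&p, w->bias);
        vecAdd(&pos, &p);
    }
    return pos;
}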

Listings 14.2 and 14.3 show the application of the surface-detail preservation and the bone-distance preservation forces to an actual MD5 model. Since an MD5 model can consist of several independent meshes, we have to specify which one we want to deform. The supplementary material includes an application that demonstrates the interactive deformation of the animated Stanford armadillo model (see Figure 14.12).
Vector3 DeformableMD5::detailForce(int mesh, int vertex)
{
    int i;
    Vector3 q, p;
    Vertex *vertices = meshes[mesh].finalVertices;

    int neighbors[MAX_NEIGHBORS];
    int numNeighbors = getNeighbors(mesh, vertex, neighbors);

    /* if there are less than 3 particles in the neighborhood,
       the particle is isolated */
    if (numNeighbors < 3) return vecCreate(0.0f, 0.0f, 0.0f);

    /* calculate centers of mass */
    Vector3 cm  = vecZero();
    Vector3 cm0 = vecZero();
    float masses = 0;
    for (i = 0; i < numNeighbors; i++) {
        Vertex *v = &vertices[neighbors[i]];
        vecAdd(&cm,  &v->pos);
        vecAdd(&cm0, &v->restPos);
        masses += v->mass;
    }
    vecScale(&cm,  1.0f / masses);
    vecScale(&cm0, 1.0f / masses);

    /* calculate optimal rotation R */
    Matrix3x3 Apq = matZero();
    for (i = 0; i < numNeighbors; i++) {
        Vertex *v = &vertices[neighbors[i]];
        q = v->restPos - cm0;
        p = v->pos - cm;
        for (int j = 0; j < 3; j++)
            for (int k = 0; k < 3; k++) {
                Apq[j][k] += v->mass * p[k] * q[j];
            }
    }
    Matrix3x3 R = getRotationalPart(Apq);

    /* calculate the position that best preserves the laplacian coordinates */
    Vector3 diff = vertices[vertex].restPos - cm0;
    matMult(&R, &diff);
    Vector3 force = diff + cm - vertices[vertex].pos;
    vecScale(&force, 1 / (timeStep * timeStep));
    return force;
}

Listing 14.2. The shape-matching algorithm for surface detail preservation on a model definition using SSD.

Vector3 DeformableMD5::volumetricForce(int mesh, int vertex)
{
    int i;
    ModelMesh *m = &meshes[mesh];
    ModelVertex *mv = &m->vertices[vertex];
    Vertex *v = &m->finalVertices[vertex];

    real totalWeight = 0.0f;
    Vector3 totalForce = vecCreate(0.0f, 0.0f, 0.0f);

    /* calculate the contribution of one joint */
    for (i = mv->start; i < mv->start + mv->count; i++) {
        ModelWeight *w = &m->weights[i];

        /* from weight, retrieve joint and its parent */
        ModelJoint *j  = &skeleton[w->joint];
        ModelJoint *jp = &skeleton[j->parent];

        /* calculate the unit vector in the direction of the bone */
        Vector3 axis = j->pos - jp->pos;
        vecNormalize(axis);

        /* calculate the force contribution as before */
        Vector3 diffPos  = v->pos - j->pos;
        Vector3 diffPos0 = v->restPos - j->pos;
        real projection  = vecDot(axis, &diffPos);
        real projection0 = vecDot(axis, &diffPos0);
        Vector3 projVector  = vecScaledVector(axis, projection);
        Vector3 projVector0 = vecScaledVector(axis, projection0);
        Vector3 distanceVector  = diffPos - projVector;
        Vector3 distanceVector0 = diffPos0 - projVector0;
        real distanceVectorAbs  = vecLength(distanceVector);
        real distanceVectorAbs0 = vecLength(distanceVector0);

        if (distanceVectorAbs == 0) return vecCreate(0, 0, 0);

        real preFactor = (distanceVectorAbs0 / distanceVectorAbs - 1.0f)
                         / (timeStep * timeStep);
        Vector3 result = distanceVector;
        vecScale(&result, preFactor * w->bias);

        totalWeight += w->bias;
        vecAdd(&totalForce, &result);
    }

    /* sum over all contributions */
    vecScale(&totalForce, 1.0f / totalWeight);
    return totalForce;
}

Listing 14.3. The bone-distance preservation algorithm for a model definition using SSD.

Figure 14.12. The Stanford armadillo model, experiencing secondary deformation: the vertices at the body region deform strongly, giving the impression of fatty tissue.

Figure 14.13. The surface vertices can be subject to external forces at runtime, resulting in interactive dynamic deformations.

14.5 Final Remarks
In this chapter, we have managed to bring skeleton-driven animation beyond the purely kinematic approach that is currently used in computer games by developing a dynamic simulation that enriches the visual experience of the animation. Although the simulation is based on forces, it is not exactly physics based, since the forces are not modeled on physical laws. Of course, no technique is suited for all applications; the techniques used here are not suitable when an accurate modeling of the physical situation is needed. This is the weak point of this kind of simulation. But it turns out that the impact on believability in games is immense.

For an accurate simulation (based on the physical definition of the strain tensor), Chapter 10 provides much better results.

Although the calculations could be applied to the mesh during a preprocessing stage to reduce computational effort, the technique is very well suited for real-time processing, for the benefit of the interactivity of the animation. The skeleton-driven vertices can be subject to external forces of any kind (see Figure 14.13). Special collision-detection algorithms might be needed here [Teschner et al. 05], which is unfortunately a lot more computationally intensive. This is beyond the scope of this chapter.

Appendix: Calculating the Optimal Rotation

For the shape-matching algorithm, a rotation is needed that best matches a given set of points to another set of points (with an equal number of points) by minimizing their squared distances.

Since we already matched the centers of mass (so there is no translation necessary for the optimization anymore), we define the relative locations by

$$q_i = x_i^0 - x_{cm}^0, \qquad p_i = x_i - x_{cm}.$$

We start off by searching for a linear transformation $A$ such that $c_i = A q_i + x_{cm}$ matches $x_i$ best, and then we try to extract the rotation that $A$ contains. The quantity we have to minimize can now be written as

$$\sum_i m_i \left(A q_i - p_i\right)^2.$$

We should now focus on the contribution of one neighbor $i$ and omit the mass for now. We can simplify our notation for the next few calculations to

$$(Aq - p)^2 = (q^T A^T - p^T)(Aq - p).$$

Now we write out the multiplications component-wise (take care: $u$, $v$, $w$ are matrix and vector entry indices now, not particle indices):

$$\sum_u \Big(\sum_v A_{uv} q_v - p_u\Big)\Big(\sum_w A_{uw} q_w - p_u\Big) = \sum_u \Big(\sum_v A_{uv} q_v\Big)^2 - 2 \sum_u p_u \sum_v A_{uv} q_v + \sum_u p_u^2.$$

Taking the derivative $\partial/\partial A_{lm}$ with respect to the $lm$ component of the matrix $A$ yields

$$\frac{\partial\ldots}{\partial A_{lm}} = 2 \Big(\sum_v A_{lv} q_v\Big) q_m - 2 p_l q_m.$$

Writing this again in matrix-vector notation, we get

$$2\left( (Aq)\,q^T - p\,q^T \right),$$

and setting the derivative to zero brings us to

$$A\,qq^T - pq^T = 0 \quad\Rightarrow\quad A = pq^T \cdot (qq^T)^{-1}.$$

Doing this calculation with the whole sum and the mass weights would bring us to

$$A = \Big(\sum_i m_i\, p_i q_i^T\Big)\Big(\sum_i m_i\, q_i q_i^T\Big)^{-1}.$$

This is great because this is a quantity we can actually calculate. The second part we can throw away because it is symmetric and, thus, cannot contain a rotation. The rest of the expression we call $A_{pq}$. Just do the math:

Apq = zeroMatrix();
for each vertex v in v_i.neighbors {
    q = v.restPos - cm0;
    p = v.pos - cm;
    for all entries j, k {
        Apq[j][k] += v.mass * p[k] * q[j];
    }
}

By so-called polar decomposition, we are now able to decompose the matrix $A_{pq}$ into a rotation $R$ and a scaling $S$: $A_{pq} = RS$. How the scaling can be obtained can be understood intuitively: if we apply $A_{pq}$ to a unit vector, the rotational part $R$ will rotate the vector on the unit sphere, but the scaling $S$ will displace it from the shell of the unit sphere. Now we apply $A_{pq}^T$: the rotational part $R^T$ will rotate the vector back to the original position, while the scaling will displace the vector even more from the shell. So the combined operation acts as if we had applied $S$ twice:

$$A_{pq}^T A_{pq} = (RS)^T(RS) = S^T R^T R S = S^T S = S^2.$$

So, unfortunately, we have to take the square root of this matrix equation to obtain $S$. As this is a common problem in mathematics and physics, it has been addressed a lot, and there are good numerical methods to calculate this quantity. The usual approach is to diagonalize the matrix $S^2$:

$$S^2 = V \operatorname{diag}(\lambda)\, V^T,$$

where $\lambda$ are the eigenvalues of the matrix $S^2$.

A very good overview, as well as some state-of-the-art algorithms for the diagonalization of $3 \times 3$ matrices, has been given by [Kopp 06]. Once the matrix is diagonalized, we can take the square root of the diagonal entries,

$$S = V \operatorname{diag}(\sqrt{\lambda})\, V^T,$$

to obtain the matrix $S$.

There is also what is called the Denman–Beavers square root iteration; this works without diagonalization. It is easy to implement and very robust, although not as efficient (see [Denman and Beavers 76]).
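As an illustration, here is a minimal sketch of the Denman–Beavers iteration for a symmetric, positive-definite 3x3 matrix; the matrix helpers matIdentity, matInverse, matAdd, and matScale are assumed names that do not appear in the chapter's code.

/* Sketch: Denman-Beavers iteration for the square root of a symmetric,
   positive-definite matrix M (here M = S^2). Y converges to sqrt(M) and
   Z converges to the inverse of sqrt(M). matIdentity, matInverse, matAdd,
   and matScale are assumed helper names. */
void denmanBeaversSqrt(Matrix3x3 M, int iterations,
                       Matrix3x3 *sqrtM, Matrix3x3 *invSqrtM)
{
    Matrix3x3 Y = M;
    Matrix3x3 Z = matIdentity();

    for (int k = 0; k < iterations; k++) {
        Matrix3x3 Yinv = matInverse(Y);        /* invert before updating   */
        Matrix3x3 Zinv = matInverse(Z);
        Y = matScale(matAdd(Y, Zinv), 0.5f);   /* Y <- (Y + Z^-1) / 2      */
        Z = matScale(matAdd(Z, Yinv), 0.5f);   /* Z <- (Z + Y^-1) / 2      */
    }

    *sqrtM    = Y;
    *invSqrtM = Z;
}

A convenient side effect is that Z converges to the inverse square root, so the scaling never has to be inverted explicitly when the rotation is extracted at the end of this appendix.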

We will use the Jacobi algorithm here, which is the oldest but is also a very robust algorithm. It starts off with the identity matrix for $V$ and applies so-called Jacobi sweeps on it (see [Kopp 06]).

Since the rotation matrices we are dealing with are “almost diagonal” already, it will take only one to two Jacobi sweeps on the average for each vertex. Since this operation is done very often, we should think about caching the matrix $V$ from the previous time step instead of starting off with the identity matrix at every time step. This induces further memory usage but reduces computation time.

Using $S$, we can now calculate the rotational part:

$$R = A_{pq} S^{-1}.$$
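To tie the appendix back to Listing 14.2, the following is one possible sketch of the getRotationalPart function, here built on the Denman–Beavers sketch above rather than the Jacobi diagonalization (only the way the square root is obtained differs); matTranspose and matMultiply are assumed helper names.

/* Sketch: extracting the rotation R = Apq * S^-1 with S^2 = Apq^T * Apq.
   Builds on the denmanBeaversSqrt sketch above; matTranspose and
   matMultiply are assumed helper names. */
Matrix3x3 getRotationalPart(Matrix3x3 Apq)
{
    Matrix3x3 S2 = matMultiply(matTranspose(Apq), Apq);   /* S^2 = Apq^T Apq */

    Matrix3x3 S, Sinv;
    denmanBeaversSqrt(S2, 16, &S, &Sinv);                  /* S and S^-1      */

    return matMultiply(Apq, Sinv);                         /* R = Apq S^-1    */
}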

Acknowledgments

I want to thank Ury Zhilinsky for his input and support during my work on this chapter. The MD5 model used was built upon polygonal data from the Stanford

Bibliography

[Müller et al. 08] M. Müller, B. Heidelberger, M. Hennix, and J. Ratcliff. “Hierarchical Position Based Dynamics.” Presentation given at Virtual Reality Interactions and Physical Simulations (VRIPHYS), Grenoble, November 13–14, 2008.

[Rivers and James 07] A. R. Rivers and D. L. James. “FastLSM: Fast Lattice Shape Matching for Robust Real-Time Deformation.” ACM Transactions on Graphics (SIGGRAPH ’07) 26:3 (2007), Article No. 82.

[Shi et al. 08] X. Shi, K. Zhou, Y. Tong, M. Desbrun, H. Bao, and B. Guo. “Example-Based Dynamic Skinning in Real Time.” ACM Transactions on Graphics (SIGGRAPH ’08) 27:3 (2008), Article No. 29.

[Steinemann et al. 08] D. Steinemann, M. A. Otaduy, and M. Gross. “Fast Adaptive Shape Matching Deformations.” In Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 87–94. Aire-la-Ville, Switzerland: Eurographics Association, 2008.

[Teschner et al. 05] M. Teschner, S. Kimmerle, B. Heidelberger, G. Zachmann, L. Raghupathi, A. Fuhrmann, M.-P. Cani, F. Faure, N. Magnenat-Thalmann, W. Strasser, and P. Volino. “Collision Detection for Deformable Objects.” Computer Graphics Forum 24:1 (2005), 61–81.
