How Ubisoft Montreal Develops Games For Multicore - Before and After C++11 - Jeff Preshing - CppCon 2014
Jeff Preshing
Technical Architect
Ubisoft Montreal
Watch Dogs
Assassin’s Creed Unity
MULTICORE
Pipelining Work
Dedicated Threads
Task Schedulers
CONCURRENT OBJECTS
Multiple threads operate concurrently on the object with at least one
thread modifying its state.
Pattern
Concurrent Object
Platform primitives
Atomic operations
Pipelining Work
PIPELINED GRAPHICS
[Diagram: successive Graphics frames, handed off with semaphores]
PIPELINED GRAPHICS
how to avoid concurrent modifications?
Engine
game objects
Graphics
DOUBLE-BUFFERED GRAPHICS STATE
One approach
struct Object
{
    ...
    Matrix xform[2];
    ...
};
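The double-buffering idea above can be sketched as follows. This is only an illustration under stated assumptions, not the engine's actual code: `Semaphore`, `runPipeline`, the one-field `Matrix`, and the `frameReady` / `bufferFree` names are all hypothetical. The engine thread writes `xform[f & 1]` for frame `f` while the graphics thread reads the copy written for the frame it is drawing, so the two threads never touch the same copy at the same time.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built from C++11 primitives (hypothetical helper).
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    void signal() {
        { std::lock_guard<std::mutex> lock(mutex_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};

struct Matrix { float x; };   // stand-in for a real transform

struct Object {
    Matrix xform[2];          // one copy per in-flight frame
};

// Runs `frames` frames; returns the value the graphics thread read each frame.
std::vector<float> runPipeline(int frames) {
    assert(frames >= 0);
    Object obj{};
    Semaphore frameReady(0);  // engine -> graphics: frame f is fully written
    Semaphore bufferFree(2);  // graphics -> engine: both copies start out free
    std::vector<float> seen;

    std::thread engine([&] {
        for (int f = 0; f < frames; ++f) {
            bufferFree.wait();                  // never overwrite a copy still being read
            obj.xform[f & 1].x = float(f) * 10.0f;
            frameReady.signal();
        }
    });
    std::thread graphics([&] {
        for (int f = 0; f < frames; ++f) {
            frameReady.wait();                  // wait until frame f has been written
            seen.push_back(obj.xform[f & 1].x);
            bufferFree.signal();
        }
    });
    engine.join();
    graphics.join();
    return seen;
}
```

Calling `runPipeline(5)` lets the engine run at most one frame ahead of graphics; the two semaphores are what keep a copy from being rewritten before it has been drawn.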
SEPARATE GRAPHIC OBJECTS
Another approach
[Diagram: the Loading thread sleeps until a request arrives, then loads a chunk of world content]
DEDICATED LOADING THREAD
Queue
Loading
WAKING UP THE LOADING THREAD
ThreadSafeQueue<Request> requests;
Event workAvailable;

Loading Thread
for (;;)
{
    workAvailable.waitAndReset();
    while (r = requests.tryPop())
    {
        processLoadRequest(r);
    }
}

Engine Thread
requests.push(r);
workAvailable.signal();
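A compilable sketch of this wake-up pattern, using only the C++11 standard library. The names `Event`, `ThreadSafeQueue`, `requests`, and `workAvailable` come from the slide, but their implementations here are assumptions, as are the `chunkId` field, the `quit` flag used to stop the thread, and the `runLoadingDemo` wrapper. `tryPop` takes an output parameter here, which differs from the slide's `r = requests.tryPop()`.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Auto-reset event: waitAndReset() blocks until signal() has been called.
class Event {
public:
    void signal() {
        { std::lock_guard<std::mutex> lock(mutex_); signaled_ = true; }
        cv_.notify_one();
    }
    void waitAndReset() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

template <typename T>
class ThreadSafeQueue {
public:
    void push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(item));
    }
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<T> items_;
};

struct Request { int chunkId; };   // hypothetical request payload

// Pushes one request per chunk id; returns the ids in the order they were "loaded".
std::vector<int> runLoadingDemo(const std::vector<int>& chunkIds) {
    ThreadSafeQueue<Request> requests;
    Event workAvailable;
    std::atomic<bool> quit(false);
    std::vector<int> loaded;

    std::thread loadingThread([&] {
        for (;;) {
            workAvailable.waitAndReset();
            bool stopping = quit.load();       // read before draining so nothing is missed
            Request r{};
            while (requests.tryPop(r))
                loaded.push_back(r.chunkId);   // stand-in for processLoadRequest(r)
            if (stopping) break;
        }
    });

    // Engine thread side: push the request, then wake the loading thread.
    for (int id : chunkIds) {
        requests.push(Request{id});
        workAvailable.signal();
    }
    quit.store(true);
    workAvailable.signal();
    loadingThread.join();
    assert(loaded.size() == chunkIds.size());
    return loaded;
}
```

Several signals may collapse into one wake-up, which is why the loading thread drains the queue in a loop instead of popping a single request per wake-up.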
Custom
Cancel requests
Interrupt requests
Re-prioritize requests
Task Schedulers
FINE-GRAINED PARALLELISM
Motivation for a task scheduler
[Diagram: four Worker threads]
SIMPLE TASK QUEUE
Queue
WAKING UP THE WORKER THREADS
ThreadSafeQueue<Task> tasks;
Event workAvailable[numThreads];

Worker Thread
for (;;)
{
    workAvailable[thread].waitAndReset();
    while (t = tasks.tryPop())
    {
        t->Run();
    }
}

Submitting Thread
tasks.push(t);
for (int i = 0; i < numThreads; i++)
    workAvailable[i].signal();
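The simple task queue can be sketched in standard C++11 as below. This is an illustration under assumptions rather than the scheduler described in the talk: `Task` is modelled as a `std::function<void()>` instead of an object with `Run()`, and `TaskQueue`, the `quit` flag, and `runTaskDemo` are hypothetical. Each worker owns one event, and the submitting thread signals all of them after every push.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class Event {                        // auto-reset event, as in the loading thread pattern
public:
    void signal() {
        { std::lock_guard<std::mutex> lock(mutex_); signaled_ = true; }
        cv_.notify_one();
    }
    void waitAndReset() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

using Task = std::function<void()>;

class TaskQueue {                    // mutex-protected stand-in for ThreadSafeQueue<Task>
public:
    void push(Task t) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(t));
    }
    bool tryPop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};

// Submits numTasks counting tasks to numThreads workers; returns how many ran.
int runTaskDemo(int numThreads, int numTasks) {
    assert(numThreads > 0);
    TaskQueue tasks;
    std::vector<Event> workAvailable(numThreads);
    std::atomic<bool> quit(false);
    std::atomic<int> completed(0);

    std::vector<std::thread> workers;
    for (int thread = 0; thread < numThreads; ++thread) {
        workers.emplace_back([&, thread] {
            for (;;) {
                workAvailable[thread].waitAndReset();
                bool stopping = quit.load();   // read before draining so no task is missed
                Task t;
                while (tasks.tryPop(t))
                    t();
                if (stopping) break;
            }
        });
    }

    // Submitting thread: push a task, then wake every worker.
    for (int n = 0; n < numTasks; ++n) {
        tasks.push([&completed] { completed.fetch_add(1); });
        for (int i = 0; i < numThreads; ++i)
            workAvailable[i].signal();
    }
    quit.store(true);
    for (int i = 0; i < numThreads; ++i)
        workAvailable[i].signal();
    for (auto& w : workers) w.join();
    return completed.load();
}
```

Waking every worker on every push is simple but wasteful when tasks are tiny, which is one reason a production scheduler would likely look different from this sketch.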
Input
Logic
Physics
Animation

class TaskGroup
{
private:
    Array<Item*> m_Items;
    ...
};
tail 0
tails 1, 2, 3
head
Custom
Logic
Physics
Animation
Pattern
Concurrent Object
Atomic operations
Game industry
EXAMPLE
Capped wait-free queue
m_readPos
m_readPos
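The slide only names the queue and its `m_readPos` member, so the following is a guess at the general idea and not the talk's implementation. It assumes a fixed-size ("capped") list that is filled before any consumer starts, and lets consumers claim items with an atomic `fetch_add` on `m_readPos`. The `CappedQueue` class, `m_items`, and `drainWithThreads` are hypothetical. On hardware with a native atomic add, each `tryPop` call finishes in a bounded number of steps regardless of what other threads do.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity work list: filled once, then drained concurrently.
class CappedQueue {
public:
    explicit CappedQueue(std::vector<int> items)
        : m_items(std::move(items)), m_readPos(0) {}

    // Claims the next unread item. fetch_add gives every caller a unique index,
    // so no item is handed out twice and no caller ever waits for another.
    bool tryPop(int& out) {
        int index = m_readPos.fetch_add(1);
        if (index >= static_cast<int>(m_items.size()))
            return false;               // past the cap: nothing left to claim
        out = m_items[index];
        return true;
    }
private:
    std::vector<int> m_items;           // written before any consumer thread starts
    std::atomic<int> m_readPos;
};

// Drains the queue from numThreads threads and returns the sum of all claimed items.
long long drainWithThreads(const std::vector<int>& items, int numThreads) {
    assert(numThreads > 0);
    CappedQueue queue(items);
    std::atomic<long long> total(0);
    std::vector<std::thread> workers;
    for (int i = 0; i < numThreads; ++i) {
        workers.emplace_back([&] {
            int value = 0;
            while (queue.tryPop(value))  // each worker stops at its first failed pop
                total.fetch_add(value);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Because every item is claimed exactly once, the returned sum equals the sum of the input no matter how the threads interleave.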
Pattern
Concurrent Object
Atomic
operations
Thread 1 Thread 2
int X;
OK!
C++11 FORBIDS DATA RACES
If multiple threads access the same variable concurrently, and at least one
thread modifies it, all threads must use C++11 atomic operations.
Thread 1 Thread 2
int X;
Data Race! Undefined Behavior
Thread 1 Thread 2
atomic<int> X;
OK!
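A minimal sketch of the rule: two threads modify the same variable concurrently, which is well defined because `X` is a `std::atomic<int>`. The function name `countWithAtomics` is hypothetical. With a plain `int X` the same code would contain a data race and its behavior would be undefined.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads each increment X perThread times; returns the final value of X.
int countWithAtomics(int perThread) {
    assert(perThread >= 0);
    std::atomic<int> X(0);
    auto work = [&X, perThread] {
        for (int i = 0; i < perThread; ++i)
            X++;                      // atomic read-modify-write, no data race
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return X;                         // atomic load
}
```

Every increment is an indivisible operation, so no update is lost and the result is always `2 * perThread`.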
Thread 1 Thread 2
X = 0x80004; c = X;
Thread 1 Thread 2
Sequentially consistent atomics: easier, slower
Low-level atomics: difficult, faster
SEQUENTIALLY CONSISTENT ATOMICS
Example #1
atomic<int> A(0);
atomic<int> B(0);

Thread 1 (store, then load)    Thread 2 (store, then load)
A = 1;                         B = 1;
c = B;                         d = A;

Possible interleavings:
A = 1;    A = 1;    B = 1;
c = B;    B = 1;    d = A;
B = 1;    c = B;    A = 1;
d = A;    d = A;    c = B;

c  d
0  0   Impossible!
0  1
1  0
1  1
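Example #1 can be run as a small litmus test. This is a sketch; the function name `countBothZero` and the repetition loop are not from the slides. With sequentially consistent atomics, every execution is some interleaving of the four operations, so at least one store comes before the other thread's load and the outcome c = 0, d = 0 never occurs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs Example #1 `iterations` times; returns how often c == 0 && d == 0 was seen.
int countBothZero(int iterations) {
    assert(iterations >= 0);
    int bothZero = 0;
    for (int n = 0; n < iterations; ++n) {
        std::atomic<int> A(0);
        std::atomic<int> B(0);
        int c = -1;
        int d = -1;
        std::thread t1([&] { A = 1; c = B; });   // Thread 1: store, then load
        std::thread t2([&] { B = 1; d = A; });   // Thread 2: store, then load
        t1.join();
        t2.join();
        if (c == 0 && d == 0)
            ++bothZero;
    }
    return bothZero;
}
```

The C++11 memory model guarantees that this count is zero, whatever the hardware.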
LOW-LEVEL ATOMICS
Example #1
atomic<int> A(0);
atomic<int> B(0);

Thread 1                             Thread 2
A.store(1, memory_order_relaxed);    B.store(1, memory_order_relaxed);
c = B.load(memory_order_relaxed);    d = A.load(memory_order_relaxed);

c  d
0  0   Possible!
0  1
1  0
1  1

You can prevent it with “full memory fences”:
atomic_thread_fence(memory_order_seq_cst);
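A sketch of the fenced version, assuming the full fence is placed between the store and the load in each thread. The slide shows only the fence call, so the placement and the name `countBothZeroWithFences` are assumptions. With both fences in place, whichever fence comes first in the single total order of sequentially consistent operations makes that thread's store visible to the other thread's load, so c = 0, d = 0 is ruled out again.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the relaxed version of Example #1 with a full fence between each store and load.
// Returns how often c == 0 && d == 0 was seen.
int countBothZeroWithFences(int iterations) {
    assert(iterations >= 0);
    int bothZero = 0;
    for (int n = 0; n < iterations; ++n) {
        std::atomic<int> A(0);
        std::atomic<int> B(0);
        int c = -1;
        int d = -1;
        std::thread t1([&] {
            A.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full memory fence
            c = B.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            B.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full memory fence
            d = A.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (c == 0 && d == 0)
            ++bothZero;
    }
    return bothZero;
}
```

Removing either fence brings the c = 0, d = 0 outcome back as a legal result, although it may take many runs to observe it on a given machine.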
SEQUENTIALLY CONSISTENT ATOMICS
How to write them
atomic<int> A;
Operator overloading
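As a short illustration of the operator overloading mentioned above: the overloaded operators on `std::atomic<int>` are shorthand for the named member functions with `memory_order_seq_cst`. The function name `operatorForms` is hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Shows the operator forms next to the member functions they stand for.
int operatorForms() {
    std::atomic<int> A(0);
    A = 1;          // same as A.store(1, std::memory_order_seq_cst)
    int x = A;      // same as A.load(std::memory_order_seq_cst); x is 1
    A += 2;         // same as A.fetch_add(2, std::memory_order_seq_cst); A is 3
    A++;            // same as A.fetch_add(1, std::memory_order_seq_cst); A is 4
    assert(x == 1);
    return x + A.load(std::memory_order_seq_cst);   // 1 + 4
}
```

Writing plain assignments and reads on an `atomic<int>` therefore always gives sequentially consistent behavior; the weaker orderings have to be requested explicitly.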
SEQUENTIALLY CONSISTENT ATOMICS
Example #2
atomic<int> A(0);
atomic<int> B(0);

Thread 1 (two stores)    Thread 2 (two loads)
A = 1;                   c = B;
B = 1;                   d = A;

Possible interleavings:
A = 1;    A = 1;    c = B;
B = 1;    c = B;    d = A;
c = B;    B = 1;    A = 1;
d = A;    d = A;    B = 1;

c  d
0  0
0  1
1  0   Impossible!
1  1
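Example #2 as a litmus test, again only a sketch with a hypothetical function name, `countStaleRead`. Thread 1 stores to A and then B; Thread 2 loads B and then A. Under sequential consistency, seeing the new value of B implies that the store to A already happened, so c = 1, d = 0 never occurs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs Example #2 `iterations` times; returns how often c == 1 && d == 0 was seen.
int countStaleRead(int iterations) {
    assert(iterations >= 0);
    int stale = 0;
    for (int n = 0; n < iterations; ++n) {
        std::atomic<int> A(0);
        std::atomic<int> B(0);
        int c = -1;
        int d = -1;
        std::thread t1([&] { A = 1; B = 1; });   // Thread 1: two stores
        std::thread t2([&] { c = B; d = A; });   // Thread 2: two loads
        t1.join();
        t2.join();
        if (c == 1 && d == 0)
            ++stale;
    }
    return stale;
}
```

This is the shape of passing data through a flag: A plays the role of the payload and B the role of the "ready" flag.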
LOW-LEVEL ATOMICS
Example #2
atomic<int> A(0);
atomic<int> B(0);

Thread 1                             Thread 2
A.store(1, memory_order_relaxed);    c = B.load(memory_order_relaxed);
B.store(1, memory_order_relaxed);    d = A.load(memory_order_relaxed);

c  d
0  0
0  1
1  0   Possible!
1  1

This is the bug from Section One!
You can fix it with “lightweight fences”:
atomic_thread_fence(memory_order_release);
atomic_thread_fence(memory_order_acquire);
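A sketch of the lightweight-fence fix, assuming the release fence sits between the two stores in Thread 1 and the acquire fence between the two loads in Thread 2. The slide lists only the two fence calls, so the placement and the name `countStaleReadWithFences` are assumptions. If Thread 2 reads B == 1, the two fences synchronize, and the store to A is then guaranteed to be visible to the following load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the relaxed version of Example #2 with release/acquire fences.
// Returns how often c == 1 && d == 0 was seen.
int countStaleReadWithFences(int iterations) {
    assert(iterations >= 0);
    int stale = 0;
    for (int n = 0; n < iterations; ++n) {
        std::atomic<int> A(0);
        std::atomic<int> B(0);
        int c = -1;
        int d = -1;
        std::thread t1([&] {
            A.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_release);  // A's store stays before B's
            B.store(1, std::memory_order_relaxed);
        });
        std::thread t2([&] {
            c = B.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);  // A's load stays after B's
            d = A.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (c == 1 && d == 0)
            ++stale;
    }
    return stale;
}
```

These fences do not rule out c = 0, d = 0 in Example #1; only the full `memory_order_seq_cst` fence does that.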
VISUALIZING LOW-LEVEL ATOMICS
Imagine each thread having its own private copy of memory.
     Thread 1     Thread 2
A    1 (was 0)    0
B    0            1 (was 0)
Now c = 0, d = 0 is trivial.
VISUALIZING LOW-LEVEL ATOMICS
This analogy corresponds to each CPU core having its own cache.
     Thread 1     Thread 2
A    1            0
B    0            1
VISUALIZING LOW-LEVEL ATOMICS
Eventually, changes propagate between threads, but the timing is
unpredictable.
     Thread 1     Thread 2
A    1            1
B    1            1
SEQUENTIALLY CONSISTENT ATOMICS
The magic compilers use to implement them
          Load        Store
x86/64    mov %, %    lock xchg %, %
http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
HOW TO CONVERT GAME ATOMICS
to low-level C++11 atomics
Charles Bloom
Hans Boehm
Bruce Dawson
Hugo Allaire
Peter Dimov
Dominic Couture
Maurice Herlihy
Jean-François Dubé
Dominique Duvivier
Paul McKenney
Michael Lavaire
Peter Sewell
Jean-Sébastien Pelletier
Herb Sutter
Anthony Williams
Rémi Quenin
Dmitry Vyukov
James Therien
Jeff Preshing
@preshing
Preshing on Programming
http://preshing.com/