How Ubisoft Montreal Develops Games For Multicore - Before and After C++11 - Jeff Preshing - CppCon 2014

HOW UBISOFT MONTREAL

DEVELOPS GAMES FOR MULTICORE


Before & After C++11

Jeff Preshing
Technical Architect
Ubisoft Montreal
Watch Dogs / Assassin's Creed Unity / Rainbow Six: Siege / Far Cry 4


MULTICORE
several CPU cores on a single processor

§ Same instruction set
§ Same address space
§ Existing threads can run on any core

A popular way to offer more processing power


Game Industry + C++ Community

We all want to exploit multicore!


PART ONE: Multicore Programming at Ubisoft
PART TWO: The C++11 Atomic Library
Part One
Multicore Programming at Ubisoft
GENERAL-PURPOSE HARDWARE THREADS
available for game use

PlayStation 2:  1 MIPS
Xbox:           1 x86
Xbox 360:       6 PowerPC
PlayStation 3:  2 PowerPC + 6 SPU
PlayStation 4:  6 x64
Xbox One:       6 x64
SINGLE-THREADED MAIN LOOP
in the early 2000s

Engine → Graphics → Engine → Graphics → ...


THREE THREADING PATTERNS
to exploit multicore

Pipelining Work

Dedicated Threads

Task Schedulers
CONCURRENT OBJECTS
Multiple threads operate concurrently on the object with at least one
thread modifying its state.

Pattern

Concurrent Object

Platform primitives and atomic operations
Pipelining Work
PIPELINED GRAPHICS

Engine Engine Engine

Graphics Graphics Graphics


PIPELINED GRAPHICS

Engine Engine Engine

Semaphore Semaphore
Graphics Graphics Graphics
PIPELINED GRAPHICS
how to avoid concurrent modifications?

Engine

game objects

Graphics
DOUBLE-BUFFERED GRAPHICS STATE
One approach

struct Object
{
    ...
    Matrix xform[2];
    ...
};
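The flip logic isn't shown on the slide; as a hedged sketch of the double-buffering idea (the one-float `Matrix` and the `simulateFrames` driver are illustrative stand-ins, not Ubisoft's code), the engine writes one slot of `xform[2]` while graphics reads the other, and the slots swap each frame:

```cpp
#include <cstddef>

// Illustrative stand-in for a real 4x4 transform.
struct Matrix { float tx; };

struct Object {
    Matrix xform[2];   // [writeIndex] = engine's copy, [1 - writeIndex] = graphics' copy
};

// Simulate n frames; returns the value graphics reads on the last frame,
// which is the transform the engine wrote on the frame before it.
float simulateFrames(int n) {
    Object obj = { { { -1.0f }, { -1.0f } } };
    int writeIndex = 0;
    float lastRead = -1.0f;
    for (int frame = 0; frame < n; ++frame) {
        obj.xform[writeIndex].tx = float(frame);   // engine updates this frame's transform
        lastRead = obj.xform[1 - writeIndex].tx;   // graphics renders the previous frame's
        writeIndex = 1 - writeIndex;               // flip slots at end of frame
    }
    return lastRead;
}
```

Because the engine and graphics never touch the same slot during a frame, no locking is needed on the transform itself.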
SEPARATE GRAPHIC OBJECTS
Another approach

struct Object
{
    ...
    Matrix xform;
    GraphicObject* gfxObject;
    ...
};

struct GraphicObject
{
    ...
    Matrix xform;
    ...
};

Copy at start of frame


Dedicated Threads
CONTENT STREAMING
We don’t load the entire game environment in memory at once.
DEDICATED LOADING THREAD

request
Loading
sleep sleep
Load chunk of world content
DEDICATED LOADING THREAD

Queue

Loading
WAKING UP THE LOADING THREAD

ThreadSafeQueue<Request> requests;
Event workAvailable;

Engine Thread:
requests.push(r);
workAvailable.signal();

Loading Thread:
for (;;)
{
    workAvailable.waitAndReset();
    while (r = requests.tryPop())
    {
        processLoadRequest(r);
    }
}

Event signaled → threads pass through
Event reset → threads wait
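The slide's `Event` and `ThreadSafeQueue` are Ubisoft's own classes; a rough stand-in built on C++11 primitives might look like this. All names are illustrative, and the shutdown handling (`done` flag) is added purely for the demo:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Auto-reset event, approximating the slide's Event class.
class Event {
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_signaled = false;
public:
    void signal() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_signaled = true;
        m_cond.notify_one();
    }
    void waitAndReset() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_signaled; });
        m_signaled = false;   // auto-reset, as on the slide
    }
};

// Toy driver: the "engine" pushes numRequests requests and signals;
// the "loading thread" waits, drains the queue, and counts what it processed.
int runLoadingDemo(int numRequests) {
    std::queue<int> requests;   // stand-in for ThreadSafeQueue<Request>
    std::mutex queueMutex;
    Event workAvailable;
    int processed = 0;
    bool done = false;

    std::thread loader([&] {
        for (;;) {
            workAvailable.waitAndReset();
            for (;;) {
                std::lock_guard<std::mutex> lock(queueMutex);
                if (requests.empty())
                    break;
                requests.pop();        // "processLoadRequest(r)"
                ++processed;
            }
            std::lock_guard<std::mutex> lock(queueMutex);
            if (done && requests.empty())
                return;
        }
    });

    for (int i = 0; i < numRequests; ++i) {
        { std::lock_guard<std::mutex> lock(queueMutex); requests.push(i); }
        workAvailable.signal();
    }
    { std::lock_guard<std::mutex> lock(queueMutex); done = true; }
    workAvailable.signal();
    loader.join();
    return processed;
}
```

Because the signaled flag persists until consumed, a signal sent while the loader is still draining is not lost; the loader simply wakes once more and finds the queue empty.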
IMPROVING ON THE QUEUE
Many design choices

Cancel requests
Interrupt requests
Re-prioritize requests
Task Schedulers
FINE-GRAINED PARALLELISM
Motivation for a task scheduler

Input Logic Physics Animation


FINE-GRAINED PARALLELISM
Motivation for a task scheduler

Worker

Worker

Worker

Worker
SIMPLE TASK QUEUE
Queue
WAKING UP THE WORKER THREADS
ThreadSafeQueue<Task> tasks;
Event workAvailable[numThreads];

Submitting Thread:
tasks.push(t);
for (int i = 0; i < numThreads; i++)
    workAvailable[i].signal();

Worker Thread:
for (;;)
{
    workAvailable[thread].waitAndReset();
    while (t = tasks.tryPop())
    {
        t->Run();
    }
}

One event for each worker thread


TASK GROUPS
Grouping work units together into larger tasks

Input, Logic, Physics, Animation

class TaskGroup
{
private:
    Array<Item*> m_Items;
    ...
};

Each TaskGroup keeps an array of items to update in parallel.


TASK GROUPS
class TaskGroup
{
private:
    vector<Item*> m_Items;
    volatile int m_Index;

public:
    void Run()
    {
        for (;;)
        {
            int index = AtomicIncrement(m_Index);
            if (index >= m_Items.size())
                break;
            m_Items[index]->Run();
        }
    }
};

Multiple threads work on the same TaskGroup.
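A C++11 sketch of the same claim-the-next-index loop, using `std::atomic` in place of the game library's `AtomicIncrement` (the `int` items and the demo driver are illustrative; the real engine runs `Item*` objects):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each worker atomically claims the next index until the array is exhausted.
class TaskGroup {
    std::vector<int>* m_Items;      // stand-in for vector<Item*>
    std::atomic<int> m_Index{-1};   // fetch_add returns the old value, so first claim is 0
    std::atomic<int> m_RunCount{0};
public:
    explicit TaskGroup(std::vector<int>& items) : m_Items(&items) {}
    void Run() {
        for (;;) {
            int index = m_Index.fetch_add(1, std::memory_order_relaxed) + 1;
            if (index >= (int)m_Items->size())
                break;
            (*m_Items)[index] += 1;   // "run" the item; each index claimed by one thread
            m_RunCount.fetch_add(1, std::memory_order_relaxed);
        }
    }
    int runCount() const { return m_RunCount.load(); }
};

// Run one group across several worker threads; returns total items executed.
int runTaskGroupDemo(int numItems, int numWorkers) {
    std::vector<int> items(numItems, 0);
    TaskGroup group(items);
    std::vector<std::thread> workers;
    for (int i = 0; i < numWorkers; ++i)
        workers.emplace_back([&] { group.Run(); });
    for (auto& w : workers) w.join();
    return group.runCount();   // every item claimed exactly once
}
```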


NOT A SIMPLE QUEUE ANYMORE


Could be a queue with separate tails for each worker.


MANAGING DEPENDENCIES
Input

Logic

Physics

Animation

No physics tasks before all logic tasks.


MANAGING DEPENDENCIES
class TaskGroup
{
private:
    vector<Item*> m_Items;
    volatile int m_Index;
    volatile int m_RemainingCount;
    ...

public:
    void Run()
    {
        int count = 0;
        for (;;)
        {
            int index = AtomicIncrement(m_Index);
            if (index >= m_Items.size())
                break;
            m_Items[index]->Run();
            count++;
        }
        if (count > 0 && AtomicSubtract(m_RemainingCount, count) == 0)
            AddDependencies();
    }
};

The thread that finishes the last item schedules the next TaskGroup.
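The dependency-count trick above can be sketched in C++11 as follows. This is a hedged stand-in (names like `CountedGroup` and the fire counter are for the demo only): `remaining` starts at the item count, each worker subtracts what it ran, and exactly one worker observes the count hit zero:

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct CountedGroup {
    std::vector<int> items;
    std::atomic<int> index{-1};
    std::atomic<int> remaining;
    std::atomic<int> dependencyFires{0};

    explicit CountedGroup(int n) : items(n, 0), remaining(n) {}

    void Run() {
        int count = 0;
        for (;;) {
            int i = index.fetch_add(1, std::memory_order_relaxed) + 1;
            if (i >= (int)items.size())
                break;
            items[i] += 1;   // "run" the item
            ++count;
        }
        // fetch_sub returns the previous value; previous - count is the new value.
        if (count > 0 &&
            remaining.fetch_sub(count, std::memory_order_acq_rel) - count == 0)
            dependencyFires.fetch_add(1);   // would schedule the next TaskGroup here
    }
};

// Returns how many workers saw the count reach zero; should always be 1.
int runDependencyDemo(int numItems, int numWorkers) {
    CountedGroup group(numItems);
    std::vector<std::thread> workers;
    for (int i = 0; i < numWorkers; ++i)
        workers.emplace_back([&] { group.Run(); });
    for (auto& w : workers) w.join();
    return group.dependencyFires.load();
}
```

Workers that ran zero items skip the subtraction, so the per-worker counts always sum to the item total and the zero crossing happens exactly once.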
IMPROVING ON THE TASK SCHEDULER
Many design choices

Centralized / per-thread task list
Priorities
Affinities ("pin" threads to cores)
Batching
Profiler integration
Atomic Operations
GAME ATOMICS
Typical portable library

Declaration:        volatile int A;
Load/Store:         A = 1;
                    int a = A;
Ordering:           LIGHTWEIGHT_FENCE();
                    FULL_FENCE();
Read-Modify-Write:  AtomicIncrement(A);
                    AtomicCompareExchange(A, …, …);
                    ...
FENCE MACROS
What’s the difference?

LIGHTWEIGHT_FENCE();  (used more often)
Orders loads from memory
Orders stores to memory
Does the job of:
    atomic_thread_fence(memory_order_acquire);
    atomic_thread_fence(memory_order_release);
    atomic_thread_fence(memory_order_acq_rel);

FULL_FENCE();
... all that, plus: commits stores before the next load
Does the job of:
    atomic_thread_fence(memory_order_seq_cst);
HOW THEY’RE IMPLEMENTED
on processors we care about

                                 x86/64              PowerPC        ARMv7
A = 1;                           mov %, %            st %, %        str %, %
int a = A;                       mov %, %            ld %, %        ldr %, %
LIGHTWEIGHT_FENCE();             (compiler barrier)  lwsync         dmb
FULL_FENCE();                    mfence              hwsync         dmb
COMPILER_BARRIER();
AtomicIncrement(A);              lock inc            lwarx/stwcx    ldrex/strex
AtomicCompareExchange(A, …, …);  lock cmpxchg        ...            ...
ATOMIC OPERATIONS
How we ended up using them

Pattern

Concurrent Object

Atomic
operations
Game
Industry
EXAMPLE
Capped wait-free queue

template <class T, int size>
class CappedSPSCQueue
{
private:
    T m_items[size];
    volatile int m_writePos;
    int m_readPos;

public:
    CappedSPSCQueue() : m_writePos(0), m_readPos(0) {}
    bool tryPush(const T& item) { ... }
    bool tryPop(T& item) { ... }
};

Single producer, single consumer


EXAMPLE
Capped wait-free queue (BROKEN: loads and stores can reorder)

bool tryPush(const T& item)
{
    int w = m_writePos;
    if (w >= size)
        return false;
    m_items[w] = item;
    m_writePos = w + 1;    // can reorder before the item store
    return true;
}

bool tryPop(T& item)
{
    int w = m_writePos;
    if (m_readPos >= w)
        return false;
    item = m_items[m_readPos];    // can reorder before the m_writePos load
    m_readPos++;
    return true;
}
EXAMPLE
Capped wait-free queue (fixed with lightweight fences)

bool tryPush(const T& item)
{
    int w = m_writePos;
    if (w >= size)
        return false;
    m_items[w] = item;
    LIGHTWEIGHT_FENCE();
    m_writePos = w + 1;
    return true;
}

bool tryPop(T& item)
{
    int w = m_writePos;
    if (m_readPos >= w)
        return false;
    LIGHTWEIGHT_FENCE();
    item = m_items[m_readPos];
    m_readPos++;
    return true;
}
RECAP:
Multicore programming at Ubisoft

§ Three threading patterns
§ Lots of custom concurrent objects
§ Atomic operations for high-contention objects
§ We learned by doing
Part Two
The C++11 Atomic Library
ATOMIC OPERATIONS
in C++11

Pattern

Concurrent Object

Atomic
operations

C++11: portable principles


C++11 FORBIDS DATA RACES
If multiple threads access the same variable concurrently, and at least one
thread modifies it, all threads must use C++11 atomic operations.

Thread 1 Thread 2

int  X;  

OK!
C++11 FORBIDS DATA RACES
If multiple threads access the same variable concurrently, and at least one
thread modifies it, all threads must use C++11 atomic operations.

Thread 1 Thread 2

int  X;  
Data Race! Undefined Behavior
C++11 FORBIDS DATA RACES
If multiple threads access the same variable concurrently, and at least one
thread modifies it, all threads must use C++11 atomic operations.

Thread 1 Thread 2

atomic<int>  X;  

OK!

§ That’s how you know when you must use atomic<>.


C++11 FORBIDS DATA RACES
One reason they’re bad

Thread 1: X = 0x80004;        Thread 2: c = X;

int X;

If the machine can only write 16 bits at a time and X is 32 bits wide,
Thread 2 can observe a "torn write", for example 0x80000.
C++11 FORBIDS DATA RACES
If multiple threads access the same variable concurrently, and at least one
thread modifies it, all threads must use C++11 atomic operations.

Thread 1 Thread 2

volatile  int  X;  

We break this rule all the time.


We know that int is atomic.
IT’S ACTUALLY TWO ATOMIC LIBRARIES
Masquerading under one API

Sequentially Consistent Atomics
§ Similar to Java volatiles
§ Used in literature/books
§ All about interleaving statements
easier, but slower

Low-Level Atomics
§ Similar to C/C++ volatiles
§ Much like game atomics
difficult, but faster
SEQUENTIALLY CONSISTENT ATOMICS
Example #1

atomic<int>  A(0);  
atomic<int>  B(0);  

Thread 1 (store, then load)        Thread 2 (store, then load)
A = 1;                             B = 1;
c = B;                             d = A;

Possible interleavings always run A = 1 before c = B, and B = 1 before d = A.

Outcomes (c, d): (0, 0) is impossible; (0, 1), (1, 0) and (1, 1) are possible.
LOW-LEVEL ATOMICS
Example #1

atomic<int>  A(0);  
atomic<int>  B(0);  

Thread 1                                   Thread 2
A.store(1, memory_order_relaxed);          B.store(1, memory_order_relaxed);
c = B.load(memory_order_relaxed);          d = A.load(memory_order_relaxed);

Doing the same thing, but now every outcome is possible, including c = 0, d = 0.

You can prevent it with "full memory fences":
atomic_thread_fence(memory_order_seq_cst);
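Example #1 is easy to run for real. The sketch below (the trial-loop harness is mine, not from the talk) uses the default sequentially consistent stores and loads, under which the outcome c = 0, d = 0 is forbidden; switching the operations to `memory_order_relaxed` would make that outcome possible on hardware with store buffering, such as x86/64:

```cpp
#include <atomic>
#include <thread>

// Runs Example #1 repeatedly with seq_cst atomics and reports whether the
// forbidden outcome c == 0 && d == 0 was ever observed.
bool seqCstForbidsBothZero(int trials) {
    for (int t = 0; t < trials; ++t) {
        std::atomic<int> A(0), B(0);
        int c = -1, d = -1;
        std::thread t1([&] { A.store(1); c = B.load(); });   // seq_cst by default
        std::thread t2([&] { B.store(1); d = A.load(); });
        t1.join();
        t2.join();
        if (c == 0 && d == 0)
            return false;   // would violate sequential consistency
    }
    return true;
}
```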
SEQUENTIALLY CONSISTENT ATOMICS
How to write them

atomic<int> A;

A.store(1, memory_order_seq_cst);
c = A.load(memory_order_seq_cst);

...is the same as (memory_order_seq_cst is the default argument):

A.store(1);
c = A.load();

...and the same as (operator overloading):

A = 1;
c = A;

All other ordering constraints are low-level.
SEQUENTIALLY CONSISTENT ATOMICS
Example #2

atomic<int>  A(0);  
atomic<int>  B(0);  

Thread 1 (two stores)        Thread 2 (two loads)
A = 1;                       c = B;
B = 1;                       d = A;

Possible interleavings always run A = 1 before B = 1, and c = B before d = A.

Outcomes (c, d): (1, 0) is impossible; (0, 0), (0, 1) and (1, 1) are possible.
LOW-LEVEL ATOMICS
Example #2

atomic<int>  A(0);  
atomic<int>  B(0);  

Thread 1                                   Thread 2
A.store(1, memory_order_relaxed);          c = B.load(memory_order_relaxed);
B.store(1, memory_order_relaxed);          d = A.load(memory_order_relaxed);

Doing the same thing, but now c = 1, d = 0 is possible.

This is the bug from Part One! You can fix it with "lightweight fences":
atomic_thread_fence(memory_order_acquire);
atomic_thread_fence(memory_order_release);
VISUALIZING LOW-LEVEL ATOMICS
Imagine each thread having its own private copy of memory.

Thread 1                                   Thread 2
A.store(1, memory_order_relaxed);          B.store(1, memory_order_relaxed);
c = B.load(memory_order_relaxed);          d = A.load(memory_order_relaxed);

In Thread 1's copy of memory, A = 1 and B = 0; in Thread 2's copy, A = 0 and B = 1.
Now c = 0, d = 0 is trivial.
VISUALIZING LOW-LEVEL ATOMICS
This analogy corresponds to each CPU core having its own cache.

Thread 1's cache holds A = 1, B = 0; Thread 2's cache holds A = 0, B = 1.
VISUALIZING LOW-LEVEL ATOMICS
Eventually, changes propagate between threads, but the timing is
unpredictable.

Both copies eventually converge to A = 1, B = 1.
SEQUENTIALLY CONSISTENT ATOMICS
The magic compilers use to implement them

          Load                      Store
x86/64    mov %, %                  lock xchg %, %

PowerPC   hwsync                    hwsync
          ld %, %                   st %, %
          cmp %, 0
          bc #
          isync

ARMv7     ldr %, %                  dmb
          dmb                       str %, %
                                    dmb

ARMv8     ldar %, %                 stlr %, %

Itanium   ld.acq %, %               st.rel %, %
                                    mf

http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
HOW TO CONVERT GAME ATOMICS
to low-level C++11 atomics

Game Atomics                      Low-Level C++11 Atomics
volatile int A;                   atomic<int> A;
A = 1;                            A.store(1, memory_order_relaxed);
int a = A;                        int a = A.load(memory_order_relaxed);
LIGHTWEIGHT_FENCE();              atomic_thread_fence(memory_order_acquire/release);
FULL_FENCE();                     atomic_thread_fence(memory_order_seq_cst);
AtomicIncrement(A);               A.fetch_add(1, memory_order_relaxed);
AtomicCompareExchange(A, …, …);   A.compare_exchange_strong(…, …, memory_order_relaxed);
...                               ...
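As a concrete check of one row of the table above, `AtomicIncrement(A)` on a `volatile int` becomes `fetch_add` on `atomic<int>`. Even with relaxed ordering the increment itself stays atomic; relaxed only drops ordering against surrounding memory operations. A small demo (the driver function is mine):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Several threads hammer one counter with relaxed fetch_add;
// no increments are ever lost.
int relaxedCounterDemo(int numThreads, int incrementsPerThread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j < incrementsPerThread; ++j)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& t : threads) t.join();
    return counter.load();
}
```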
EXAMPLE
Capped wait-free queue in C++11

template <class T, int size>
class CappedSPSCQueue
{
private:
    T m_items[size];
    atomic<int> m_writePos;
    int m_readPos;

public:
    CappedSPSCQueue() : m_writePos(0), m_readPos(0) {}
    bool tryPush(const T& item) { ... }
    bool tryPop(T& item) { ... }
};
 
EXAMPLE
Low-level with standalone fences
bool tryPush(const T& item)
{
    int w = m_writePos.load(memory_order_relaxed);   // could even be non-atomic
    if (w >= size)
        return false;
    m_items[w] = item;
    atomic_thread_fence(memory_order_release);
    m_writePos.store(w + 1, memory_order_relaxed);
    return true;
}

bool tryPop(T& item)
{
    int w = m_writePos.load(memory_order_relaxed);
    if (m_readPos >= w)
        return false;
    atomic_thread_fence(memory_order_acquire);
    item = m_items[m_readPos];
    m_readPos++;
    return true;
}
EXAMPLE
Low-level with standalone fences
When the load of m_writePos in tryPop sees the value written by the store in
tryPush, the fences synchronize-with each other (§29.8.2, N3337).

bool tryPush(const T& item)
{
    int w = m_writePos.load(memory_order_relaxed);
    if (w >= size)
        return false;
    m_items[w] = item;
    atomic_thread_fence(memory_order_release);
    m_writePos.store(w + 1, memory_order_relaxed);
    return true;
}

bool tryPop(T& item)
{
    int w = m_writePos.load(memory_order_relaxed);
    if (m_readPos >= w)
        return false;
    atomic_thread_fence(memory_order_acquire);
    item = m_items[m_readPos];
    m_readPos++;
    return true;
}
EXAMPLE
Low-level ordering constraints
When the acquire load sees the value written by the release store, the store
synchronizes-with the load (§29.3.2).

bool tryPush(const T& item)
{
    int w = m_writePos.load(memory_order_relaxed);
    if (w >= size)
        return false;
    m_items[w] = item;
    m_writePos.store(w + 1, memory_order_release);
    return true;
}

bool tryPop(T& item)
{
    int w = m_writePos.load(memory_order_acquire);
    if (m_readPos >= w)
        return false;
    item = m_items[m_readPos];
    m_readPos++;
    return true;
}
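The release/acquire variant above is complete enough to run. Here is a self-contained version with a toy producer/consumer driver (the driver, the `4096` capacity, and the summing check are mine; the queue matches the slide's "capped" design, meaning single producer, single consumer, fixed capacity, no wraparound):

```cpp
#include <atomic>
#include <thread>

template <class T, int size>
class CappedSPSCQueue {
    T m_items[size];
    std::atomic<int> m_writePos;
    int m_readPos;
public:
    CappedSPSCQueue() : m_writePos(0), m_readPos(0) {}

    bool tryPush(const T& item) {
        int w = m_writePos.load(std::memory_order_relaxed);
        if (w >= size)
            return false;
        m_items[w] = item;
        m_writePos.store(w + 1, std::memory_order_release);  // publishes the item
        return true;
    }

    bool tryPop(T& item) {
        int w = m_writePos.load(std::memory_order_acquire);  // sees published items
        if (m_readPos >= w)
            return false;
        item = m_items[m_readPos];
        m_readPos++;
        return true;
    }
};

// Push 0..n-1 from one thread, pop and sum from another (n must fit capacity).
long long sumThroughQueue(int n) {
    CappedSPSCQueue<int, 4096> queue;
    long long total = 0;
    std::thread producer([&] {
        for (int i = 0; i < n; ++i)
            while (!queue.tryPush(i)) {}
    });
    std::thread consumer([&] {
        int popped = 0, v;
        while (popped < n)
            if (queue.tryPop(v)) { total += v; ++popped; }
    });
    producer.join();
    consumer.join();
    return total;   // 0 + 1 + ... + (n-1)
}
```

The release store in `tryPush` and the acquire load in `tryPop` are exactly the synchronizes-with pair from the slide, so the consumer never reads `m_items[i]` before the producer's write to it is visible.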
EXAMPLE
Using sequentially consistent atomics
When the load reads from the store, they synchronize-with each other (§29.3.1).

bool tryPush(const T& item)
{
    int w = m_writePos;
    if (w >= size)
        return false;
    m_items[w] = item;
    m_writePos = w + 1;
    return true;
}

bool tryPop(T& item)
{
    int w = m_writePos;
    if (m_readPos >= w)
        return false;
    item = m_items[m_readPos];
    m_readPos++;
    return true;
}
EXAMPLE
Capped wait-free queue in C++11

template <class T, int size>
class CappedSPSCQueue
{
private:
    T m_items[size];
    atomic<int> m_writePos;
    alignas(64) int m_readPos;

public:
    CappedSPSCQueue() : m_writePos(0), m_readPos(0) {}
    bool tryPush(const T& item) { ... }
    bool tryPop(T& item) { ... }
};

All other variables can remain non-atomic because there is no data race.
 
BENCHMARKS
nanoseconds per operation (0 to 100 ns scale; bar values not reproduced here)

Intel Core i7 quad-core / 2.3 GHz / Xcode 5.1.1 Release
ARM Cortex-A9 dual-core / 800 MHz / Xcode 5.1.1 Release

Each chart compares the three queue variants (standalone fences, low-level
constraints, sequentially consistent) across four scenarios: push alone,
pop alone, concurrent push and concurrent pop.
RECAP:
The C++11 Atomic Library

§ C++11 forbids "data races"
§ Two atomic libraries
§ Pass non-atomic information by synchronizing-with
THANKS

Hugo Allaire, Charles Bloom, Hans Boehm, Dominic Couture, Bruce Dawson,
Peter Dimov, Jean-François Dubé, Dominique Duvivier, Maurice Herlihy,
Michael Lavaire, Paul McKenney, Jean-Sébastien Pelletier, Rémi Quenin,
Peter Sewell, Herb Sutter, James Therien, Dmitry Vyukov, Anthony Williams
Jeff Preshing
@preshing
[email protected]

Preshing on Programming
http://preshing.com/
