
Low Latency C++ for Fun and Profit
Carl Cook, Ph.D.

@ProgrammerCarl
[email protected]

1
Introduction

About me:
● Freelance software developer
● Experience is with trading companies (mainly)
● A member of ISO SG14 (gaming, low latency, trading)

Contents:
● A 30 second introduction to trading
● Performance techniques for low latency, and then some surprises
● Measurement of performance

Disclaimer: This is not a general discussion on every C++ optimization technique - it's a quick sampler into the life of developing high performance trading systems

2
What is electronic trading/HFT/market making/algo trading?

3
while (true) {
try_buy_low();
try_sell_high();
}

4
Why the need for speed?

Electronic market makers aim for the lowest latency possible:

● Fast reaction to market events
  ○ Allowing now-stale orders to be adjusted (before they lose money)
  ○ Being the first to spot a favorable order and trade against it

5
Solving this challenge has some nice spin-offs to other industries:

● More efficient code: longer battery life/drone flight time/power savings


● Faster/more responsive autonomous vehicles
● Better general application performance
● Continually improving hardware
● …

6
C++ in finance
[Chart: C++ usage in the finance industry. Source: JetBrains]

7
Technical challenges of low latency
trading

“If you’re not at all interested in performance, shouldn’t you be in the Python
room down the hall?”
– Scott Meyers

8
The ‘Hotpath’

● The “hotpath” is only exercised 0.01% of the time - the rest of the time, the
system is idle, or doing administrative work
● Operating systems, networks and hardware are focused on throughput and
fairness
● Jitter is unacceptable - it means bad trades
● A lot can go wrong in a few microseconds

9
Execution time is a limited resource

If the target is 3.5us wire to wire (for example), then:


● 1us for RX of market data message from exchange
● 1us for TX of order message to exchange
● Maybe 0.5us of misc IPC, and jitter that’s hard to get rid of
● Leaves approximately 1us for the actual trading code
○ Arguably around 3K CPU cycles/12K instructions
○ But think about memory latency, pipeline stalls, cache misses, etc

RX IPC { 1 us TX

{Time it takes for light to travel 300 metres}


10
The role of C++

From Bjarne Stroustrup:


“C++ enables zero-overhead abstraction to get us away from the hardware
without adding cost”

But: even though C++ is good at saying what will be done, there are other factors:
● Compiler (and version)
● Machine architecture
● 3rd party libraries
● Build and link flags

We need to check what C++ is doing in terms of machine instructions...

11
… luckily there's an app for that: [screenshot of an interactive compiler/assembly explorer - presumably Compiler Explorer, godbolt.org]

12
The importance of system tuning (results on the next page)

#include <benchmark/benchmark.h>
#include <algorithm>
#include <cstdlib>
#include <vector>

std::vector<int> items;

void SortVector(benchmark::State& state) {
  items.reserve(1024);
  for (auto _ : state) {
    const auto N = state.range(0);
    items.resize(N);
    for (int i = 0; i < N; ++i)
      items[i] = rand() % N;
    std::sort(items.begin(), items.end());
  }
}

BENCHMARK(SortVector)->Range(8, 1024);

13
Same:
● Hardware
● Operating system
● Binary
● Background load

One server is tuned for production (no hyperthreading, etc), the other is not

14
Low latency programming techniques

"When in doubt, use brute force."


– Ken Thompson

15
Slowpath removal

Avoid this:

if (checkForErrorA())
  handleErrorA();
else if (checkForErrorB())
  handleErrorB();
else if (checkForErrorC())
  handleErrorC();
else
  sendOrderToExchange();

Aim for this:

int64_t errorFlags;
...
if (!errorFlags)
  sendOrderToExchange();
else
  HandleError(errorFlags);

Tip: ensure that error handling code will not be inlined

16
Template-based configuration

● It’s convenient to have some things controlled via configuration files


○ However virtual functions (and even simple branches) can be expensive
● One possible solution:
○ Use templates (often overlooked, even though everyone uses the STL)
○ This removes branches, eliminates code that won’t be executed, etc

17
// 1st implementation
struct OrderSenderA {
  void SendOrder() {
    ...
  }
};

// 2nd implementation
struct OrderSenderB {
  void SendOrder() {
    ...
  }
};

template <typename T>
struct OrderManager : public IOrderManager {
  void MainLoop() final {
    // ... and at some stage in the future...
    mOrderSender.SendOrder();
  }
  T mOrderSender;
};

18
std::unique_ptr<IOrderManager> Factory(const Config& config) {
  if (config.UseOrderSenderA())
    return std::make_unique<OrderManager<OrderSenderA>>();
  else if (config.UseOrderSenderB())
    return std::make_unique<OrderManager<OrderSenderB>>();
  else
    throw std::invalid_argument("Unknown order sender");
}

int main(int argc, char *argv[]) {
  const Config config = ReadConfig(argc, argv); // hypothetical config loader
  auto manager = Factory(config);
  manager->MainLoop();
}

19
Memory allocation

● Allocations are of course costly:
  ○ Use a pool of preallocated objects (a minimal sketch follows below)
  ○ Reuse objects instead of deallocating:
    ■ delete involves no system calls (memory is not given back to the OS)
      ● But: glibc free has 400 lines of book-keeping code
    ■ Reusing objects helps avoid memory fragmentation as well
● If you must delete large objects, consider doing this from another thread
● Be aware that destructors may be inlined
  ○ This can start trampling your instruction cache
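
A minimal sketch of such a preallocated pool (the Order type, names, and fixed capacity are illustrative assumptions, not from the talk): every object is constructed up front, and acquire/release just move pointers, so the hot path never calls new or delete.

#include <array>
#include <cassert>
#include <cstddef>

struct Order { /* price, quantity, etc. */ };

template <typename T, std::size_t N>
class ObjectPool {
public:
  ObjectPool() {
    for (std::size_t i = 0; i < N; ++i)
      mFree[i] = &mStorage[i];   // every slot starts out free
    mFreeCount = N;
  }
  T* acquire() {                 // no allocation: pop a free slot
    assert(mFreeCount > 0);
    return mFree[--mFreeCount];
  }
  void release(T* obj) {         // no deallocation: push the slot back
    mFree[mFreeCount++] = obj;
  }
private:
  std::array<T, N> mStorage;     // all objects preallocated up front
  std::array<T*, N> mFree;
  std::size_t mFreeCount = 0;
};

ObjectPool<Order, 1024> orderPool; // sized once at startup, reused forever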

20
Exceptions in C++

● Don't be afraid to use exceptions (if using gcc, clang, msvc):
  ○ I've measured this in quite some detail:
    ■ They are basically zero cost if they don't throw
    ■ Maybe some slight code reordering, but the cost is negligible
● Don't use exceptions for control flow (see the sketch below):
  ○ That will get expensive:
    ■ My benchmarking suggests an overhead of at least 1.5us
  ○ Your code will look terrible
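
A minimal usage sketch (the function names are illustrative assumptions): the try block adds no cost while nothing throws, and the throw is reserved for genuinely rare failures.

#include <stdexcept>

void runHotPath();                          // throws only on genuine, rare failure
void handleFailure(const std::exception&);  // slowpath error handling

void MainLoop() {
  for (;;) {
    try {
      runHotPath();   // zero cost while nothing throws
    } catch (const std::exception& e) {
      // Rare slow path: pay the unwinding cost (1.5us+) only when something
      // is genuinely wrong - never as a way to steer normal control flow.
      handleFailure(e);
    }
  }
}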

21
Branch reduction

Branching approach:

enum class Side { Buy, Sell };

void RunLogic(Side side) {
  const float orderPrice = CalcPrice(side, fairValue, credit);
  CheckRiskLimits(side, orderPrice);
  SendOrder(side, orderPrice);
}

float CalcPrice(Side side, float value, float credit) {
  return side == Side::Buy ? value - credit : value + credit;
}

22
Templated approach:

template<>
void RunLogic<Side::Buy>() {
float orderPrice = CalcPrice<Side::Buy>(fairValue, credit);
CheckRiskLimits<Side::Buy>(orderPrice);
SendOrder<Side::Buy>(orderPrice);
}
template<>
float CalcPrice<Side::Buy>(float value, float credit) {
return value - credit;
}
template<>
float CalcPrice<Side::Sell>(float value, float credit) {
return value + credit;
}

23
Multi-threading

Multithreading is best avoided for latency-sensitive code:
● Synchronization of data via locking is going to be expensive
● Lock free code may still require locks at the hardware level
● Mind-bendingly complex to correctly implement parallelism
● Easy for the producer to accidentally saturate the consumer

24
If you must use multiple threads...

● Keep shared data to an absolute minimum
  ○ Multiple threads writing to the same cacheline will get expensive
● Consider passing copies of data rather than sharing data
  ○ E.g. a single writer, single reader lock free queue (see the sketch below)
● If you have to share data, consider not using synchronization, i.e.:
  ○ Maybe you can live with out-of-sequence updates
  ○ Maybe the machine architecture prevents torn reads/writes, and preserves the ordering of stores and loads (etc)
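
A minimal sketch of such a queue (illustrative, assuming a power-of-two capacity and exactly one producer thread and one consumer thread): each index is written by only one thread, so no locks are needed.

#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>  // N must be a power of two
class SpscQueue {
public:
  bool try_push(const T& item) {      // called only by the producer
    const auto head = mHead.load(std::memory_order_relaxed);
    if (head - mTail.load(std::memory_order_acquire) == N)
      return false;                   // full
    mBuffer[head & (N - 1)] = item;
    mHead.store(head + 1, std::memory_order_release);
    return true;
  }
  bool try_pop(T& item) {             // called only by the consumer
    const auto tail = mTail.load(std::memory_order_relaxed);
    if (mHead.load(std::memory_order_acquire) == tail)
      return false;                   // empty
    item = mBuffer[tail & (N - 1)];
    mTail.store(tail + 1, std::memory_order_release);
    return true;
  }
private:
  T mBuffer[N];
  std::atomic<std::size_t> mHead{0};  // written only by the producer
  std::atomic<std::size_t> mTail{0};  // written only by the consumer
};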

25
Data lookups

The software engineering textbooks would typically suggest:

struct Market {
  int32_t id;
  char shortName[4];
  int16_t quantityMultiplier;
  ...
};

struct Instrument {
  float price;
  int32_t marketId;
  ...
};

Message orderMessage;
orderMessage.price = instrument.price;
Market& market = Markets.FindMarket(instrument.marketId);
orderMessage.qty = market.quantityMultiplier * qty;
...

26
Actually, denormalized data is not a sin:
● Chances are there is spare space in the cacheline you already read, so the duplicated field comes along for free, avoiding an additional lookup

struct Market {
  int32_t id;
  char shortName[4];
  int16_t quantityMultiplier;
  ...
};

struct Instrument {
  float price;
  int16_t quantityMultiplier; // duplicated from Market, saving a lookup
  ...
};

This is better than trampling your cache to “save memory”

27
Fast associative containers (std::unordered_map)

[Diagram: an array of buckets (1..N); each bucket holds a chain of nodes, each node a std::pair<K, V>]

Default max_load_factor: 1
Average case insert: O(1)
Average case find: O(1)
See: N1456
28
10K elements, keyed in the range std::uniform_int_distribution(0, 1e+12)

Complexity of find:
● Average case: O(1)
● Worst case: O(N)

29
Run on (32 X 2892.9 MHz CPU s), 2017-09-08 11:39:44
Benchmark Time
----------------------------------------------
FindBenchmark<unordered_map>/10 14 ns
FindBenchmark<unordered_map>/64 16 ns
FindBenchmark<unordered_map>/512 16 ns
FindBenchmark<unordered_map>/4k 20 ns
FindBenchmark<unordered_map>/10k 24 ns
----------------------------------------------

# 56.54% frontend cycles idle


# 21.61% backend cycles idle
# 0.67 insns per cycle
# 0.84 stalled cycles per insn
branch-misses # 0.63% of all branches
cache-misses # 0.153% of all cache refs
30
Alternatively, consider open addressing, e.g. google’s dense_hash_map

[Diagram: Key/Value pairs laid out in one contiguous array]

✓ Key/Value pairs are in contiguous memory - no pointer following between nodes

✘ Complexity around collision management

31
A lesser-known approach: a hybrid of both chaining and open addressing

Goals:
● Minimal memory footprint
● Predictable cache access patterns (no jumping all over the place)

32
[Diagram: Key ➔ Hash ➔ Index into a compact array of (hash, pointer) slots; slots are probed in order until the stored hash matches (✓) or the probe misses (✘), and only then is a single pointer followed to the Key/Value pair]

It’s possible to implement this as a drop-in substitute for std::unordered_map

33
Run on (32 X 2892.9 MHz CPU s), 2017-09-08 11:40:08
Benchmark Time
----------------------------------------------
FindBenchmark<array_map>/10 7 ns
FindBenchmark<array_map>/64 7 ns
FindBenchmark<array_map>/512 7 ns
FindBenchmark<array_map>/4k 9 ns
FindBenchmark<array_map>/10k 9 ns
----------------------------------------------

# 38.26% frontend cycles idle


# 6.77% backend cycles idle
# 1.6 insns per cycle
# 0.24 stalled cycles per insn
branch-misses # 0.22% of all branches
cache-misses # 0.067% of all cache refs

34
Branch prediction hints

#define likely(x)   __builtin_expect((x), 1)
#define unlikely(x) __builtin_expect((x), 0)

● You may recognise these from the linux kernel source
● The compiler often picks the right case in the first place, but there's no guarantee

35
gcc with no hints:

int GetErrorCode() {
  return rand() % 255 + 1;
}

int main(int argc, char**) {
  if (argc > 1)
    return GetErrorCode();
  else
    return 0;
}

Generated assembly:

main:
  cmp edi, 1        // argc
  jle .L7
  sub rsp, 8
  call rand
  mov ecx, 255
  cdq
  idiv ecx
  lea eax, [rdx+1]
  pop rdx
  ret
.L7:
  xor eax, eax      // zeros eax (the return value)
  ret
36
Now with branch prediction hints:

int GetErrorCode() {
  return rand() % 255 + 1;
}

int main(int argc, char**) {
  if (unlikely(argc > 1))
    return GetErrorCode();
  else
    return 0;
}

Generated assembly (the common case now falls through first):

main:
  cmp edi, 1
  jg .L12
  xor eax, eax
  ret
.L12:
  sub rsp, 8
  call rand
  mov ecx, 255
  cdq
  idiv ecx
  lea eax, [rdx+1]
  pop rdx
  ret
37
● These “likely” attributes are useful if something called very rarely needs to be
fast when called (i.e. expect more efficient assembly code to be generated)
● In all other cases:
○ Write your code to avoid branches, and
○ Train the hardware branch predictor (more about this later)
■ This is the dominant factor

See https://wg21.link/P0479 for a proposal to standardize these attributes

See https://groups.google.com/a/isocpp.org/forum/#!forum/sg14 for a lively debate on this proposal

38
((always_inline)) and ((noinline))

● ((always_inline)) and ((noinline)) can be useful
  ○ Means: inlining is preferred / inlining should be avoided
  ○ But be careful: measure
● Please note that the inline keyword is not really what you are looking for
  ○ It mainly means: multiple definitions are permitted

A quick example: forcing a method to be not inlined (for good reason)

CheckMarket();
if (notGoingToSendAnOrder)
  ComplexLoggingFunction();
else
  SendOrder();

__attribute__((noinline))
void ComplexLoggingFunction() {
  ...
}

39
Default gcc generated code

int get_error_code() { ... }

int main(int argc, char**) {
  if (argc > 1)
    return get_error_code();
  else
    return 0;
}

Generated assembly:

get_error_code:
  ...
  ret
main:
  cmp edi, 1          // argc register
  jle .L6
  jmp get_error_code  // tail call
.L6:
  xor eax, eax        // zeros eax
  ret                 // eax is the return value

40
Forcing get_error_code to be inlined

__attribute__((always_inline))
int get_error_code() { ... }

int main(int argc, char**) {
  if (argc > 1)
    return get_error_code();
  else
    return 0;
}

Generated assembly:

main:
  cmp edi, 1
  jle .L6
  get_error_code instruction 1
  get_error_code instruction ..
  get_error_code instruction N
  mov eax, [error code]
  ret
.L6:
  xor eax, eax   // zeros eax
  ret

41
Combining inlining hints and branch prediction hints

Combining noinline with "unlikely" branch prediction:

__attribute__((noinline))
int get_error_code() { ... }

int main(int argc, char**) {
  if (unlikely(argc > 1))
    return get_error_code();
  else
    return 0;
}

Generated assembly:

get_error_code:
  ...
  ret
main:
  cmp edi, 1
  jg .L7
  xor eax, eax
  ret
.L7:
  jmp get_error_code

42
Other gcc compiler hints for cache locality

__attribute__((hot)):
Puts all such functions into a single section in the binary, including ancestor functions

__attribute__((cold)):
Puts functions into a different section (and avoids inlining them)

This is somewhat useful - it basically achieves the same effect as inlining hot functions and not inlining cold functions
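
A quick usage sketch (the function names are illustrative):

__attribute__((hot))  void SendOrderToExchange(); // grouped into the hot text section
__attribute__((cold)) void LogUnexpectedState();  // placed elsewhere, not inlined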

43
Prefetching

__builtin_prefetch can also be useful (if you know that the hardware branch
predictor won’t be able to work out the right pattern)

Example (the body of a binary search loop; note mid must be computed before the prefetches that reference it):

int mid = (low + high) / 2;

// next mid val after this iteration if we take the low path
__builtin_prefetch(&array[(low + mid - 1) / 2]);
// next mid val after this iteration if we take the high path
__builtin_prefetch(&array[(mid + 1 + high) / 2]);

if (array[mid] == key) return mid;
if (array[mid] < key) low = mid + 1; // search high path
else high = mid - 1;                 // search low path

Bonus: you can also prefetch the instruction cache
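
Putting the loop into a self-contained function (a hedged sketch; the signature is an illustrative assumption):

int PrefetchingBinarySearch(const int* array, int size, int key) {
  int low = 0, high = size - 1;
  while (low <= high) {
    const int mid = (low + high) / 2;
    // Warm the two possible next probes while array[mid] is being compared:
    __builtin_prefetch(&array[(low + mid - 1) / 2]);  // if we take the low path
    __builtin_prefetch(&array[(mid + 1 + high) / 2]); // if we take the high path
    if (array[mid] == key) return mid;
    if (array[mid] < key) low = mid + 1;
    else high = mid - 1;
  }
  return -1; // not found
}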


44
Compiler attributes <TL/DR>

Pick one:

● Code with no (or minimal) branches
● __attribute__((always_inline)) and __attribute__((noinline))
● __builtin_expect()
● __attribute__((hot)) and __attribute__((cold))
● __builtin_prefetch()

Usually you will see no further gain if you apply several of the above

45
Keeping the caches hot - a better way!

Remember, the full hotpath is only exercised very infrequently - your cache has
most likely been trampled by non-hotpath data and instructions

[Diagram: many messages hit the market data decoder; fewer reach the strategy, and only rarely does one travel the full path through to the execution engine - so the full path's code and data fall out of cache]
46
A simple solution: run a very frequent pre-warm path through your entire system, keeping both your data cache and instruction cache primed (a sketch follows after the diagram)

[Diagram: with pre-warming, every message runs the full market data decoder → strategy → execution engine path, so the path's code and data stay resident in cache]

Bonus: this also correctly trains the hardware branch predictor
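
One hedged way to implement this (all names here are illustrative, not from the talk): every pre-warm message runs through exactly the same code and data as a real one, and only the final wire write is suppressed.

#include <cstdint>

struct Packet { std::uint64_t payload; };
struct Order  { std::uint64_t id; };

Order BuildOrder(const Packet& p) { return Order{p.payload}; } // stands in for decode + strategy + order prep
void SendToExchange(const Order&); // the real wire write, defined elsewhere

void OnMessage(const Packet& packet, bool preWarmOnly) {
  Order order = BuildOrder(packet); // same instructions and data as the real path
  if (!preWarmOnly)
    SendToExchange(order);          // skipped on the frequent warming runs
}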

47
[Flowchart: "System running 5us slower than normal" → "Are pre-warming messages broken?" - Yes: "Fix pre-warming messages" → "Problem solved?" - Yes: "Done", No: keep looking; No: "You poor bastard"]

48
Hardware/architecture considerations

Quick recap:
● A server can have N physical CPUs (one CPU attaches to one socket)
  ○ Each CPU can have N cores (ignoring hyperthreading per core)
    ■ Each core has its own:
      ● L1 data cache (~32KB)
      ● L1 instruction cache (~32KB)
      ● Unified L2 cache (~512KB)
  ○ All cores share a unified L3 cache (~50MB)

[Diagram source: Intel Corporation]

49
[Die diagram: Intel Xeon E5 processor. Source: Intel Corporation]
50
● Don't share L3 - disable all other cores (or lock the cache)
  ○ This might mean paying for 22 cores but only using 1
● Choose your neighbours carefully (a thread-pinning sketch follows below):
  ○ Noisy neighbours should probably be moved to a different physical CPU
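
A hedged sketch of pinning the hot-path thread to one core (Linux-specific; the core number is illustrative, and in practice that core would also be isolated from the scheduler, e.g. via the isolcpus kernel parameter):

#include <pthread.h> // pthread_setaffinity_np is a GNU extension
#include <sched.h>

// Pin the calling thread to the given core so its caches stay its own.
void PinToCore(int core) {
  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(core, &cpuset);
  pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
}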

51
Surprises and war stories

"I have always wished for my computer to be as easy to use as my telephone;


my wish has come true because I can no longer figure out how to use my
telephone."
– Bjarne Stroustrup

52
Small string optimization support
std::unordered_map<std::string, Instrument> instruments;
return instruments.find({"IBM"}) != instruments.end();

● The temporary key above avoids a heap allocation only:
  ○ With gcc 5.1 or greater, and if the string is 15 characters or less
  ○ In clang, if the string is 22 characters or less
● In gcc, std::string has C.O.W. semantics (prior to gcc 5.1)
  ○ This gets expensive (during copying/destruction) due to atomics
  ○ First mentioned by Herb Sutter in 1999
● If you use an ABI-compatible linux distribution such as Redhat/Centos/Ubuntu/Fedora, then you are probably still using the old std::string implementation (even with the latest versions of gcc):
  ○ C.O.W. and no SSO support

53
std::string_view (to the rescue)

Provides allocation-free substrings and string literals

std::map<std::string, Instrument, std::less<>> instruments;
instruments.find(std::string_view{"FACEBOOK"})->second;

std::string_view name{"FACEBOOK"};
instruments.find(name.substr(1, 3)); // "ACE" - no allocation

Available in most C++17 compilers, and in C++14 as std::experimental::string_view

54
Avoiding std::string (and allocations)

● Consider something like inplace_string:
  ○ No allocation, compile time bounds checking, and a full std::string interface
  ○ https://github.com/david-grs/inplace_string

using InstrumentName = inplace_string<16>;
InstrumentName instrumentName{"IBM"};
assert(InstrumentName::npos == instrumentName.find("GOOGLE"));

● Implicitly convertible to std::string if required

std::string str{instrumentName};

● In production, with a sample size of 1024, inserting 6 elements into a vector:

std::string         min=918ns  mean=3,003ns  max=29,518ns
inplace_string<16>  min= 28ns  mean=   61ns  max= 1,829ns

55
Userspace networking vs cache

● Userspace means we can receive data (prices, etc) without any system calls
● But there can be too much of a good thing:
○ All secondary data goes through the cache, even if we don’t use the data
○ When items go into the cache, other items are evicted

[Diagram: order insert requests, latency-critical data, and secondary data all arrive at Core 1 via userspace communication; everything passes through Core 1's cache on its way to "orders to exchange", so secondary data evicts hot entries]
56
Alternative setup:

[Diagram: Core 1 receives order insert requests and latency-critical data via userspace communication, and sends orders to the exchange; secondary data goes to Core 2 instead, which forwards it to Core 1 in batches, infrequently, over a single writer/single reader lock free queue in shared memory]
57
Watch your enums and switches

enum Enum { Good, Bad, Ugly };

int main(int argc, char**) {
  switch ((Enum)argc) {
    case Good: Handle("GOOD"); break;
    case Bad:  Handle("BAD");  break;
    case Ugly: Handle("UGLY"); break;
  }
}

Generated assembly (a chain of compares, not a jump table):

main:
  sub rsp, 8
  test edi, edi
  je .L8
  cmp edi, 1
  je .L3
  cmp edi, 2
  je .L4
  ...

58
Overhead of C++11 static local variable initialization

struct Random {
  int get() {
    // threadsafe!
    static int i = rand();
    return i;
  }
};

int main() {
  Random r;
  return r.get();
}

Generated assembly:

Random::get():
  movzx eax, BYTE PTR guard var
  test al, al
  je .L13
  mov eax, DWORD PTR get()::i
  ret
.L13:
  // acquire and set the guard var

5-10% overhead compared to non-static access, even if the binary is single threaded
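
One hedged way to avoid the guard check on every call (illustrative): initialize the value eagerly, before the hot path runs, so get() compiles down to a plain load.

#include <cstdlib>

struct Random {
  Random() : i(rand()) {}       // pay for initialization once, at construction
  int get() const { return i; } // plain member load - no guard variable check
private:
  int i;
};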

59
std::pow can be slow, really slow

std::pow is a transcendental function; the implementation falls back to a second, much slower phase if the accuracy of the result isn't acceptable after the first phase.

auto base = 1.00000000000001, exp1 = 1.4, exp2 = 1.5;

std::pow(base, exp1) = 1.0000000000000140
std::pow(base, exp2) = 1.0000000000000151

Benchmark                         Time      Iterations
-------------------------------------------------------
pow(base, exp1) [glibc 2.17]      53 ns     13142054
pow(base, exp1) [glibc 2.21]      53 ns     13142821
pow(base, exp2) [glibc 2.17]  478195 ns         1457
pow(base, exp2) [glibc 2.21]   63348 ns        11113
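
If the exponent is a known constant (as with exp2 = 1.5 above), a hedged workaround is to rewrite the power in hardware-friendly terms and sidestep libm's slow phase entirely:

#include <cmath>

// x^1.5 == x * sqrt(x); sqrt maps to a fast hardware instruction.
double Pow15(double base) {
  return base * std::sqrt(base);
}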

60
Measurement of low latency systems

“Bottlenecks occur in surprising places, so don't try to second guess and put in
a speed hack until you've proven that's where the bottleneck is.”
– Rob Pike

61
Measurement of low latency systems

● Two common approaches:
  ○ Profiling: seeing what your code is doing (bottlenecks in particular)
  ○ Benchmarking: timing the speed of your system
● Caution: profiling is not necessarily benchmarking
  ○ Profiling is useful for catching unexpected things
  ○ Improvements in profiling results aren't a 100% guarantee that your system is now faster

62
✘ Sampling profilers (e.g. gprof) are not what you are looking for
○ They miss the key events
✘ Instrumentation profilers (e.g. valgrind) are not what you are looking for
○ They are too intrusive
○ They don’t catch I/O slowness/jitter (they don’t even model I/O)
✘ Microbenchmarks (e.g. google benchmark) are not what you are looking for
○ They are not representative of a realistic environment
○ Takes some effort to force the compiler to not optimize out the test
○ Heap fragmentation can have an impact on subsequent tests

They are all in some ways useful, but not for micro-optimization of code

63
? Performance counters can be useful (e.g. linux perf)
  ○ E.g. # of cache misses, # of pipeline stalls

? Consider just comparing certain types of instruction counts, e.g. jumps:
  ○ objdump -S my_binary | cut -c 33-34 | grep j | wc -l

? High-resolution timestamping can be useful (e.g. the hardware TSC - see the sketch below)
  ○ Doesn't need to be in sync with clock time
    ■ Just needs to be constant across samples
  ○ If you want actual nanoseconds:
    ■ Calibrate with wallclock time every few milliseconds
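
A hedged sketch of TSC-based timestamping (x86-specific; __rdtsc is a compiler intrinsic, and the function under test is an illustrative assumption):

#include <x86intrin.h> // __rdtsc
#include <cstdint>

void hotPathWork(); // hypothetical code under test

std::uint64_t TimeHotPathCycles() {
  const std::uint64_t start = __rdtsc(); // read the hardware timestamp counter
  hotPathWork();
  return __rdtsc() - start; // elapsed cycles: constant across samples;
                            // calibrate against wallclock for nanoseconds
}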

64
✓ Most useful: measure end-to-end time in a production-like setup
(Many trading companies do this)

[Test harness diagram:]
● A switch with high precision hardware-based timestamping (appended to each packet), connecting:
  ○ A server which replays exchange market data and accepts orders
  ○ The server under test, which listens to market data and sends orders
  ○ A server which captures and parses each network packet it sees, and calculates response time (accurate to a few nanoseconds)
65
Summary

“A language that doesn't affect the way you think about programming is not
worth knowing.”
– Alan Perlis

66
● Know C++ well, including your compiler
● Know the basics of machine architecture, and how it will impact your code
● Do as much work as possible at compile time
● Aim for very simple runtime logic
● Accurate measurement is essential
● Assume nothing: a lot can be surprising, and compilers, hardware and
operating systems are always changing

67
Thanks for listening!

68
