Pro TBB: C++ Parallel Programming with Threading Building Blocks, 1st Edition, by Michael Voss, Rafael Asenjo, and James Reinders

This document is the front matter and table of contents of Pro TBB: C++ Parallel Programming with Threading Building Blocks by Michael Voss, Rafael Asenjo, and James Reinders, which covers Intel's Threading Building Blocks (TBB) library for parallel programming in C++. It includes chapters on generic parallel algorithms, flow graphs, synchronization, concurrent data structures, and scalable memory allocation, and is designed to help developers use TBB to write efficient parallel C++ applications.

Pro TBB
C++ Parallel Programming with
Threading Building Blocks

Michael Voss
Rafael Asenjo
James Reinders
Pro TBB: C++ Parallel Programming with Threading Building Blocks
Michael Voss, Austin, Texas, USA
Rafael Asenjo, Málaga, Spain
James Reinders, Portland, Oregon, USA

ISBN-13 (pbk): 978-1-4842-4397-8
ISBN-13 (electronic): 978-1-4842-4398-5


https://doi.org/10.1007/978-1-4842-4398-5

Copyright © 2019 by Intel Corporation


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on
microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
Open Access This book is licensed under the terms of the Creative Commons Attribution-
NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits any noncommercial use, sharing, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license and indicate if you modified the licensed material. You do not have permission under this
license to share adapted material derived from this book or parts of it.
The images or other third party material in this book are included in the book’s Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the book’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain
permission directly from the copyright holder.
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every
occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to
the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as
such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the
authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made.
The publisher makes no warranty, express or implied, with respect to the material contained herein.
Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Natalie Pao
Development Editor: James Markham
Coordinating Editor: Jessica Vakili
Cover designed by eStudioCalamar
Cover image designed by Freepik (www.freepik.com)
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor,
New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit
www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science +
Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.
For information on translations, please e-mail [email protected], or visit http://www.apress.com/rights-permissions.
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also
available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub
via the book’s product page, located at www.apress.com/978-1-4842-4397-8. For more detailed information, please visit http://www.apress.com/source-code.
Printed on acid-free paper.
Table of Contents
About the Authors����������������������������������������������������������������������������������������������������xv

Acknowledgments�������������������������������������������������������������������������������������������������xvii

Preface�������������������������������������������������������������������������������������������������������������������xix

Part 1�������������������������������������������������������������������������������������������������������������� 1
Chapter 1: Jumping Right In: “Hello, TBB!”�������������������������������������������������������������� 3
Why Threading Building Blocks?��������������������������������������������������������������������������������������������������� 3
Performance: Small Overhead, Big Benefits for C++�������������������������������������������������������������� 4
Evolving Support for Parallelism in TBB and C++������������������������������������������������������������������� 5
Recent C++ Additions for Parallelism������������������������������������������������������������������������������������� 6
The Threading Building Blocks (TBB) Library�������������������������������������������������������������������������������� 7
Parallel Execution Interfaces��������������������������������������������������������������������������������������������������� 8
Interfaces That Are Independent of the Execution Model������������������������������������������������������ 10
Using the Building Blocks in TBB������������������������������������������������������������������������������������������� 10
Let’s Get Started Already!����������������������������������������������������������������������������������������������������������� 11
Getting the Threading Building Blocks (TBB) Library������������������������������������������������������������� 11
Getting a Copy of the Examples��������������������������������������������������������������������������������������������� 12
Writing a First “Hello, TBB!” Example������������������������������������������������������������������������������������ 12
Building the Simple Examples����������������������������������������������������������������������������������������������� 15
Building on Windows Using Microsoft Visual Studio������������������������������������������������������������� 16
Building on a Linux Platform from a Terminal����������������������������������������������������������������������� 17
A More Complete Example���������������������������������������������������������������������������������������������������������� 21
Starting with a Serial Implementation����������������������������������������������������������������������������������� 21
Adding a Message-Driven Layer Using a Flow Graph������������������������������������������������������������ 25
Adding a Fork-Join Layer Using a parallel_for���������������������������������������������������������������������� 27
Adding a SIMD Layer Using a Parallel STL Transform����������������������������������������������������������� 29

iii
Table of Contents

Chapter 2: Generic Parallel Algorithms������������������������������������������������������������������ 33


Functional / Task Parallelism������������������������������������������������������������������������������������������������������ 37
A Slightly More Complicated Example: A Parallel Implementation of Quicksort�������������������� 40
Loops: parallel_for, parallel_reduce, and parallel_scan�������������������������������������������������������������� 42
parallel_for: Applying a Body to Each Element in a Range���������������������������������������������������� 42
parallel_reduce: Calculating a Single Result Across a Range����������������������������������������������� 46
parallel_scan: A Reduction with Intermediate Values����������������������������������������������������������� 52
How Does This Work?������������������������������������������������������������������������������������������������������������ 54
A Slightly More Complicated Example: Line of Sight������������������������������������������������������������� 56
Cook Until Done: parallel_do and parallel_pipeline�������������������������������������������������������������������� 57
parallel_do: Apply a Body Until There Are No More Items Left���������������������������������������������� 58
parallel_pipeline: Streaming Items Through a Series of Filters��������������������������������������������� 67

Chapter 3: Flow Graphs������������������������������������������������������������������������������������������ 79


Why Use Graphs to Express Parallelism?������������������������������������������������������������������������������������ 80
The Basics of the TBB Flow Graph Interface������������������������������������������������������������������������������� 82
Step 1: Create the Graph Object�������������������������������������������������������������������������������������������� 84
Step 2: Make the Nodes�������������������������������������������������������������������������������������������������������� 84
Step 3: Add Edges������������������������������������������������������������������������������������������������������������������ 87
Step 4: Start the Graph���������������������������������������������������������������������������������������������������������� 89
Step 5: Wait for the Graph to Complete Executing����������������������������������������������������������������� 91
A More Complicated Example of a Data Flow Graph������������������������������������������������������������������� 91
Implementing the Example as a TBB Flow Graph������������������������������������������������������������������ 93
Understanding the Performance of a Data Flow Graph��������������������������������������������������������� 96
The Special Case of Dependency Graphs������������������������������������������������������������������������������������ 97
Implementing a Dependency Graph�������������������������������������������������������������������������������������� 99
Estimating the Scalability of a Dependency Graph�������������������������������������������������������������� 105
Advanced Topics in TBB Flow Graphs��������������������������������������������������������������������������������������� 106

Chapter 4: TBB and the Parallel Algorithms of the C++ Standard


Template Library��������������������������������������������������������������������������������������������������� 109
Does the C++ STL Library Belong in This Book?���������������������������������������������������������������������� 110
A Parallel STL Execution Policy Analogy����������������������������������������������������������������������������������� 112
iv
Table of Contents

A Simple Example Using std::for_each������������������������������������������������������������������������������������� 113


What Algorithms Are Provided in a Parallel STL Implementation?�������������������������������������������� 117
How to Get and Use a Copy of Parallel STL That Uses TBB������������������������������������������������� 117
Algorithms in Intel’s Parallel STL����������������������������������������������������������������������������������������� 118
Capturing More Use Cases with Custom Iterators�������������������������������������������������������������������� 120
Highlighting Some of the Most Useful Algorithms�������������������������������������������������������������������� 124
std::for_each, std::for_each_n�������������������������������������������������������������������������������������������� 124
std::transform���������������������������������������������������������������������������������������������������������������������� 126
std::reduce�������������������������������������������������������������������������������������������������������������������������� 127
std::transform_reduce��������������������������������������������������������������������������������������������������������� 128
A Deeper Dive into the Execution Policies�������������������������������������������������������������������������������� 130
The sequenced_policy��������������������������������������������������������������������������������������������������������� 131
The parallel_policy�������������������������������������������������������������������������������������������������������������� 131
The unsequenced_policy����������������������������������������������������������������������������������������������������� 132
The parallel_unsequenced_policy��������������������������������������������������������������������������������������� 132
Which Execution Policy Should We Use?���������������������������������������������������������������������������������� 132
Other Ways to Introduce SIMD Parallelism�������������������������������������������������������������������������������� 134

Chapter 5: Synchronization: Why and How to Avoid It����������������������������������������� 137


A Running Example: Histogram of an Image����������������������������������������������������������������������������� 138
An Unsafe Parallel Implementation������������������������������������������������������������������������������������������� 141
A First Safe Parallel Implementation: Coarse-­Grained Locking������������������������������������������������ 145
Mutex Flavors���������������������������������������������������������������������������������������������������������������������� 151
A Second Safe Parallel Implementation: Fine-­Grained Locking������������������������������������������������ 153
A Third Safe Parallel Implementation: Atomics������������������������������������������������������������������������� 158
A Better Parallel Implementation: Privatization and Reduction������������������������������������������������� 163
Thread Local Storage, TLS��������������������������������������������������������������������������������������������������� 164
enumerable_thread_specific, ETS��������������������������������������������������������������������������������������� 165
combinable�������������������������������������������������������������������������������������������������������������������������� 168
The Easiest Parallel Implementation: Reduction Template������������������������������������������������������� 170
Recap of Our Options���������������������������������������������������������������������������������������������������������������� 172

v
Table of Contents

Chapter 6: Data Structures for Concurrency�������������������������������������������������������� 179


Key Data Structures Basics������������������������������������������������������������������������������������������������������� 180
Unordered Associative Containers��������������������������������������������������������������������������������������� 180
Map vs. Set�������������������������������������������������������������������������������������������������������������������������� 181
Multiple Values�������������������������������������������������������������������������������������������������������������������� 181
Hashing������������������������������������������������������������������������������������������������������������������������������� 181
Unordered���������������������������������������������������������������������������������������������������������������������������� 182
Concurrent Containers�������������������������������������������������������������������������������������������������������������� 182
Concurrent Unordered Associative Containers�������������������������������������������������������������������� 185
Concurrent Queues: Regular, Bounded, and Priority������������������������������������������������������������ 193
Concurrent Vector���������������������������������������������������������������������������������������������������������������� 202

Chapter 7: Scalable Memory Allocation��������������������������������������������������������������� 207


Modern C++ Memory Allocation����������������������������������������������������������������������������������������������� 208
Scalable Memory Allocation: What�������������������������������������������������������������������������������������������� 209
Scalable Memory Allocation: Why��������������������������������������������������������������������������������������������� 209
Avoiding False Sharing with Padding���������������������������������������������������������������������������������� 210
Scalable Memory Allocation Alternatives: Which���������������������������������������������������������������������� 212
Compilation Considerations������������������������������������������������������������������������������������������������������ 214
Most Popular Usage (C/C++ Proxy Library): How��������������������������������������������������������������������� 214
Linux: malloc/new Proxy Library Usage������������������������������������������������������������������������������ 216
macOS: malloc/new Proxy Library Usage���������������������������������������������������������������������������� 216
Windows: malloc/new Proxy Library Usage������������������������������������������������������������������������ 217
Testing our Proxy Library Usage������������������������������������������������������������������������������������������ 218
C Functions: Scalable Memory Allocators for C������������������������������������������������������������������������ 220
C++ Classes: Scalable Memory Allocators for C++������������������������������������������������������������������ 221
Allocators with std::allocator<T> Signature����������������������������������������������������������������������� 222
scalable_allocator��������������������������������������������������������������������������������������������������������������������� 222
tbb_allocator����������������������������������������������������������������������������������������������������������������������������� 222
zero_allocator��������������������������������������������������������������������������������������������������������������������������� 223

vi
Table of Contents

cached_aligned_allocator��������������������������������������������������������������������������������������������������������� 223
Memory Pool Support: memory_pool_allocator������������������������������������������������������������������ 223
Array Allocation Support: aligned_space����������������������������������������������������������������������������� 224
Replacing new and delete Selectively�������������������������������������������������������������������������������������� 224
Performance Tuning: Some Control Knobs�������������������������������������������������������������������������������� 228
What Are Huge Pages?�������������������������������������������������������������������������������������������������������� 228
TBB Support for Huge Pages����������������������������������������������������������������������������������������������� 228
scalable_allocation_mode(int mode, intptr_t value)����������������������������������������������������������� 229
TBBMALLOC_USE_HUGE_PAGES����������������������������������������������������������������������������������������� 229
TBBMALLOC_SET_SOFT_HEAP_LIMIT��������������������������������������������������������������������������������� 230
int scalable_allocation_command(int cmd, void *param)��������������������������������������������������� 230
TBBMALLOC_CLEAN_ALL_BUFFERS����������������������������������������������������������������������������������� 230
TBBMALLOC_CLEAN_THREAD_BUFFERS���������������������������������������������������������������������������� 230

Chapter 8: Mapping Parallel Patterns to TBB������������������������������������������������������� 233


Parallel Patterns vs. Parallel Algorithms����������������������������������������������������������������������������������� 233
Patterns Categorize Algorithms, Designs, etc.�������������������������������������������������������������������������� 235
Patterns That Work�������������������������������������������������������������������������������������������������������������������� 236
Data Parallelism Wins��������������������������������������������������������������������������������������������������������������� 237
Nesting Pattern������������������������������������������������������������������������������������������������������������������������� 238
Map Pattern������������������������������������������������������������������������������������������������������������������������������ 239
Workpile Pattern����������������������������������������������������������������������������������������������������������������������� 240
Reduction Patterns (Reduce and Scan)������������������������������������������������������������������������������������ 241
Fork-Join Pattern���������������������������������������������������������������������������������������������������������������������� 243
Divide-and-Conquer Pattern����������������������������������������������������������������������������������������������������� 244
Branch-and-Bound Pattern������������������������������������������������������������������������������������������������������� 244
Pipeline Pattern������������������������������������������������������������������������������������������������������������������������� 246
Event-Based Coordination Pattern (Reactive Streams)������������������������������������������������������������� 247

vii
Table of Contents

Part 2�������������������������������������������������������������������������������������������������������������������� 249

Chapter 9: The Pillars of Composability��������������������������������������������������������������� 251


What Is Composability?������������������������������������������������������������������������������������������������������������� 253
Nested Composition������������������������������������������������������������������������������������������������������������ 254
Concurrent Composition������������������������������������������������������������������������������������������������������ 256
Serial Composition�������������������������������������������������������������������������������������������������������������� 258
The Features That Make TBB a Composable Library���������������������������������������������������������������� 259
The TBB Thread Pool (the Market) and Task Arenas������������������������������������������������������������ 260
The TBB Task Dispatcher: Work Stealing and More������������������������������������������������������������� 263
Putting It All Together���������������������������������������������������������������������������������������������������������������� 270
Looking Forward����������������������������������������������������������������������������������������������������������������������� 274
Controlling the Number of Threads�������������������������������������������������������������������������������������� 274
Work Isolation���������������������������������������������������������������������������������������������������������������������� 274
Task-to-Thread and Thread-to-Core Affinity������������������������������������������������������������������������ 275
Task Priorities���������������������������������������������������������������������������������������������������������������������� 275

Chapter 10: Using Tasks to Create Your Own Algorithms������������������������������������� 277


A Running Example: The Sequence������������������������������������������������������������������������������������������� 278
The High-Level Approach: parallel_invoke�������������������������������������������������������������������������������� 280
The Highest Among the Lower: task_group������������������������������������������������������������������������������ 282
The Low-Level Task Interface: Part One – Task Blocking���������������������������������������������������������� 284
The Low-Level Task Interface: Part Two – Task Continuation��������������������������������������������������� 290
Bypassing the Scheduler����������������������������������������������������������������������������������������������������� 297
The Low-Level Task Interface: Part Three – Task Recycling����������������������������������������������������� 297
Task Interface Checklist������������������������������������������������������������������������������������������������������������ 300
One More Thing: FIFO (aka Fire-and-Forget) Tasks������������������������������������������������������������������� 301
Putting These Low-Level Features to Work������������������������������������������������������������������������������� 302

Chapter 11: Controlling the Number of Threads Used for Execution�������������������� 313
A Brief Recap of the TBB Scheduler Architecture��������������������������������������������������������������������� 314
Interfaces for Controlling the Number of Threads��������������������������������������������������������������������� 315

viii
Table of Contents

Controlling Thread Count with task_scheduler_init������������������������������������������������������������ 315


Controlling Thread Count with task_arena�������������������������������������������������������������������������� 316
Controlling Thread Count with global_control��������������������������������������������������������������������� 318
Summary of Concepts and Classes������������������������������������������������������������������������������������� 318
The Best Approaches for Setting the Number of Threads��������������������������������������������������������� 320
Using a Single task_scheduler_init Object for a Simple Application����������������������������������� 320
Using More Than One task_scheduler_init Object in a Simple Application������������������������� 323
Using Multiple Arenas with Different Numbers of Slots to Influence Where
TBB Places Its Worker Threads�������������������������������������������������������������������������������������������� 325
Using global_control to Control How Many Threads Are Available to Fill Arena Slots��������� 329
Using global_control to Temporarily Restrict the Number of Available Threads������������������ 330
When NOT to Control the Number of Threads��������������������������������������������������������������������������� 332
Figuring Out What’s Gone Wrong���������������������������������������������������������������������������������������������� 334

Chapter 12: Using Work Isolation for Correctness and Performance������������������� 337
Work Isolation for Correctness�������������������������������������������������������������������������������������������������� 338
Creating an Isolated Region with this_task_arena::isolate������������������������������������������������� 343
Using Task Arenas for Isolation: A Double-Edged Sword���������������������������������������������������������� 349
Don’t Be Tempted to Use task_arenas to Create Work Isolation for Correctness���������������� 353

Chapter 13: Creating Thread-to-Core and Task-to-Thread Affinity����������������������� 357


Creating Thread-to-Core Affinity����������������������������������������������������������������������������������������������� 358
Creating Task-to-Thread Affinity����������������������������������������������������������������������������������������������� 362
When and How Should We Use the TBB Affinity Features?������������������������������������������������������� 370

Chapter 14: Using Task Priorities .......... 373
Support for Non-Preemptive Priorities in the TBB Task Class .......... 374
Setting Static and Dynamic Priorities .......... 376
Two Small Examples .......... 377
Implementing Priorities Without Using TBB Task Support .......... 382

Chapter 15: Cancellation and Exception Handling .......... 387
How to Cancel Collective Work .......... 388
Advanced Task Cancellation .......... 390

ix
Table of Contents

Explicit Assignment of TGC .......... 392
Default Assignment of TGC .......... 395
Exception Handling in TBB .......... 399
Tailoring Our Own TBB Exceptions .......... 402
Putting All Together: Composability, Cancellation, and Exception Handling .......... 405

Chapter 16: Tuning TBB Algorithms: Granularity, Locality, Parallelism, and Determinism .......... 411
Task Granularity: How Big Is Big Enough? .......... 412
Choosing Ranges and Partitioners for Loops .......... 413
An Overview of Partitioners .......... 415
Choosing a Grainsize (or Not) to Manage Task Granularity .......... 417
Ranges, Partitioners, and Data Cache Performance .......... 420
Using a static_partitioner .......... 428
Restricting the Scheduler for Determinism .......... 431
Tuning TBB Pipelines: Number of Filters, Modes, and Tokens .......... 433
Understanding a Balanced Pipeline .......... 434
Understanding an Imbalanced Pipeline .......... 436
Pipelines and Data Locality and Thread Affinity .......... 438
Deep in the Weeds .......... 439
Making Your Own Range Type .......... 439
The Pipeline Class and Thread-Bound Filters .......... 442

Chapter 17: Flow Graphs: Beyond the Basics .......... 451
Optimizing for Granularity, Locality, and Parallelism .......... 452
Node Granularity: How Big Is Big Enough? .......... 452
Memory Usage and Data Locality .......... 462
Task Arenas and Flow Graph .......... 477
Key FG Advice: Dos and Don’ts .......... 480
Do: Use Nested Parallelism .......... 480
Don’t: Use Multifunction Nodes in Place of Nested Parallelism .......... 481
Do: Use join_node, sequencer_node, or multifunction_node to Reestablish Order in a Flow Graph When Needed .......... 481


Do: Use the Isolate Function for Nested Parallelism .......... 485
Do: Use Cancellation and Exception Handling in Flow Graphs .......... 488
Do: Set a Priority for a Graph Using task_group_context .......... 492
Don’t: Make an Edge Between Nodes in Different Graphs .......... 492
Do: Use try_put to Communicate Across Graphs .......... 495
Do: Use composite_node to Encapsulate Groups of Nodes .......... 497
Introducing Intel Advisor: Flow Graph Analyzer .......... 501
The FGA Design Workflow .......... 502
The FGA Analysis Workflow .......... 505
Diagnosing Performance Issues with FGA .......... 507

Chapter 18: Beef Up Flow Graphs with Async Nodes .......... 513
Async World Example .......... 514
Why and When async_node? .......... 519
A More Realistic Example .......... 521

Chapter 19: Flow Graphs on Steroids: OpenCL Nodes .......... 535
Hello OpenCL_Node Example .......... 536
Where Are We Running Our Kernel? .......... 544
Back to the More Realistic Example of Chapter 18 .......... 551
The Devil Is in the Details .......... 561
The NDRange Concept .......... 562
Playing with the Offset .......... 568
Specifying the OpenCL Kernel .......... 569
Even More on Device Selection .......... 570
A Warning Regarding the Order Is in Order! .......... 574

Chapter 20: TBB on NUMA Architectures .......... 581
Discovering Your Platform Topology .......... 583
Understanding the Costs of Accessing Memory .......... 587
Our Baseline Example .......... 588
Mastering Data Placement and Processor Affinity .......... 589


Putting hwloc and TBB to Work Together .......... 595
More Advanced Alternatives .......... 601

Appendix A: History and Inspiration .......... 605
A Decade of “Hatchling to Soaring” .......... 605
#1 TBB’s Revolution Inside Intel .......... 605
#2 TBB’s First Revolution of Parallelism .......... 606
#3 TBB’s Second Revolution of Parallelism .......... 607
#4 TBB’s Birds .......... 608
Inspiration for TBB .......... 611
Relaxed Sequential Execution Model .......... 612
Influential Libraries .......... 613
Influential Languages .......... 614
Influential Pragmas .......... 615
Influences of Generic Programming .......... 615
Considering Caches .......... 616
Considering Costs of Time Slicing .......... 617
Further Reading .......... 618

Appendix B: TBB Précis .......... 623
Debug and Conditional Coding .......... 624
Preview Feature Macros .......... 626
Ranges .......... 626
Partitioners .......... 627
Algorithms .......... 628
Algorithm: parallel_do .......... 629
Algorithm: parallel_for .......... 631
Algorithm: parallel_for_each .......... 635
Algorithm: parallel_invoke .......... 636
Algorithm: parallel_pipeline .......... 638
Algorithm: parallel_reduce and parallel_deterministic_reduce .......... 641
Algorithm: parallel_scan .......... 645


Algorithm: parallel_sort .......... 648
Algorithm: pipeline .......... 651
Flow Graph .......... 653
Flow Graph: graph class .......... 654
Flow Graph: ports and edges .......... 655
Flow Graph: nodes .......... 655
Memory Allocation .......... 667
Containers .......... 673
Synchronization .......... 693
Thread Local Storage (TLS) .......... 699
Timing .......... 708
Task Groups: Use of the Task Stealing Scheduler .......... 709
Task Scheduler: Fine Control of the Task Stealing Scheduler .......... 710
Floating-Point Settings .......... 721
Exceptions .......... 723
Threads .......... 725
Parallel STL .......... 726

Glossary .......... 729

Index .......... 745

About the Authors
Michael Voss is a Principal Engineer in the Intel Architecture, Graphics and Software
Group at Intel. He has been a member of the TBB development team since before the
1.0 release in 2006 and was the initial architect of the TBB flow graph API. He is also
one of the lead developers of Flow Graph Analyzer, a graphical tool for analyzing data
flow applications targeted at both homogeneous and heterogeneous platforms. He
has co-authored over 40 published papers and articles on topics related to parallel
programming and frequently consults with customers across a wide range of domains to
help them effectively use the threading libraries provided by Intel. Prior to joining Intel
in 2006, he was an Assistant Professor in the Edward S. Rogers Department of Electrical
and Computer Engineering at the University of Toronto. He received his Ph.D. from the
School of Electrical and Computer Engineering at Purdue University in 2001.

Rafael Asenjo, Professor of Computer Architecture at the University of Malaga, Spain, obtained a PhD in Telecommunication Engineering in 1997 and was an Associate
Professor at the Computer Architecture Department from 2001 to 2017. He was a
Visiting Scholar at the University of Illinois in Urbana-Champaign (UIUC) in 1996 and
1997 and Visiting Research Associate in the same University in 1998. He was also a
Research Visitor at IBM T.J. Watson in 2008 and at Cray Inc. in 2011. He has been using
TBB since 2008 and over the last five years, he has focused on productively exploiting
heterogeneous chips leveraging TBB as the orchestrating framework. In 2013 and 2014
he visited UIUC to work on CPU+GPU chips. In 2015 and 2016 he also started to research
into CPU+FPGA chips while visiting U. of Bristol. He served as General Chair for ACM
PPoPP’16 and as an Organization Committee member as well as a Program Committee
member for several HPC related conferences (PPoPP, SC, PACT, IPDPS, HPCA, EuroPar,
and SBAC-PAD). His research interests include heterogeneous programming models
and architectures, parallelization of irregular codes and energy consumption.

James Reinders is a consultant with more than three decades of experience in Parallel
Computing, and is an author/co-author/editor of nine technical books related to parallel
programming. He has had the great fortune to help make key contributions to two of
the world’s fastest computers (#1 on Top500 list) as well as many other supercomputers,


and software developer tools. James finished 10,001 days (over 27 years) at Intel in mid-
2016, and now continues to write, teach, program, and do consulting in areas related to
parallel computing (HPC and AI).

Acknowledgments
Two people offered their early and continuing support for this project – Sanjiv Shah and
Herb Hinstorff. We are grateful for their encouragement, support, and occasional gentle
pushes.
The real heroes are reviewers who invested heavily in providing thoughtful and
detailed feedback on draft copies of the chapters within this book. The high quality
of their input helped drive us to allow more time for review and adjustment than we
initially planned. The book is far better as a result.
The reviewers are a stellar collection of users of TBB and key developers of TBB. It
is rare for a book project to have such an energized and supportive base of help in
refining a book. Anyone reading this book can know it is better because of these kind
souls: Eduard Ayguade, Cristina Beldica, Konstantin Boyarinov, José Carlos Cabaleiro
Domínguez, Brad Chamberlain, James Jen-Chang Chen, Jim Cownie, Sergey Didenko,
Alejandro (Alex) Duran, Mikhail Dvorskiy, Rudolf (Rudi) Eigenmann, George Elkoura,
Andrey Fedorov, Aleksei Fedotov, Tomás Fernández Pena, Elvis Fefey, Evgeny Fiksman,
Basilio Fraguela, Henry Gabb, José Daniel García Sánchez, Maria Jesus Garzaran,
Alexander Gerveshi, Darío Suárez Gracia, Kristina Kermanshahche, Yaniv Klein, Mark
Lubin, Anton Malakhov, Mark McLaughlin, Susan Meredith, Yeser Meziani, David
Padua, Nikita Ponomarev, Anoop Madhusoodhanan Prabha, Pablo Reble, Arch Robison,
Timmie Smith, Rubén Gran Tejero, Vasanth Tovinkere, Sergey Vinogradov, Kyle Wheeler,
and Florian Zitzelsberger.
We sincerely thank all those who helped, and we apologize for any who helped us
and we failed to mention!
Mike (along with Rafa and James!) thanks all of the people who have been involved
in TBB over the years: the many developers at Intel who have left their mark on the
library, Alexey Kukanov for sharing insights as we developed this book, the open-source
contributors, the technical writers and marketing professionals that have worked on
documentation and getting the word out about TBB, the technical consulting engineers
and application engineers that have helped people best apply TBB to their problems, the
managers who have kept us all on track, and especially the users of TBB that have always
provided the feedback on the library and its features that we needed to figure out where


to go next. And most of all, Mike thanks his wife Natalie and their kids, Nick, Ali, and
Luke, for their support and patience during the nights and weekends spent on this book.
Rafa thanks his PhD students and colleagues for providing feedback regarding
making TBB concepts more gentle and approachable: José Carlos Romero, Francisco
Corbera, Alejandro Villegas, Denisa Andreea Constantinescu, Angeles Navarro;
particularly to José Daniel García for his engrossing and informative conversations about
C++11, 14, 17, and 20, to Aleksei Fedotov and Pablo Reble for helping with the OpenCL_
node examples, and especially his wife Angeles Navarro for her support and for taking
over some of his duties when he was mainly focused on the book.
James thanks his wife Susan Meredith – her patient and continuous support was
essential to making this book a possibility. Additionally, her detailed editing, which often
added so much red ink on a page that the original text was hard to find, made her one of
our valued reviewers.
As coauthors, we cannot adequately thank each other enough. Mike and James have
known each other for years at Intel and feel fortunate to have come together on this book
project. It is difficult to adequately say how much Mike and James appreciate Rafa! How
lucky his students are to have such an energetic and knowledgeable professor! Without
Rafa, this book would have been much less lively and fun to read. Rafa’s command of
TBB made this book much better, and his command of the English language helped
correct the native English speakers (Mike and James) more than a few times. The three
of us enjoyed working on this book together, and we definitely spurred each other on to
great heights. It has been an excellent collaboration.
We thank Todd Green who initially brought us to Apress. We thank Natalie Pao, of
Apress, and John Somoza, of Intel, who cemented the terms between Intel and Apress
on this project. We appreciate the hard work by the entire Apress team through contract,
editing, and production.
Thank you all,
Mike Voss, Rafael Asenjo, and James Reinders

Preface
Think Parallel
We have aimed to make this book useful for those who are new to parallel programming
as well as those who are expert in parallel programming. We have also made this book
approachable for those who are comfortable only with C programming, as well as those
who are fluent in C++.
In order to address this diverse audience without “dumbing down” the book, we
have written this Preface to level the playing field.

What Is TBB
TBB is a solution for writing parallel programs in C++ that has become the most popular, and most extensive, support for parallel programming in C++. It is widely used for good reason. More than 10 years old, TBB has stood the test
of time and has been influential in the inclusion of parallel programming support in
the C++ standard. While C++11 made major additions for parallel programming, and
C++17 and C++2x take that ever further, most of what TBB offers is much more than
what belongs in a language standard. TBB was introduced in 2006, so it contains
support for pre-C++11 compilers. We have simplified matters by taking a modern
look at TBB and assuming C++11. Common advice today is “if you don’t have a
C++11 compiler, get one.” Compared with the 2007 book on TBB, we think C++11,
with lambda support in particular, makes TBB both richer and easier to understand
and use.
TBB is simply the best way to write a parallel program in C++, and we hope to help
you be very productive in using TBB.


Organization of the Book and Preface


This book is organized into four major sections:

I. Preface: Background and fundamentals useful for understanding the remainder of the book. Includes motivations for the TBB
parallel programming model, an introduction to parallel
programming, an introduction to locality and caches, an
introduction to vectorization (SIMD), and an introduction to
the features of C++ (beyond those in the C language) which are
supported or used by TBB.

II. Chapters 1–8: A book on TBB in its own right. Includes an introduction to TBB sufficient to do a great deal of effective
parallel programming.

III. Chapters 9–20: Include special topics that give a deeper understanding of TBB and parallel programming and deal with
nuances in both.

IV. Appendices A and B and Glossary: A collection of useful information about TBB that you may find interesting,
including history (Appendix A) and a complete reference guide
(Appendix B).

Think Parallel
For those new to parallel programming, we offer this Preface to provide a foundation
that will make the remainder of the book more useful, approachable, and self-contained.
We have attempted to assume only a basic understanding of C programming and
introduce the key elements of C++ that TBB relies upon and supports. We introduce
parallel programming from a practical standpoint that emphasizes what makes parallel
programs most effective. For experienced parallel programmers, we hope this Preface
will be a quick read that provides a useful refresher on the key vocabulary and thinking
that allow us to make the most of parallel computer hardware.


After reading this Preface, you should be able to explain what it means to “Think
Parallel” in terms of decomposition, scaling, correctness, abstraction, and patterns.
You will appreciate that locality is a key concern for all parallel programming. You
will understand the philosophy of supporting task programming instead of thread
programming – a revolutionary development in parallel programming supported by TBB.
You will also understand the elements of C++ programming that are needed above and
beyond a knowledge of C in order to use TBB well.
The remainder of this Preface contains five parts:
(1) An explanation of the motivations behind TBB (begins on page xxi)

(2) An introduction to parallel programming (begins on page xxvi)

(3) An introduction to locality and caches – we call “Locality and the Revenge of the Caches” – the one aspect of hardware that we
feel essential to comprehend for top performance with parallel
programming (begins on page lii)

(4) An introduction to vectorization (SIMD) (begins on page lx)

(5) An introduction to the features of C++ (beyond those in the C language) which are supported or used by TBB (begins on page lxii)

Motivations Behind Threading Building Blocks (TBB)


TBB first appeared in 2006. It was the product of experts in parallel programming at
Intel, many of whom had decades of experience in parallel programming models,
including OpenMP. Many members of the TBB team had previously spent years helping
drive OpenMP to the great success it enjoys by developing and supporting OpenMP
implementations. Appendix A is dedicated to a deeper dive on the history of TBB and
the core concepts that go into it, including the breakthrough concept of task-stealing
schedulers.
Born in the early days of multicore processors, TBB quickly emerged as the most
popular parallel programming model for C++ programmers. TBB has evolved over its
first decade to incorporate a rich set of additions that have made it an obvious choice for
parallel programming for novices and experts alike. As an open source project, TBB has
enjoyed feedback and contributions from around the world.


TBB promotes a revolutionary idea: parallel programming should enable the programmer to expose opportunities for parallelism without hesitation, and the
underlying programming model implementation (TBB) should map that to the hardware
at runtime.
Understanding the importance and value of TBB rests on understanding three
things: (1) program using tasks, not threads; (2) parallel programming models do
not need to be messy; and (3) how to obtain scaling, performance, and performance
portability with portable low overhead parallel programming models such as TBB. We
will dive into each of these three next because they are so important! It is safe to say
that the importance of these was underestimated for a long time before they emerged as
cornerstones in our understanding of how to achieve effective, and structured, parallel
programming.

Program Using Tasks Not Threads


Parallel programming should always be done in terms of tasks, not threads. We cite an
authoritative and in-depth examination of this by Edward Lee at the end of this Preface.
In 2006, he observed that “For concurrent programming to become mainstream, we
must discard threads as a programming model.”
Parallel programming expressed with threads is an exercise in mapping an
application to the specific number of parallel execution threads on the machine we
happen to run upon. Parallel programming expressed with tasks is an exercise in
exposing opportunities for parallelism and allowing a runtime (e.g., TBB runtime)
to map tasks onto the hardware at runtime without complicating the logic of our
application.
Threads represent an execution stream that executes on a hardware thread for a
time slice and may be assigned other hardware threads for a future time slice. Parallel
programming in terms of threads fails because threads are too often used as a one-to-one
correspondence between threads (as in execution threads) and threads (as in hardware
threads, e.g., processor cores). A hardware thread is a physical capability, and the
number of hardware threads available varies from machine to machine, as do some
subtle characteristics of various thread implementations.
In contrast, tasks represent opportunities for parallelism. The ability to subdivide
tasks can be exploited, as needed, to fill available threads.


With these definitions in mind, a program written in terms of threads would have
to map each algorithm onto specific systems of hardware and software. This is not only
a distraction, it causes a whole host of issues that make parallel programming more
difficult, less effective, and far less portable.
Whereas, a program written in terms of tasks allows a runtime mechanism, for
example, the TBB runtime, to map tasks onto the hardware which is actually present at
runtime. This removes the distraction of worrying about the number of actual hardware
threads available on a system. More importantly, in practice this is the only method
which opens up nested parallelism effectively. This is such an important capability, that
we will revisit and emphasize the importance of nested parallelism in several chapters.

Composability: Parallel Programming Does Not Have to Be Messy
TBB offers composability for parallel programming, and that changes everything.
Composability means we can mix and match features of TBB without restriction. Most
notably, this includes nesting. Therefore, it makes perfect sense to have a parallel_for
inside a parallel_for loop. It is also okay for a parallel_for to call a subroutine, which
then has a parallel_for within it.
Supporting composable nested parallelism turns out to be highly desirable
because it exposes more opportunities for parallelism, and that results in more scalable
applications. OpenMP, for instance, is not composable with respect to nesting because
each level of nesting can easily cause significant overhead and consumption of resources
leading to exhaustion and program termination. This is a huge problem when you
consider that a library routine may contain parallel code, so we may experience issues
using a non-composable technique if we call the library while already doing parallelism.
No such problem exists with TBB, because it is composable. TBB solves this, in part, by
letting us expose opportunities for parallelism (tasks) while TBB decides at runtime
how to map them to hardware (threads).
This is the key benefit to coding in terms of tasks (available but nonmandatory
parallelism (see “relaxed sequential semantics” in Chapter 2)) instead of threads
(mandatory parallelism). If a parallel_for were considered mandatory, nesting would
cause an explosion of threads which causes a whole host of resource issues which can
easily (and often do) crash programs when not controlled. When parallel_for exposes

xxiii
Preface

available nonmandatory parallelism, the runtime is free to use that information to match
the capabilities of the machine in the most effective manner.
We have come to expect composability in our programming languages, but most
parallel programming models have failed to preserve it (fortunately, TBB does preserve
composability!). Consider “if” and “while” statements. The C and C++ languages allow
them to freely mix and nest as we desire. Imagine this was not so, and we lived in a world
where a function called from within an if statement was forbidden to contain a while
statement! Hopefully, any suggestion of such a restriction seems almost silly. TBB brings
this type of composability to parallel programming by allowing parallel constructs to be
freely mixed and nested without restrictions, and without causing issues.

Scaling, Performance, and Quest for Performance Portability
Perhaps the most important benefit of programming with TBB is that it helps
create a performance portable application. We define performance portability as
the characteristic that allows a program to maintain a similar “percentage of peak
performance” across a variety of machines (different hardware, different operating
systems, or both). We would like to achieve a high percentage of peak performance on
many different machines without the need to change our code.
We would also like to see a 16× gain in performance on a 64-core machine vs. a
quad-core machine. For a variety of reasons, we will almost never see ideal speedup
(never say never: sometimes, due to an increase in aggregate cache size we can see more
than ideal speedup – a condition we call superlinear speedup).

WHAT IS SPEEDUP?

Speedup is formally defined to be the time to run sequentially (not in parallel) divided by the
time to run in parallel. If my program runs in 3 seconds normally, but in only 1 second on a
quad-core processor, we would say it has a speedup of 3×. Sometimes, we might speak of
efficiency, which is speedup divided by the number of processing cores. Our 3× would be 75%
efficient at using the parallelism.

The ideal goal of a 16× gain in performance when moving from a quad-core machine
to one with 64 cores is called linear scaling or perfect scaling.
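The definitions in the sidebar can be written down directly. This small sketch (the function names are ours, not part of any TBB API) encodes speedup and efficiency as just described:

```cpp
#include <cassert>

// Speedup: sequential running time divided by parallel running time.
double speedup(double t_sequential, double t_parallel) {
    return t_sequential / t_parallel;
}

// Efficiency: speedup divided by the number of processing cores.
double efficiency(double t_sequential, double t_parallel, int cores) {
    return speedup(t_sequential, t_parallel) / cores;
}
```

For the sidebar's example, speedup(3.0, 1.0) is 3×, and on four cores that speedup is 75% efficient.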


To accomplish this, we need to keep all the cores busy as we grow their
numbers – something that requires considerable available parallelism. We will dive
more into this concept of “available parallelism” starting on page xxxvii when we discuss
Amdahl’s Law and its implications.
For now, it is important to know that TBB supports high-performance programming
and helps significantly with performance portability. The high-performance support
comes because TBB introduces essentially no overhead, which allows scaling to proceed
without issue. Performance portability lets our application harness available parallelism
as new machines offer more.
In our confident claims here, we are assuming a world where the slight additional
overhead of dynamic task scheduling is the most effective at exposing the parallelism
and exploiting it. This assumption has one fault: if we can program an application to
perfectly match the hardware, without any dynamic adjustments, we may find a few
percentage points gain in performance. Traditional High-Performance Computing
(HPC) programming, the name given to programming the world’s largest computers
for intense computations, has long had this characteristic in highly parallel scientific
computations. HPC developers who utilize OpenMP with static scheduling, and find that
it performs well for them, may find the dynamic nature of TBB to be a slight
reduction in performance. Any advantage previously seen from such static scheduling is
becoming rarer for a variety of reasons. All programming, including HPC programming,
is increasing in complexity in a way that demands support for nested and dynamic
parallelism. We see this in all aspects of HPC programming as well, including
growth to multiphysics models, introduction of AI (artificial intelligence), and use of ML
(machine learning) methods. One key driver of additional complexity is the increasing
diversity of hardware, leading to heterogeneous compute capabilities within a single
machine. TBB gives us powerful options for dealing with these complexities, including
its flow graph features which we will dive into in Chapter 3.

It is clear that effective parallel programming requires a separation between
exposing parallelism in the form of tasks (programmer’s responsibility) and
mapping tasks to hardware threads (programming model implementation’s
responsibility).


Introduction to Parallel Programming


Before we dive into demystifying the terminology and key concepts behind parallel
programming, we will make a bold claim: parallel is more intuitive than sequential.
Parallelism is around us every day in our lives, and being able to do a single thing step by
step is a luxury we seldom enjoy or expect. Parallelism is not unknown to us and should
not be unknown in our programming.

Parallelism Is All Around Us


In everyday life, we find ourselves thinking about parallelism. Here are a few examples:

• Long lines: When you have to wait in a long line, you have
undoubtedly wished there were multiple shorter (faster) lines, or
multiple people at the front of the line helping serve customers more
quickly. Grocery store check-out lines, lines to get train tickets, and
lines to buy coffee are all examples.

• Lots of repetitive work: When you have a big task to do, which many
people could help with at the same time, you have undoubtedly wished for
more people to help you. Moving all your possessions from an old dwelling
to a new one, stuffing letters in envelopes for a mass mailing, and installing
the same software on each new computer in your lab are examples. The
proverb “Many hands make light work” holds true for computers too.

Once you dig in and start using parallelism, you will Think Parallel. You will learn to
think first about the parallelism in your project, and only then think about coding it.

Yale Patt, a famous computer architect, observed:


A Conventional Wisdom Problem is the belief that
Thinking in Parallel is Hard
Perhaps (All) Thinking is Hard!
How do we get people to believe that:
Thinking in parallel is natural
(we could not agree more!)


Concurrent vs. Parallel


It is worth noting that the terms concurrent and parallel are related, but subtly different.
Concurrent simply means “happening during the same time span” whereas parallel is
more specific and is taken to mean “happening at the same time (at least some of the
time).” Concurrency is more like what a single person tries to do when multitasking,
whereas parallel is akin to what multiple people can do together. Figure P-1 illustrates
the concepts of concurrency vs. parallelism. When we create effective parallel programs,
we are aiming to accomplish more than just concurrency. In general, speaking of
concurrency will mean there is not an expectation for a great deal of activity to be truly
parallel – which means that two workers are not necessarily getting more work done than
one could in theory (see tasks A and B in Figure P-1). Since the work is not done sooner,
concurrency does not improve the latency of a task (the time it takes to complete the task). Using the
term parallel conveys an expectation that we improve latency and throughput (work
done in a given time). We explore this in more depth starting on page xxxv when we
explore limits of parallelism and discuss the very important concepts of Amdahl’s Law.

Figure P-1. Parallel vs. Concurrent: Tasks (A) and (B) are concurrent relative to
each other but not parallel relative to each other; all other combinations are both
concurrent and parallel


Enemies of parallelism: locks, shared mutable state, synchronization, not “Thinking
Parallel,” and forgetting that algorithms win.

Enemies of Parallelism
Bearing in mind the enemies of parallel programming will help you understand our advocacy
for particular programming methods. Key parallel programming enemies include

• Locks: In parallel programming, locks or mutual exclusion objects
(mutexes) are used to provide a thread with exclusive access to a
resource – blocking other threads from simultaneously accessing
the same resource. Locks are the most common explicit way to
ensure parallel tasks update shared data in a coordinated fashion
(as opposed to allowing pure chaos). We hate locks because they
serialize part of our programs, limiting scaling. The sentiment “we
hate locks” is on our minds throughout the book. We hope to instill
this mantra in you as well, without losing sight of when we must
synchronize properly. Hence, a word of caution: we actually do love
locks when they are needed, because without them disaster will
strike. This love/hate relationship with locks needs to be understood.

• Shared mutable state: Mutable is another word for “can be changed.”
Shared mutable state happens any time we share data among
multiple threads, and we allow it to change while being shared. Such
sharing either reduces scaling when synchronization is needed and
used correctly, or it leads to correctness issues (race conditions or
deadlocks) when synchronization (e.g., a lock) is incorrectly applied.
Realistically, we need shared mutable state when we write interesting
applications. Thinking about careful handling of shared mutable
state may be an easier way to understand the basis of our love/hate
relationship with locks. In the end, we all end up “managing” shared
mutable state and the mutual exclusion (including locks) to make it
work as we wish.

• Not “Thinking Parallel”: Use of clever bandages and patches will not
make up for a poorly thought-out strategy for scalable algorithms.
Knowing where the parallelism is available, and how it can be
exploited, should be considered before implementation. Trying to add
parallelism to an application, after it is written, is fraught with peril.
Some preexisting code may shift to use parallelism relatively well, but
most code will benefit from considerable rethinking of algorithms.

• Forgetting that algorithms win: This may just be another way to say
“Think Parallel.” The choice of algorithms has a profound effect on
the scalability of applications. Our choice of algorithms determines
how tasks can be divided, how data structures are accessed, and how
results are coalesced. The optimal algorithm is really the one which
serves as the basis for an optimal solution. An optimal solution is a combination of the
appropriate algorithm, with the best matching parallel data structure,
and the best way to schedule the computation over the data. The
search for, and discovery of, algorithms which are better is seemingly
unending for all of us as programmers. Now, as parallel programmers,
we must add scalable to the definition of better for an algorithm.

Locks, can’t live with them, can’t live without them.
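The love/hate relationship with locks can be made concrete. The sketch below (standard C++ for portability; the function names are our own, and TBB's enumerable_thread_specific and combinable express the second idea with less boilerplate) contrasts a lock-guarded shared counter, where every update serializes through the mutex, with per-thread partial counts that avoid shared mutable state on the hot path:

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// 1) Shared mutable state guarded by a lock: correct, but every
//    increment serializes through the mutex, limiting scaling.
long count_with_lock(int threads, int per_thread) {
    long total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> g(m);  // the serializing lock
                ++total;
            }
        });
    for (auto& th : pool) th.join();
    return total;
}

// 2) Per-thread partials, merged once at the end: no lock on the
//    hot path because no state is shared while threads run.
long count_with_partials(int threads, int per_thread) {
    std::vector<long> partial(threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&partial, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                ++partial[t];  // each thread owns its own slot
        });
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Both functions compute the same answer; the difference is in how much of the work must take turns. (A production version would also pad the per-thread slots to avoid false sharing.)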

Terminology of Parallelism
The vocabulary of parallel programming is something we need to learn in order to
converse with other parallel programmers. None of the concepts are particularly hard,
but they are very important to internalize. A parallel programmer, like any programmer,
spends years gaining a deep intuitive feel for their craft, despite the fundamentals being
simple enough to explain.
We will discuss decomposition of work into parallel tasks, scaling terminology,
correctness considerations, and the importance of locality due primarily to cache effects.
When we think about our application, how do we find the parallelism?
At the highest level, parallelism exists either in the form of data to operate on
in parallel, or in the form of tasks to execute in parallel. And they are not mutually
exclusive. In a sense, all of the important parallelism is in data parallelism. Nevertheless,
we will introduce both because it can be convenient to think of both. When we discuss
scaling, and Amdahl’s Law, our intense bias to look for data parallelism will become
more understandable.
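As a tiny illustration (our own example, again using std::async rather than TBB so it runs anywhere): data parallelism applies the same operation to different pieces of the data at once, while task parallelism runs different operations concurrently over the same data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Data parallelism: the same operation (summing) applied to the two
// halves of the data at the same time.
long data_parallel_sum(const std::vector<int>& v) {
    std::size_t half = v.size() / 2;
    auto lo = std::async(std::launch::async, [&] {
        return std::accumulate(v.begin(), v.begin() + half, 0L); });
    long hi = std::accumulate(v.begin() + half, v.end(), 0L);
    return lo.get() + hi;
}

// Task parallelism: two different operations (sum and max) running
// concurrently over the same data.
std::pair<long, int> task_parallel_sum_and_max(const std::vector<int>& v) {
    auto sum = std::async(std::launch::async, [&] {
        return std::accumulate(v.begin(), v.end(), 0L); });
    int mx = *std::max_element(v.begin(), v.end());
    return {sum.get(), mx};
}
```

The two forms combine freely: each task in the second function could itself split its data, which is exactly the nesting that composability makes safe.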

xxix
Discovering Diverse Content Through
Random Scribd Documents
“That’s so,” admitted Paul in a tone of deep disappointment.
“How much did you say the debt amounted to?” asked
Amesbury.
“Eighteen dollars for each of us,” answered Paul, “but we’ve
been here working two months with wages, and that takes off six
dollars from each debt, so the first of the month our debts’ll each be
down to twelve dollars.”
“Good arithmetic; worked it out right the first time,” Amesbury
nodded in approval. “Now if you each pay the old pirate twelve
dollars, how much will you owe him and how long can he hold you
at the post?”
“Why the debt would be squared and he couldn’t keep us at all.”
“Right again.”
“But we has no money to pay un,” broke in Dan.
“Just leave all that to me,” counseled Amesbury. “I’ll attend to
his case.”
“Oh, thank you, Mr. Amesbury,” and Paul grasped the trapper’s
hand.
“’Tis wonderful kind of you,” said Dan.
“Don’t waste your words thanking me,” cautioned Amesbury.
“Wait till I get you out in the bush. I’ll get my money’s worth out of
you chaps.”

“‘See-saw, Margery Daw,


Johnnie shall have a new master;
He shall have but a penny a day,
Because he can’t work any faster.’”

He stretched his long arms, yawned, untangled his ungainly legs


from the knot into which he had twisted them, and rose to his feet,
remarking:
“Do you see where the sun is, fellows? It’s time to be going. You
can lash these traps of yours on the top of my flat sled. Ahmik and I
left our flat sleds just below here.”
“My criky!” exclaimed Paul. “The sun’s setting. I didn’t realize it
was so late.”
In accordance with Amesbury’s suggestion all of their things,
save their guns, were lashed on one of the long, narrow toboggans
upon which he and Ahmik hauled their provisions and camp outfit,
and the four turned toward the post, in single file, Paul and Dan
highly elated with the prospect of presently turning homeward.
CHAPTER XVI
RELEASED FROM BONDAGE

T AMMAS, Samuel, and Amos, who had spent the day caribou
hunting, but had killed nothing, were gathered around the stove
engaged in a heated argument as to whether a caribou would or
would not charge a man when at close quarters, when Paul and Dan
entered with the visitors.
“Weel! Weel!” exclaimed Tammas, rising. “If ’tis no Charley
Amesbury and John Buck wi’ the laddies!”
Amesbury and Ahmik were old visitors at the post. Every one
knew them and gave them a most hearty welcome. Even Chuck,
who was mixing biscuit for supper, wiped his dough-debaubed right
hand upon his trousers, that he might offer it to the visitors, and
Jerry, who lived with his family in a little nearby cabin, and had seen
them pass, came over to greet them.
Amesbury warned the lads to say nothing of their plan to the
post folk. “I’ll break the news gently to Davy MacTavish when the
time is ripe for it,” said he. “You fellows keep right at your work as
though you were to stay here forever.” And therefore no mention
was made of the arrangement to Tammas and the others.
During the days that followed Amesbury and Ahmik made some
purchases at the post shop, including the provisions necessary for
the return journey to their trapping grounds. They had no debt here,
and therefore bartered pelts to pay for their purchases. Their trading
completed, Amesbury produced two particularly fine marten skins,
and laid them upon the counter. “I’ve got everything I need,” said
he, “but I don’t want to carry these back with me. How much’ll you
give?”
“Trade or cash?” asked MacTavish, examining them critically.
“Trade. Give me credit for ’em. I may want something more
before I go.”
“Ten dollars each.”
“Not this time. They’re prime, and they’re worth forty dollars
apiece in Winnipeg.”
“This isn’t Winnipeg.”
“Give them back. They’re light to pack, and I guess I’ll take
them to Winnipeg.”
But MacTavish was gloating over them. They were glossy black,
remarkably well furred, the flesh side clean and white.
“They are pretty fair martens,” he said finally, as though
weighing the matter. “I may do a little better; say fifteen dollars.”
“I’ll take them to Winnipeg.”
“You can’t get Winnipeg prices here.”
“No, but I don’t have to sell them here. I thought if you’d give
me half what they’re worth I’d let you have them. You can keep
them for twenty dollars each. Not a cent less.”
“Can’t do it, but I’ll say as a special favor to you eighteen
dollars.”
“Hand them back. I’m not an Indian.”
“You know I’d not give an Indian over five dollars.”
“I know that, but I don’t ask for a debt. You see I’m pretty free
to do as I please. Hand ’em back.”
But the pelts were too good for MacTavish to let pass him, and
after a show of hesitancy he placed them upon the shelf behind him
and said reluctantly:
“They’re not worth it, but I’ll allow you twenty dollars each for
them. But it’s a very special favor.”
“Needn’t if you don’t want them. I wouldn’t bankrupt the
company for the world.”
“I’ll take them.”
The bargain concluded, Amesbury strolled away, humming:

“‘A diller, a dollar,


A ten o’clock scholar,
What makes you come so soon?
You used to come at ten o’clock,
But now you come at noon,’”
and MacTavish glared after him.
It was a busy week at the post. Day after day picturesque
Indians came in, hauling long, narrow toboggans, pitching their
tepees near by, and crowding the shop during daylight hours
bartering away their early catch of pelts for necessary and
unnecessary things.
Paul and Dan kept steadily at their tasks. Amesbury made no
further reference to the arrangement he had made with them until
New Year’s eve, when he strolled over to the woodpile toward
sundown, where they were hard at work, humming, as he watched
them make the last cut in a stick of wood:

“‘If I’d as much money as I could spend,


I never would cry ‘old chairs to mend,
Old chairs to mend, old chairs to mend;’
I never would cry ‘old chairs to mend.’”

When they laid down the saw to place another stick on the
buck, he said:
“Never mind that. You chaps come along with me, and we’ll pay
our respects to Mr. MacTavish.”
“Oh, have you told him we were going? I was almost afraid
you’d forgotten it!” exclaimed Paul exultantly.
“Never a word. Reserved the entertainment for an audience,
and you fellows are to be the audience. Come along; he’s in his
office now,” and Amesbury strode toward the office, Paul and Dan
expectantly following.
MacTavish glanced up from his desk as they entered, and
nodding to Amesbury, who had advanced to the center of the room,
noticed Paul and Dan near the door.
“What are you fellows knocking off work at this time of day for?
Get back to work, and if you want anything, come around after
hours.”
“They’ve knocked off for good,” Amesbury answered for them,
his eyes reflecting amusement. “They’re going trapping with me up
Indian Lake way. I’m sorry to deprive you of them, but I guess I’ll
have to.”
“What!” roared MacTavish, jumping to his feet. “Are you
inducing those boys to desert? What does this nonsense mean?”
“Yes, they’re going. Sorry you feel so badly at losing their
society, but I don’t see any way out of it.”
“Well, they’re not going.” MacTavish spoke more quietly, but
with determination, glowering at Amesbury. “They have a debt here
and they will stay until it is worked out. They’ve signed articles to
remain here until the debt is worked out, and I will hold them under
the articles. You fellows go back to your work.”
“We’re not going to work for you any more,” said Paul, his anger
rising. “Mr. Amesbury has told you we’re going with him, and we
are.”
“Go back to your work, I say, or I’ll have you flogged!”
MacTavish was now in a rage, and he made for the lads as though to
strike them, only to find the ungainly figure of Amesbury in the way.
“Tut! Tut! Big Jack Blunderbuss trying to strike the little
Tiddledewinks! Fine display of courage! But not this time. No
pugilistic encounters with any one but me while I’m around, and my
hands have an awful itch to get busy.”
“None of your interference in the affairs of this post!” bellowed
MacTavish. “You’re breeding mutiny here, and I’ve a mind to run you
off the reservation.”
“Hey diddle diddle,” broke in Amesbury, who had not for a
moment lost his temper, and who fairly oozed good humor. “This
isn’t seemly in a man in your position, MacTavish. Now let’s be
reasonable. Sit down and talk the matter over.”
“There’s nothing to talk over with you!” shouted MacTavish, who
nevertheless resumed his seat.
“Well, now, we’ll see.” Amesbury drew a chair up, sat down in
front of MacTavish, and leaning forward assumed a confidential
attitude. “In the first place,” he began, “the lads owe a debt, you
say, and you demand that it be paid.”
“They can’t leave here until it is paid! They can’t leave anyhow!”
still in a loud voice.
“No, no; of course not. That’s what we’ve got to talk about. I’ll
pay the debt. Now, how much is it?”
“That won’t settle it. They both signed on here for at least six
months, at three dollars a month, and they’ve got to stay the six
months.”
“Now you know, MacTavish, they are both minors and under the
law they are not qualified to make such a contract with you. Even
were they of age, there isn’t a court within the British Empire but
would adjudge such a contract unconscionable, and throw it out
upon the ground that it was signed under duress. You couldn’t hire
Indians to do the work these lads have done under twelve dollars a
month. In all justice you owe them a balance, for they’ve more than
worked out their debt.”
“I’m the court here, and I’m the judge, and I’m going to keep
these fellows right here.”
“Wrong in this case. There’s no law or court here except the law
and the court of the strong arm. Now I’ve unanimously elected
myself judge, jury and sheriff to deal with this matter. In these
various capacities I’ve decided their debt is paid and they’re going
with me. As their friend and your friend, however, I’ve suggested for
the sake of good feeling that they pay the balance you claim is due
you under the void agreement, and I offer to make settlement in full
now. I believe you claim twelve dollars due from each—twenty-four
dollars in all?”
It was plain that Amesbury had determined to carry out the plan
detailed, with or without the factor’s consent, and finally MacTavish
agreed to release Paul and Dan, and charge the twenty-four dollars
which he claimed still due on their debt against the forty dollars
credited to Amesbury for the two marten skins. He declared,
however, that had he known Amesbury’s intention he would not have
accepted a pelt from him, nor would he have sold Amesbury the
provisions necessary to support him and the lads on their journey to
Indian Lake.
“You can never trade another shilling’s worth at this post,”
announced MacTavish as the three turned to the door, “not another
shilling’s worth.”
“Now, now, MacTavish,” said Amesbury, smiling, “you know
better. I’ve a credit here that I’ll come back to trade out, and I’ll
have some nice pelts that you’ll be glad enough to take from me.”
“Not a shilling’s worth,” repeated the factor, whose anger was
not appeased when he heard Amesbury humming, as he passed out
of the door:

“‘A diller, a dollar, a ten o’clock scholar,


What made you come so soon?
You used to come at ten o’clock,
But now you come at noon.’”

It was to be expected that MacTavish would refuse them shelter


for the night, but he made no reference to it, probably because in
his anger he forgot to do so, and the following morning, when his
wrath had cooled, he astonished Paul and Dan when he met them
with, for him, a very cheery greeting.
On New Year’s morning Amesbury and Ahmik visited the Indian
encampment, and with little difficulty secured from their Indian
friends two light toboggans for Paul and Dan to use in the
transportation of their equipment.
The day was spent in taking part in snowshoe obstacle races,
rifle matches, and many contests with the Indian visitors, and the
evening in final preparations for departure. In early morning, before
the bell called the post folk to their daily task, they passed out of the
men’s house for the last time. Tammas, Amos and Samuel were
sorry to lose their young friends and assistants, but glad of their
good fortune.
“I’ll be missin’ ye, laddies. God bless ye,” said Tammas.
“Aye, God bless ye,” repeated Samuel.
“Hi ’opes you’ll ’ave a pleasant trip. Tyke care of yourselves,”
was Amos’s hearty farewell.
They turned their faces toward the vast dark wilderness to the
westward, redolent with mystery and fresh adventure. Presently the
flickering lights of the post, which a few weeks before they had
hailed so joyously, were lost to view.
CHAPTER XVII
THE SNOWSHOE JOURNEY TO INDIAN LAKE

T HERE was yet no hint of dawn. Moon and stars shone cold and
white out of a cold, steel-blue sky. The moisture of the frozen
atmosphere, shimmering particles of frost, hung suspended in space.
The snow crunched and creaked under their swiftly moving
snowshoes.
They traveled in single file, after the fashion of the woods.
Amesbury led, then followed Ahmik, after him Paul, with Dan
bringing up the rear. Each hauled a toboggan, and though Paul’s and
Dan’s were much less heavily laden than Amesbury’s and Ahmik’s,
the lads had difficulty in keeping pace with the long, swinging half-
trot of the trapper and Indian.
Presently they entered the spruce forest of a river valley, dead
and cold, haunted by weird shadows, flitting ghostlike hither and
thither across ghastly white patches of moonlit snow. Now and again
a sharp report, like a pistol shot, startled them. It was the action of
frost upon the trees, a sure indication of extremely low temperature.
Dawn at length began to break—slowly—slowly—dispersing the
grotesque and ghostlike shadows. As dawn melted into day, the real
took the place of the unreal, and the frigid white wilderness that had
engulfed them presented its true face to the adventurous travelers.
Scarce a word was spoken as they trudged on. Amesbury and
Ahmik kept the silence born of long life in the wilderness where men
exist by pitting human skill against animal instinct, and learn from
the wild creatures they stalk the lesson of necessary silence and
acute listening. Dan, too, in his hunting experiences with his father,
had learned to some degree the same lesson, and Paul had small
inclination to talk, for he needed all his breath to hold the rapid
pace.
Rime had settled upon their clothing, and dawn revealed them
white as the snow over which they passed. The moisture from their
eyes froze upon their eyelashes, and now and again it was found
necessary to pick it off, painfully, as they walked.
The sun was two hours high when Amesbury and Ahmik
suddenly halted, and when Paul and Dan, who had fallen
considerably in the rear, overtook them, Ahmik was cutting wood,
while Amesbury, lighting a fire, was singing:
“‘Polly put the kettle on,
Polly put the kettle on,
Polly put the kettle on,
And let’s drink tea.’”

“How are you standing it, fellows?” he asked, looking up.


“Not bad, sir,” answered Dan.
“I’m about tuckered out, and as empty as a drum!” exclaimed
Paul.
“Pretty hard pull for raw recruits,” said Amesbury, laughing. “But
wait till tomorrow! Cheer up! The worst is yet to come.”
“I hope it won’t be any harder than this,” and Paul sat wearily
down upon his toboggan.
“No,” encouraged Amesbury, “better snowshoeing, if anything.
But there’s the wear and tear. You’ll have a hint of it tonight, and
know all about it tomorrow.”
“I finds th’ snowshoein’ not so bad today,” said Dan, “but I’m
thinkin’ now I knows what you means. I had un bad last year when I
goes out wi’ Dad. ’T were wonderful bad, too. I were findin’ it
wonderful hard t’ walk with th’ stiffness all over me when I first
starts in th’ mornin’, but th’ stiffness wears off after a bit, an’ I’m not
mindin’ un after.”
“That’s it. You’re on,” laughed Amesbury, as he chipped some
ice from a frozen brook to fill the kettle for tea.
“Very hard, you find him,” broke in Ahmik, joining in Amesbury’s
laugh. “You get use to him quick. Walk easy like Mr. Amesbury and
me soon. No hard when use to him.”
Ahmik was growing more talkative upon acquaintance, and
drawing out of the natural reticence of his race with strangers, as is
the way of Indians when they learn to know and like one.
It was a hard afternoon for Paul, and he had to summon all his
grit and fortitude to keep going without complaint until the night halt
was finally made, but he did his share of the camp work,
nevertheless, with a will, and when the tent was pitched and wood
cut he sat down more weary than he had ever been in his life.
Amesbury and Ahmik traveled in true Indian fashion when
Indians make flying trips without their families. They had neither
tent nor tent stove to protect them. The experienced woodsman can
protect himself, even in sub-Arctic regions, from the severest storm
and cold, so long as he has an axe. Sometimes he resorts to
temporary shelters, with fires, sometimes to burrows in snowdrifts,
or to such other methods as the particular conditions which he has
to face suggest or demand.
Paul and Dan, however, had their tent, tent stove and other
paraphernalia. The tent they pitched upon the snow, stretching it, by
means of the ridge rope, between two convenient trees. When it
was finally in place Dan banked snow well up upon all sides save the
opening used for an entrance.
While Dan was thus engaged Paul broke spruce boughs for a
floor covering and bed, Ahmik cut wood for the stove, and Amesbury
unpacked the outfit and set the stove in place upon two green log
butts three feet long and six inches thick. This he did that the stove
might not sink into the snow when a fire was lighted and the snow
under the stove began to melt.
The telescope pipe in place, Amesbury put a handful of birch
bark in the stove, broke some small, dry twigs upon it, lighted the
bark, as it blazed filled the stove with some of Ahmik’s neatly split
wood, and in five minutes the interior of the tent was comfortably
warm.
Paul spread the tarpaulin upon the boughs which he had
arranged, stowed their camp things neatly around the edge of the
interior, and night camp was ready. Though rather crowded, the tent
offered sufficient accommodation for the four.
A candle was lighted, and Amesbury installed himself as cook. A
kettle of ice was placed upon the stove to melt and boil for tea. A
frying pan filled with thick slices of salt pork was presently sizzling
on the stove. Then he added some salt and baking powder to a pan
of flour, mixed them thoroughly, and poured enough water from the
kettle of melting ice to make a dough.
The pork, which had now cooked sufficiently, was taken from
the pan and placed upon a tin dish, and the dough, stretched into
thin cakes large enough to fill the circumference of the pan, was
fried, one at a time, in the bubbling pork grease that remained. In
the meantime tea had been made.
“All ready. Fall to,” announced Amesbury.
“I feels I’m ready for un,” said Dan.
“I can eat two meals,” declared Paul.
“I’m interested to see what the day’s work did for you chaps.
Now if you can’t eat, Ahmik and I will feel that we didn’t walk you
fast enough today, and we’ll have to do better tomorrow, eh,
Ahmik?” Amesbury’s eyes twinkled with amusement.
“Ugh! Big walk tomorrow. Very far. Very fast,” and Ahmik
grinned.
“Goodness!” exclaimed Paul. “If we have to walk any farther or
faster tomorrow than we did today, I’ll just collapse. I’m so stiff now
I can hardly move.”
“That’s always the case for a day or two when a fellow starts
out for the first time on snowshoes and does a full day’s work. It
won’t last long, but we’ll take it a little slower tomorrow, to let you
get hardened to it,” Amesbury consoled.
When they stopped to boil the kettle the following day Paul was
scarcely able to lift his feet from the snow. Sharp pains in the calves
of his legs and in his hips and groins were excruciating, and he sat
down upon his toboggan very thankful for the opportunity to rest.
“How is it? Pretty tired?” asked Amesbury, good-naturedly.
“A little stiff—and tired,” answered Paul, whose pride would not
permit him to admit how hard it was for him to keep up.
“We’ll take a little easier gait this afternoon. I didn’t realize we
were hitting it off so hard as we were this morning.”
“Thank you.” Paul wished to say “Don’t go slow on my account,”
but he realized how utterly impossible it would be for him to keep
the more rapid pace.
When luncheon was disposed of and they again fell into line,
the pain was so intense that he could scarcely restrain from crying
out. But he kept going, and saying to himself:
“I won’t be a quitter. I won’t be a quitter.” He began to lag
wofully, however, in spite of his determination and grit, and the
slower pace which Amesbury had set. Thus they traveled silently on
for nearly an hour, when all at once Amesbury stopped, held up his
hand as a signal to the others to halt and remain quiet. Dropping his
toboggan rope he stole stealthily forward and was quickly lost to
view.
Presently a rifle shot rang out, and immediately another. A
moment later Amesbury strode back for his toboggan, where the
others were awaiting him, humming as he came:

“‘His body will make a nice little stew,
And his giblets will make me a little pie, too.’”

“Come along, fellows,” he called. “Two caribou the reward of vigilance. We’ll skin ’em.”
Just within the woods, at the edge of an open, wind-swept
marsh, they left their toboggans, and a hundred yards beyond lay
the carcasses of the two caribou Amesbury had killed.
“There was a band of a dozen,” he explained, as they walked
out to the game. “I thought we could use about two of them very
nicely.”
“Good!” remarked Ahmik, drawing his knife to begin the process
of skinning at once.
“I’ll tell you what,” said Amesbury, “unless you chaps would like
to help here, suppose you pitch the tent. We’ll not go any farther
today.”
“That’s bully!” exclaimed Paul, who had been at the point of
declaring his inability to walk another mile.
“Everything’s bully,” declared Amesbury, “and fresh meat just
now is the bulliest thing could have come our way. All right, fellows;
you get camp going. You’d find skinning pretty hard work in this
weather, but Ahmik and I don’t mind it.”
“My, but I’m glad we don’t have to go any farther today,” said
Paul when he and Dan returned to make camp. “I’m just done for. I
can hardly move my feet.”
“Does un pain much?” asked Dan, sympathetically.
“You bet it does,” and Paul winced.
“Where is un hurtin’ most now?”
“Here, and here,” indicating his hips, groins and calves.
“Lift un feet—higher.”
“Oh! Ouch!”
“Why weren’t you sayin’ so, now? ’Tis sure th’ snowshoe
ailment, an’ not just stiffness. Mr. Amesbury’d not be goin’ on, an’
you havin’ that.”
“I thought it was just stiffness, and would wear off if I kept
going. Besides, I didn’t want to be a baby and complain.”
“’Tis no stiffness. ’Tis th’ snowshoe ailment, an’ ’twould get
worse, an’ no better, with travelin’. ’Tis wonderful troublesome
sometimes. Dad says if you gets un, stop an’ camp where you is, an’
bide there till she gets better. ’Tis th’ only way there is, Dad says, t’
cure un.”
“I never heard of it before.”
“Now I’ll be pitchin’ th’ tent, an’ you sits on th’ flat-sled an’
keeps still.”
“Oh, I’d freeze if I sat down. I’d rather help.”
They had just got the tent up and a roaring fire in the stove
when Amesbury and Ahmik came for toboggans upon which to haul
the meat to camp.
“I’m thinkin’,” said Dan, “we’ll have t’ be bidin’ here a bit. Paul’s
havin’ th’ snowshoe ailment bad.”
“What’s the trouble, Paul?” asked Amesbury.
Paul explained.
“Why, you’re suffering from mal de raquet. Dan’s right; we must
stay here till you’re better—a day or two will fix that. Mustn’t try to
travel with mal de raquet. It’s a mighty uncomfortable companion.”
At the end of two days, however, Paul was in fairly good
condition again, and the journey was resumed without further
interruption, save twice they were compelled by storms to remain a
day in camp.
Two weeks had elapsed since leaving the post when finally, late
one afternoon, Amesbury shouted back to the lads:
“Come along, fellows. We’re here at last.”
Ahmik had stopped and was shoveling snow with one of his
snowshoes from the door of a low log cabin, half covered with drifts.
It was situated in the center of a small clearing among the fir trees
which looked out upon the white frozen expanse of South Indian
Lake.
“This is our castle,” Amesbury announced as Paul and Dan
joined him. “Here we’re to live in luxurious comfort. That’s the
southern extremity of Indian Lake. What do you think of it?”
“’Tis a wonderful fine place t’ live in if th’ trappin’s good,” said
Dan.
“It looks mighty good to me. What a dandy place it must be in
summer!” Paul exclaimed.
Ahmik now had the door cleared and they entered. The cabin
contained a single square room. At one side was a flat-topped sheet-
iron stove, similar in design to the tent stove commonly in use in the
north, but of considerably larger proportions and heavier material.
Near it was a rough table; in the end opposite the door stood a
rough-hewn bedstead, the bed neatly made up with white spread
and pillow cases. A shelf of well-thumbed books held the Bible,
Shakespeare, Thomas à Kempis, Milton’s Paradise Lost, Bunyan’s
Pilgrim’s Progress, Wordsworth’s Poems, Robinson Crusoe, Mother
Goose’s Melodies, Aesop’s Fables, David Copperfield, and some
random novels and volumes of travel and adventure. On one end of
a second table, evidently used as a writing desk, were neatly piled
old magazines and newspapers, on the other end lay some sheet
music and a violin, and in the center were writing materials.
The chairs, like all of the furniture, were doubtless the
handiwork of Amesbury himself. Everything in the room was
spotlessly clean and in order. The setting sun sent a shaft of sunlight
through a window, giving the room an air of brightness, and
enhancing its atmosphere of homely comfort.
When the fire which Amesbury lighted in the stove began to
crackle, he asked:
“Well, fellows, how do you like my den? Think you can be
comfortable here for three or four months?”
“’Tis grand, sir,” said Dan.
“Mr. Amesbury, it’s splendid!” declared Paul.
Both lads had been long enough from home, and had endured
sufficient buffeting of the wilderness to measure by contrast with
their recent experiences the attractions of Amesbury’s cabin, and it
appealed to them as little short of luxurious.
“Not splendid, but good enough for a trapper. Hang up your
things; you’ll find pegs. Make yourselves at home now. Sit down and
rest up. Ahmik will take care of the stuff outside,” and as Amesbury
went about the preparation of supper he sang:

“‘There was an old woman, and what do you think?
She lived upon nothing but victuals and drink:
Victuals and drink were the chief of her diet;
This tiresome old woman could never be quiet.’”

Luscious caribou steaks were soon frying, biscuits were baking, and presently the delicious odor of coffee filled the room.
“I always keep coffee here,” explained Amesbury. “Rather have
it than tea, but it’s too bulky to carry when I’m hitting the trail.”
“It’s the first smell of coffee I’ve had since we left the ship, and
oh, but it smells bully to me!” said Paul.
Candles were lighted, a snowy white cloth spread on the table.
When at length they sat down to eat, Amesbury, with bowed head,
asked grace.
“’Tis good,” remarked Dan, accepting a liberal piece of caribou
meat, “t’ hear un say grace. Dad always says un.”
“I neglect it when I’m on the trail,” said Amesbury. “My father
was a preacher. He always said grace at home, and it’s second
nature to me to do it when I sit at a table. Part of eating. We
mustn’t forget, you know, that we owe what we have to a higher
Power, and we shouldn’t forget to give thanks.”
“That’s what Dad would be sayin’, now.” Dan had admired
Amesbury before, but this comparison of him with his father was the
highest compliment he could have paid him, and indicated the
highest regard for his friend.
“I’ll tell you, chaps, my theory of the way the Lord gives us our
blessings. He gives us eyes and hands and feet, and best of all He
gives us brains with which to reason things out. Then He provides
the land with all its products, the birds and animals and forests. He
gives us the sea with its products, too. He intends that we use our
brains in devising methods of applying the products of earth and sea
to our needs, and to use our hands and feet and eyes to carry out
what our brain tells us how to do. If I hadn’t used my eyes and
hands and feet the Lord never would have put this venison on the
table.”
“That’s just what Dad says,” agreed Dan. “He says they ain’t no
use prayin’ for things when they’s a way t’ get un yourself.”
“Your dad’s right. If you chaps had just spent your time praying
when you went adrift on that ice pan, you’d be at the bottom of
Hudson Bay now. Yes, your dad’s right. Thank the Lord for the
things that come your way, but get up and hustle first, or they won’t
come your way. Use your brains and your hands. That’s the thing to
do.”
Supper finished, Amesbury and Ahmik cut tobacco from black
plugs, filled their pipes; Amesbury whittled some long shavings from
a stick of dry wood, lighted an end of a shaving by pushing it
through the stove vent, and applied it to his pipe; Ahmik followed his
example, and then turned his attention to washing dishes.
Puffing contentedly at his pipe, Amesbury lifted the violin from
its case, settled himself before the stove and began tuning the
instrument.
“I likes t’ hear fiddlin’ wonderful well,” remarked Dan.
“That’s good, for I’m going to fiddle. Do you like it, too,
Densmore?”
“I’m very fond of music.”
“Then, no one objecting, I’ll begin.”
Amesbury began playing very softly. Dan sat in open-mouthed
wonder, eyes wide, and scarcely breathing. Paul was enthralled. It
was a master hand that held the bow. The player himself seemed
quite unconscious of his listeners and surroundings. The wrinkles
smoothed out of the corners of his eyes, the alert twinkle left the
eyes and a soft, dreamy expression came into them, as though they
beheld some beautiful vision. He seemed transfigured as Paul looked
at him. Another being had taken the place of the ungainly, rough-
clad trapper.
For a full hour he played. Then laying his violin across his knees
sat silent for a little. The music had cast a spell upon them. Even
Ahmik, who had seated himself near the table, had let his pipe die
out.
All at once the humorous wrinkles came again into the corners
of Amesbury’s eyes, and the eyes began to sparkle and laugh. He
arose and returned the violin to its case, humming as he did so:

“‘Hey diddle diddle,
The cat and the fiddle.’

“I always like a little music after supper,” he remarked, resuming his seat.
“Oh, ’twere more than music!” exclaimed Dan. “’T were—’t were
—I’m thinkin’—’t were like in heaven. ’T weren’t fiddlin’, sir. ’T were
music of angels in th’ fiddle, sir.”
“That’s the best compliment I ever received,” laughed
Amesbury.
“Mr. Amesbury,” asked Paul, “where did you ever learn to play
like that? I heard Madagowski, the great Polish violinist that every
one raved over last year. I thought it was great then, but after
hearing you it seems just common.”
“You chaps will make me vain if you keep this up,” and
Amesbury laughed again.
“But where did you learn?” insisted Paul. “And what ever made
you turn trapper?”
Amesbury’s face grew suddenly grave, almost agonized.
“Oh, Mr. Amesbury!” Paul exclaimed, feeling instinctively that he
had made a mistake in urging the question. “If I shouldn’t ask, don’t
tell me! I’m sorry.”
“It’s all right, Paul,” said Amesbury, quietly. “I’ll tell you the
story. It may be well for you to hear it.”