Multicore programming in
Haskell
Simon Marlow
Microsoft Research
A concurrent web server
server :: Socket -> IO ()
server sock =
  forever (do
    acc <- Network.accept sock
    forkIO (http acc))
create a new thread for each new client
the client/server protocol is implemented in a single-threaded way
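The fork-per-client shape above can be sketched without a network: in this stand-alone demo, accept and http are placeholders of mine (not the real Network functions), and the "forever" loop is bounded so the program terminates.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM_)

-- Sketch of the fork-per-client accept loop; 'accept' and 'http'
-- are stand-ins, not the slide's Network functions.
serverDemo :: Int -> IO Int
serverDemo nClients = do
  handled <- newMVar 0                  -- shared count of completed clients
  done    <- newEmptyMVar
  let accept = return ()                -- stand-in for Network.accept
      http _ = modifyMVar_ handled (return . (+1))
  replicateM_ nClients $ do             -- the real server uses 'forever'
    acc <- accept
    _ <- forkIO (http acc >> putMVar done ())
    return ()
  replicateM_ nClients (takeMVar done)  -- wait for every client thread
  readMVar handled

main :: IO ()
main = serverDemo 10 >>= print          -- prints 10
```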
Concurrency = abstraction
• Threads let us implement individual
interactions separately, but have them happen
“at the same time”
• writing this with a single event loop is complex
and error-prone
• Concurrency is for making your program
cleaner.
More uses for threads
• for hiding latency
– e.g. downloading multiple web pages
• for encapsulating state
– talk to your state via a channel
• for making a responsive GUI
• fault tolerance, distribution
Parallelism
• ... for making your program faster?
– are threads a good abstraction for multicore?
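The "talk to your state via a channel" idea can be sketched as a thread that owns a counter and serialises all access to it. The Req type and the function names here are illustrative, not from the talk:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Concurrent.MVar

-- Requests a client can send; Get carries a reply slot.
data Req = Incr | Get (MVar Int)

-- The server thread: the state lives in the loop argument, so only
-- this thread ever touches it.
counterServer :: Chan Req -> IO ()
counterServer reqs = loop 0
  where
    loop n = do
      r <- readChan reqs
      case r of
        Incr      -> loop (n + 1)
        Get reply -> putMVar reply n >> loop n

main :: IO ()
main = do
  reqs <- newChan
  _ <- forkIO (counterServer reqs)
  writeChan reqs Incr
  writeChan reqs Incr
  reply <- newEmptyMVar
  writeChan reqs (Get reply)
  takeMVar reply >>= print             -- prints 2
```

Because all writes from one client are ordered in the channel, the server sees a consistent stream of requests without any locks in client code.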
Why is concurrent programming hard?
• non-determinism
– threads interact in different ways depending on the
scheduler
– programmer has to deal with this somehow: locks,
messages, transactions
– hard to think about
– impossible to test exhaustively
• can we get parallelism without non-determinism?
What Haskell has to offer
• Purely functional by default
– computing pure functions in parallel is deterministic
• Type system guarantees absence of side-effects
• Great facilities for abstraction
– Higher-order functions, polymorphism, lazy evaluation
• Wide range of concurrency paradigms
• Great tools
The rest of the talk
• Parallel programming in Haskell
• Concurrent data structures in Haskell
Parallel programming in Haskell
par :: a -> b -> b
Evaluate the first argument in parallel; return the second argument
Parallel programming in Haskell
par :: a -> b -> b
pseq :: a -> b -> b
Evaluate the first argument, then return the second argument
Using par and pseq

import Control.Parallel

main =
  let
    p = primes !! 3500
    q = nqueens 12
  in
    par p (pseq q (print (p,q)))

primes = ...
nqueens = ...

par indicates that p could be evaluated in parallel with q. This does not calculate the value of p: it allocates a suspension (a thunk) for p.
pseq evaluates q first, then returns the result:
• p is sparked by par
• q is evaluated by pseq
• p is demanded by print
• (p,q) is printed

write it like this if you want (a $ b = a b):
par p $ pseq q $ print (p,q)
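A self-contained version of this example, with cheap stand-ins for the two expensive computations (the talk used primes !! 3500 and nqueens 12), so it compiles with base alone:

```haskell
import GHC.Conc (par, pseq)  -- base's par/pseq; Control.Parallel re-exports them

main :: IO ()
main =
  let p = sum [1 .. 1000000] :: Int        -- stand-in for primes !! 3500
      q = product [1 .. 10]  :: Integer    -- stand-in for nqueens 12
  in par p (pseq q (print (p, q)))         -- prints (500000500000,3628800)
```

Run with `ghc -threaded` and `+RTS -N2` to give the spark a chance to run on a second core; without that, the spark simply fizzles and the program is still correct.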
ThreadScope
(screenshot: ThreadScope timeline of the run)
Zooming in...
(screenshot: zoomed timeline — the spark is picked up here)
How does par actually work?
(diagram: threads 1, 2, and 3 distributed across CPU 0, CPU 1, and CPU 2)
Correctness-preserving optimisation
par a b == b
• Replacing “par a b” with “b” does not change
the meaning of the program
– only its speed and memory usage
– par cannot make the program go wrong
– no race conditions or deadlocks, guaranteed!
• par looks like a function, but behaves like an
annotation
How to use par
• par is very cheap: a write into a circular buffer
• The idea is to create a lot of sparks
– surplus parallelism doesn’t hurt
– enables scaling to larger core counts without
changing the program
• par allows very fine-grained parallelism
– but using bigger grains is still better
The N-queens problem
Place n queens on an n x n
board such that no queen
attacks any other, horizontally,
vertically, or diagonally
N queens
(diagram: the search tree of partial boards — [] branches to [1], [2], …; [1] branches to [1,1], [2,1], [3,1], [4,1], …; [3,1] branches to [1,3,1], [2,3,1], [3,3,1], [4,3,1], [5,3,1], [6,3,1], and so on)
N-queens in Haskell
nqueens :: Int -> [[Int]]
nqueens n = subtree n []
  where
    children :: [Int] -> [[Int]]
    children b = [ (q:b) | q <- [1..n], safe q b ]

    subtree :: Int -> [Int] -> [[Int]]
    subtree 0 b = [b]
    subtree c b =
      concat $
      map (subtree (c-1)) $
      children b

safe :: Int -> [Int] -> Bool
...

A board is represented as a list of queen rows.
children calculates the valid boards that can be made by adding another queen.
subtree calculates all the valid boards starting from the given board by adding c more columns.
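The slide elides safe; here is a complete, runnable version, where safe is my own standard check (no shared row, no shared diagonal), not necessarily the talk's:

```haskell
-- A board is a list of queen rows, most recently placed column first.
nqueens :: Int -> [[Int]]
nqueens n = subtree n []
  where
    children :: [Int] -> [[Int]]
    children b = [ q : b | q <- [1 .. n], safe q b ]

    subtree :: Int -> [Int] -> [[Int]]
    subtree 0 b = [b]
    subtree c b = concat (map (subtree (c - 1)) (children b))

-- A queen in row q is safe if no earlier queen (at column distance d)
-- shares its row or either diagonal.
safe :: Int -> [Int] -> Bool
safe q b = and [ q /= r && abs (q - r) /= d | (r, d) <- zip b [1 ..] ]

main :: IO ()
main = print (length (nqueens 8))   -- prints 92, the classic 8-queens count
```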
Parallel N-queens
• How can we parallelise this?
• Divide and conquer
– aka map/reduce
– calculate subtrees in parallel, join the results
(diagram: the subtrees below [1], [2], … are computed in parallel)
Parallel N-queens
nqueens :: Int -> [[Int]]
nqueens n = subtree n []
  where
    children :: [Int] -> [[Int]]
    children b = [ (q:b) | q <- [1..n], safe q b ]

    subtree :: Int -> [Int] -> [[Int]]
    subtree 0 b = [b]
    subtree c b =
      concat $
      parList $
      map (subtree (c-1)) $
      children b

parList :: [a] -> b -> b
parList is not built-in magic...
• It is defined using par:
parList :: [a] -> b -> b
parList [] b = b
parList (x:xs) b = par x $ parList xs b
• (full disclosure: in N-queens we need a slightly
different version in order to fully evaluate the
nested lists)
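One plausible shape for that "slightly different version" — an assumption of mine, not the talk's actual code — sparks `force x` so the whole nested list is evaluated inside the spark, rather than just its outermost constructor:

```haskell
import GHC.Conc (par)                    -- base's par
import Control.DeepSeq (NFData, force)   -- deepseq ships with GHC

-- Like parList, but each spark fully evaluates its element.
parListFull :: NFData a => [a] -> b -> b
parListFull []     b = b
parListFull (x:xs) b = par (force x) (parListFull xs b)
```

Without this, a spark on a list of boards would only evaluate the list to weak-head normal form, leaving most of the work to be done sequentially by the consumer.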
Results
• Speedup: 3.5 on 6 cores
• We can do better...
How many sparks?
SPARKS: 5151164 (5716 converted, 4846805 pruned)
• The cost of creating a spark for every tree node is
high
• sparks near the leaves are cheap
• Parallelism works better when the work units are
large (coarse-grained parallelism)
• But we don’t want to be too coarse, or there
won’t be enough grains
• Solution: parallelise down to a certain depth
Bounding the parallel depth
subtree :: Int -> [Int] -> [[Int]]
subtree 0 b = [b]
subtree c b =
  concat $
  maybeParList c $
  map (subtree (c-1)) $
  children b

maybeParList c
  | c < threshold = id
  | otherwise     = parList

change parList into maybeParList
below the threshold, maybeParList is “id” (do nothing)
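Assembling the last few slides into one runnable program (the threshold arithmetic and safe are my own choices; the sparks change only speed, never the answer):

```haskell
import GHC.Conc (par)

-- Spark each list element; return the second argument.
parList :: [a] -> b -> b
parList []     b = b
parList (x:xs) b = par x (parList xs b)

nqueens :: Int -> [[Int]]
nqueens n = subtree n []
  where
    threshold = n - 3                  -- spark only the top 3 levels

    children b = [ q : b | q <- [1 .. n], safe q b ]

    subtree :: Int -> [Int] -> [[Int]]
    subtree 0 b = [b]
    subtree c b =
      let subs = map (subtree (c - 1)) (children b)
      in maybeParList c subs (concat subs)

    -- below the threshold, do nothing; otherwise spark each sublist
    maybeParList c subs
      | c < threshold = id
      | otherwise     = parList subs

safe :: Int -> [Int] -> Bool
safe q b = and [ q /= r && abs (q - r) /= d | (r, d) <- zip b [1 ..] ]

main :: IO ()
main = print (length (nqueens 8))      -- same answer as the sequential code: 92
```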
Results...
• Speedup: 4.7 on 6 cores
– depth 3
– ~1000 sparks
Can this be improved?
• There is more we could do here, to optimise
both sequential and parallel performance
• but we got good results with only a little effort
Original sequential version
• However, we did have to change the original
program... trees good, lists bad:
nqueens :: Int -> [[Int]]
nqueens n = gen n
  where
    gen :: Int -> [[Int]]
    gen 0 = [[]]
    gen c = [ (q:b) | b <- gen (c-1),
                      q <- [1..n],
                      safe q b ]
• cf. Guy Steele, “Organizing Functional Code for Parallel Execution”
Raising the level of abstraction
• Lowest level: par/pseq
• Next level: parList
• A general abstraction: Strategies¹
A value of type Strategy a is a policy
for evaluating things of type a
parPair :: Strategy a -> Strategy b -> Strategy (a,b)
• a strategy for evaluating components of a pair in
parallel, given a Strategy for each component
¹ Algorithm + Strategy = Parallelism, Trinder et al., JFP 8(1), 1998
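To make the Strategy idea concrete without the `parallel` package, here is a minimal stand-in I wrote for illustration (the real interface is Control.Parallel.Strategies, whose Eval is a strict identity monad; names like parPair follow the slide, but the bodies are mine):

```haskell
import GHC.Conc (par, pseq)

-- A toy Eval monad: just an identity wrapper.
newtype Eval a = Eval { runEval :: a }

instance Functor Eval where fmap f (Eval a) = Eval (f a)
instance Applicative Eval where
  pure = Eval
  Eval f <*> Eval a = Eval (f a)
instance Monad Eval where Eval a >>= f = f a

type Strategy a = a -> Eval a

using :: a -> Strategy a -> a
using x strat = runEval (strat x)

rpar, rseq :: Strategy a
rpar x = x `par` Eval x       -- spark the value for parallel evaluation
rseq x = x `pseq` Eval x      -- evaluate it right now

-- Evaluate the components of a pair with the given strategies.
parPair :: Strategy a -> Strategy b -> Strategy (a, b)
parPair sa sb (a, b) = do
  a' <- sa a
  b' <- sb b
  return (a', b')

main :: IO ()
main = print ((sum [1 .. 100 :: Int], product [1 .. 5 :: Int])
              `using` parPair rpar rseq)   -- prints (5050,120)
```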
Define your own Strategies
• Strategies are just an abstraction, defined in
Haskell, on top of par/pseq
type Strategy a = a -> Eval a
using :: a -> Strategy a -> a
data Tree a = Leaf a | Node [Tree a]
parTree :: Int -> Strategy (Tree [Int])
parTree 0 tree = rdeepseq tree
parTree n (Leaf a) = return (Leaf a)
parTree n (Node ts) = do
  us <- parList (parTree (n-1)) ts
  return (Node us)

A strategy that evaluates a tree in parallel up to the given depth
Refactoring N-queens
data Tree a = Leaf a | Node [Tree a]
leaves :: Tree a -> [a]
nqueens n = leaves (subtree n [])
where
subtree :: Int -> [Int] -> Tree [Int]
subtree 0 b = Leaf b
subtree c b = Node (map (subtree (c-1)) (children b))
Refactoring N-queens
• Now we can move the parallelism to the outer
level:
nqueens n = leaves (subtree n [] `using` parTree 3)
Modular parallelism
• The description of the parallelism can be
separate from the algorithm itself
– thanks to lazy evaluation: we can build a
structured computation without evaluating it, the
strategy says how to evaluate it
– don’t clutter your code with parallelism
– (but be careful about space leaks)
Parallel Haskell, summary
• par, pseq, and Strategies let you annotate purely
functional code for parallelism
• Adding annotations does not change what the program
means
– no race conditions or deadlocks
– easy to experiment with
• ThreadScope gives visual feedback
• The overhead is minimal, and parallel programs scale
• You still have to understand how to parallelise the
algorithm!
• Complements concurrency
Take a deep breath...
• ... we’re leaving the purely functional world
and going back to threads and state
Concurrent data structures
• Concurrent programs often need shared data
structures, e.g. a database, or work queue, or
other program state
• Implementing these structures well is
extremely difficult
• So what do we do?
– let Someone Else do it (e.g. Intel TBB)
• but we might not get exactly what we want
– In Haskell: do it yourself...
Case study: Concurrent Linked Lists
newList :: IO (List a)
Creates a new (empty) list
addToTail :: List a -> a -> IO ()
Adds an element to the tail of the list
find :: Eq a => List a -> a -> IO Bool
Returns True if the list contains the given element
delete :: Eq a => List a -> a -> IO Bool
Deletes the given element from the list;
returns True if the list contained the element
Choose your weapon
CAS: atomic compare-and-swap,
accurate but difficult to use
MVar: a locked mutable variable.
Easier to use than CAS.
STM: Software Transactional
Memory. Almost impossible to
go wrong.
STM implementation
• Nodes are linked with transactional variables
data List a = Null
| Node { val :: a,
next :: TVar (List a) }
• Operations perform a transaction on the
whole list: simple and straightforward to
implement
• What about without STM, or if we want to
avoid large transactions?
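A runnable sketch of the full-STM version: each operation is one transaction that walks the spine. The List type is from the slide; the handle type and the traversal code are my own, and delete is omitted for brevity.

```haskell
import Control.Concurrent.STM

-- Nodes linked by transactional variables, as on the slide.
data List a = Null
            | Node { val :: a, next :: TVar (List a) }

newtype ListHandle a = ListHandle (TVar (List a))

newList :: IO (ListHandle a)
newList = ListHandle <$> newTVarIO Null

-- Walk to the end inside one transaction and append.
addToTail :: ListHandle a -> a -> IO ()
addToTail (ListHandle hd) x = atomically (go hd)
  where
    go v = do
      node <- readTVar v
      case node of
        Null     -> do nxt <- newTVar Null
                       writeTVar v (Node x nxt)
        Node _ n -> go n

-- Search the whole list inside one transaction.
find :: Eq a => ListHandle a -> a -> IO Bool
find (ListHandle hd) x = atomically (go hd)
  where
    go v = do
      node <- readTVar v
      case node of
        Null -> return False
        Node y n
          | y == x    -> return True
          | otherwise -> go n
```

The simplicity is the point of the slide; the cost is that any concurrent write to any node the transaction read forces a retry of the whole traversal.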
What can go wrong?
(diagram: list 1 → 2 → 3 → 4)
thread 1: “delete 2”    thread 2: “delete 3”
If both threads swing their predecessor’s pointer at the same time, one of the deletions is lost.
Fixing the race condition
Swinging the pointer will not physically delete the element now; it has to be removed later
(diagram: nodes 2 and 3 are first marked as deleted — “2d”, “3d” — and unlinked from 1 → 2 → 3 → 4 afterwards)
thread 1: “delete 2”    thread 2: “delete 3”
Adding “lazy delete”
• Now we have a deleted node:
data List a = Null
            | Node { val :: a,
                     next :: TVar (List a) }
            | DelNode { next :: TVar (List a) }
• Traversals should drop deleted nodes that
they find.
• Transactions no longer take place on the
whole list, only pairs of nodes at a time.
We built a few implementations...
• Full STM
• Various “lazy delete” implementations:
– STM
– MVar, hand-over-hand locking
– CAS
– CAS (using STM)
– MVar (using STM)
Results
(graph: elapsed time in seconds, log scale 0.1–1000, against 1–8 processors, for CAS, CASusingSTM, LAZY, MLC, MLCusingSTM, and STM)
Results (scaling)
(graph: speedup, 0–6, against 1–8 processors, for the same six implementations)
So what?
• Large STM transactions don’t scale
• The fastest implementations use CAS
• but then we found a faster implementation...
A latecomer wins the race...
(graph: the same time-vs-processors plot with a new, faster line labelled “???” below all the others)
And the winner is...
type List a = Var [a]
• Ordinary immutable lists stored in a single
mutable variable
• trivial to define the operations
• reads are fast and automatically concurrent:
– immutable data is copy-on-write
– a read grabs a snapshot
• but what about writes? Var = ???
Choose your weapon
IORef (unsynchronised mutable
variable)
MVar (locked mutable variable)
TVar (STM)
Built-in lock-free updates
• IORef provides this clever operation:
atomicModifyIORef Takes a mutable
:: IORef a variable
-> (a -> (a,b))
and a function to
-> IO b compute the new value
(a) and a result (b)
Returns the result
atomicModifyIORef r f = do
a <- readIORef r
let (new, b) = f a Lazily!
writeIORef r new
return b
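A tiny demonstration: the function returns the new value together with a result, and here the result is the old value.

```haskell
import Data.IORef

main :: IO ()
main = do
  r   <- newIORef (10 :: Int)
  old <- atomicModifyIORef r (\a -> (a + 1, a))  -- new value, and old as result
  new <- readIORef r
  print (old, new)                               -- prints (10,11)
```

Because the update is installed lazily, GHC also provides the strict variant atomicModifyIORef' for cases where accumulating thunks in the cell would be a space leak.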
Updating the list...
• delete 2
(diagram: after the update, the IORef points to an unevaluated computation — a thunk representing “delete 2” applied to the old list 1 : 2 : …)
delete is a pure operation here. The reason this works is lazy evaluation.
Lazy immutable = parallel
• reads can happen in parallel with other
operations, automatically
• tree-shaped structures work well: operations
in branches can be computed in parallel
• lock-free: impossible to prevent other threads
from making progress
• The STM variant is composable
Ok, so why didn’t we see scaling?
• this is a shared data structure, a single point of
contention
• memory bottlenecks, cache bouncing
• possibly: interactions with generational GC
• but note that we didn’t see a slowdown either
A recipe for concurrent data structures
• Haskell has lots of libraries providing high-performance pure data structures
• trivial to make them concurrent:
type ConcSeq a = IORef (Seq a)
type ConcTree a = IORef (Tree a)
type ConcMap k v = IORef (Map k v)
type ConcSet a = IORef (Set a)
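The recipe applied to Data.Map (containers ships with GHC): a pure map in one mutable cell. The operation names insertC/lookupC are mine, not a library API.

```haskell
import Data.IORef
import qualified Data.Map as Map

type ConcMap k v = IORef (Map.Map k v)

newConcMap :: IO (ConcMap k v)
newConcMap = newIORef Map.empty

-- Writers update the cell atomically with a pure function.
insertC :: Ord k => ConcMap k v -> k -> v -> IO ()
insertC r k v = atomicModifyIORef r (\m -> (Map.insert k v m, ()))

-- Readers just grab a snapshot; they never block writers.
lookupC :: Ord k => ConcMap k v -> k -> IO (Maybe v)
lookupC r k = Map.lookup k <$> readIORef r

main :: IO ()
main = do
  m <- newConcMap
  insertC m "x" (1 :: Int)
  lookupC m "x" >>= print              -- prints Just 1
```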
Conclusions...
• Thinking concurrent (and parallel):
– Immutable data and pure functions
• eliminate unnecessary interactions
– Declarative programming models say less about
“how”, giving the implementation more freedom
• SQL/LINQ/PLINQ
• map/reduce
• .NET TPL: declarative parallelism in .NET
• F# async programming
• Coming soon: Data Parallel Haskell
Try it out...
• Haskell: http://www.haskell.org/
• GHC: http://www.haskell.org/ghc
• Libraries: http://hackage.haskell.org/
• News: http://www.reddit.com/r/haskell