0% found this document useful (0 votes)
58 views49 pages

Writing Fast Haskell

The document discusses techniques for writing fast Haskell code, including: - Understanding GHC's compilation pipeline and Core intermediate representation - Using primitive, unlifted, and boxed types for performance - Forcing strict evaluation using seq, bang patterns, and strictness annotations - Specializing and inlining functions - Avoiding space leaks by using constant-size accumulators - Controlling memory layout with UNPACK pragmas - Transforming functions to continuation-passing style

Uploaded by

John
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
58 views49 pages

Writing Fast Haskell

The document discusses techniques for writing fast Haskell code, including: - Understanding GHC's compilation pipeline and Core intermediate representation - Using primitive, unlifted, and boxed types for performance - Forcing strict evaluation using seq, bang patterns, and strictness annotations - Specializing and inlining functions - Avoiding space leaks by using constant-size accumulators - Controlling memory layout with UNPACK pragmas - Transforming functions to continuation-passing style

Uploaded by

John
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 49

Writing Fast Haskell

Elegance Is Not an Excuse for Bad Performance

Moritz Kiefer (@cocreature)


August 22, 2018
Introduction

• Haskellers often talk about elegant code

1
Introduction

• Haskellers often talk about elegant code


• But elegance is not an excuse for bad performance!

1
Introduction

• Haskellers often talk about elegant code


• But elegance is not an excuse for bad performance!
• Writing fast Haskell requires some understanding of GHC’s
internals

1
Introduction

• Haskellers often talk about elegant code


• But elegance is not an excuse for bad performance!
• Writing fast Haskell requires some understanding of GHC’s
internals
• GHC provides a surprising number of tools to influence
performance

1
Goals for Today

1. Learn to reason about performance


2. Look under the hood of GHC (specifically Core)
3. Learn about some rules of thumb for writing fast Haskell
4. Learn about primitives and libraries useful for writing fast
Haskell

2
Benchmarking/Profiling Disclaimer

• Benchmark before you optimize


• GHC supports options for time and space profiling
• Profiling can break optimizations
• Enable profiling selectively

3
Primitive, Unlifted and Boxed Types

Primitive Types
Correspond to “raw machine types”
E.g. Int#, Double#

4
Primitive, Unlifted and Boxed Types

Primitive Types
Correspond to “raw machine types”
E.g. Int#, Double#

Boxed Types
Represented by a pointer to a heap object
E.g. all user-defined types, Int

4
Primitive, Unlifted and Boxed Types

Primitive Types
Correspond to “raw machine types”
E.g. Int#, Double#

Boxed Types
Represented by a pointer to a heap object
E.g. all user-defined types, Int

Unlifted Types
Cannot be bottom
E.g. all primitive types but also Array# (which is not
primitive)

4
Definition of the Int type

data Int = I# Int#

Int
I# Int#

5
GHC Compilation Pipeline

Haskell

Core

STG

C--

Assembly LLVM IR

6
GHC Compilation Pipeline

Haskell Renaming, type checking, desugaring

Core

STG

C--

Assembly LLVM IR

6
GHC Compilation Pipeline

Haskell Renaming, type checking, desugaring

Core Simplifier

STG

C--

Assembly LLVM IR

6
GHC Compilation Pipeline

Haskell Renaming, type checking, desugaring

Core Simplifier

STG Canonicalized core without types

C--

Assembly LLVM IR

6
Core’s Expr Type

data Expr b
= Var Id
| Lit Literal
| App (Expr b) (Arg b)
| Lam b (Expr b)
| Let (Bind b) (Expr b)
| Case (Expr b) b Type [Alt b]
| Cast (Expr b) Coercion
| Tick (Tickish Id) (Expr b)
| Type Type
| Coercion Coercion
deriving Data

7
Viewing Core

• -ddump-simpl or -ddump-prep
• Suppress info that you don’t care about
• -dsuppress-idinfo
• -dsuppress-ticks
• -dsuppress-module-prefixes
• -dsuppress-all
• GHC plugin that outputs core as HTML

8
Mental Model for Core

let
Allocates a thunk on the heap

case
Forces evaluation to WHNF

9
Naive Sum

sum :: [Int] -> Int


sum [] = 0
sum (x : xs) = x + sum xs

10
Tail-Recursive Sum

sum :: [Int] -> Int


sum = go 0
where
go acc [] = acc
go acc (x : xs) = go (x + acc) xs

11
Force the Accumulator

sum :: [Int] -> Int


sum = go 0
where
go acc [] = acc
go acc (x : xs) =
let acc' = x + acc
in acc' `seq` go acc' xs

12
BangPatterns

{-# LANGUAGE BangPatterns #-}


sum :: [Int] -> Int
sum = go 0
where
go acc [] = acc
go acc (x : xs) =
let !acc' = x + acc
in go acc' xs

13
WHNF

• seq only evaluates to WHNF


• Be careful with tuples!
(x,y) `seq` … will not evaluate x and y
• Use the deepseq lib for evaluating to NF

14
Strictness Annotations in Data Types

data Point = Point !Int !Int

Whenever you evaluate Point to WHNF, you also evaluate the


two fields to WHNF.
Often easier to use than seq/BangPatterns

15
Avoiding Space Leaks

Rule of Thumb
Constant-size (e.g. Int) accumulators are often problematic

Detecting Space Leaks

• Limit the stack size +RTS -K${n}K


• Get a stacktrace with +RTS -xc -K${n}K

16
Specialization and Inlining

Specialization

• Specialize type parameters


• Remove type class dictionaries

Inlining

• Inline definition at call site

17
Cross-Module Specialization and Inlining

• Specialization/Inlining only possible if definition


(=unfolding) is available
• Unfoldings of small definitions are automatically exposed
• {-# INLINABLE f #-} forces GHC to expose f’s
unfolding
• You might also want to expose unfoldings of definitions
used by f

18
Specialization

• GHC will automatically try to specialize definitions at


use-sites
• Create specializations using
{-# SPECIALIZE f :: Int -> Int #-}
• Also creates specializations of functions called by f

19
Inlining

• {-# INLINE f #-} makes GHC very eager to inline f


• Use cautiously!
• Can blow up compile times significantly
• Can increase code size without bringing benefits

20
Inlining

• {-# INLINE f #-} makes GHC very eager to inline f


• Use cautiously!
• Can blow up compile times significantly
• Can increase code size without bringing benefits
• Note: {-# INLINABLE f #-} does not make GHC more
eager to inline f

20
Inlining

The following two definitions are equivalent.

f a b = …

f = \a b …

21
Inlining

Or are they?

f a b = …

f = \a b …

21
Call Arity

GHC will only inline


fully saturated function applications!

21
Controlling Memory Layout

data Point =
Point Int
Int

Point

Point Int Int

Int Int
I# Int# I# Int#

22
Controlling Memory Layout

data Point =
Point {-# UNPACK #-} !Int
{-# UNPACK #-} !Int

Point

Point Int# Int#

23
Automatic Unpacking

• GHC is quite good at automatic unpacking


• But only if it can detect that an argument is strict
• Sometimes you need to help it

f :: Vector Int -> …


f xs = …
where n = Vector.length xs

24
Automatic Unpacking

• GHC is quite good at automatic unpacking


• But only if it can detect that an argument is strict
• Sometimes you need to help it

f :: Vector Int -> …


f xs = …
where !n = Vector.length xs

24
Continuation Passing Style

main :: IO ()
main =
case loop2 100 (10, 10) of
(x, y) -> print (x - y)

loop2 :: Int -> (Int,Int) -> (Int,Int)


loop2 n (x, y)
| n > 0 = loop2 (n - 1) (x + 1, y - 1)
| otherwise = (x, y)

25
Continuation Passing Style

Convert

f :: a -> b

into

f :: a -> (b -> r) -> r

Can avoid allocations and unnecessary case distinctions

26
Continuation Passing Style

main :: IO ()
main =
loop2 100 (10, 10) $ \(x, y) -> print (x - y)

loop2 :: Int -> (Int,Int) -> ((Int,Int) -> r) -> r


loop2 n (x, y) cont
| n > 0 = loop2 (n - 1) (x + 1,y - 1) cont
| otherwise = cont (x, y)

27
Writing Your Own Optimizations: Rewrite Rules

map Fusion

map f . map g = map (f . g)

28
Writing Your Own Optimizations: Rewrite Rules

map Fusion

map f . map g = map (f . g)

build/foldr Fusion

build
:: (forall b. (a -> b -> b) -> b -> b)
-> [a]
foldr :: (a -> b -> b) -> b -> [a] -> b

foldr f a (build g) = g f a

28
Writing your Own Optimizations: Rewrite Rules

Example

{-# RULES
"map/map"
forall f g xs. map f (map g xs) =
map (f.g) xs
#-}

• GHC does not check correctness of rules


• GHC does not check termination of rules
• Use phases to control interaction of rules and inlining

29
Array Primitives in GHC

Array#

• Array of boxed values


• Card table to avoid having to scan unmodified entries in
GC

SmallArray#

• Array of boxed values


• No card table

ByteArray#

• Region of raw memory


• Pinned and unpinned

30
High-Level Array Libraries

• primitive provides PrimArray wrapper around


ByteArray#
• vector provides boxed, unboxed and Storable vectors
• Fusion
• Slicing

31
Basic Data Structures

• containers is mostly pretty good!


• Use the specialized data structures for Int: IntSet and
IntMap
• unordered-containers has a fast, persistent HashMap
• Mutable hashtables from the hashtables package are
often slower

32
Conclusion

• GHC is impressively good at optimizing high-level code


• Reasoning about performance isn’t trivial but definitely
possible
• GHC gives us the tools to control specific aspects of our
programs

33
Conclusion

• GHC is impressively good at optimizing high-level code


• Reasoning about performance isn’t trivial but definitely
possible
• GHC gives us the tools to control specific aspects of our
programs
• If all else fails, GHC has a great C FFI

33
More Information

• The Spineless Tagless G-machine


• Detecting Space Leaks
• Inlining and Specialisation
• GHC User’s Guide
• The GHC Commentary

34

You might also like