
Building a Highly Concurrent Cache in Go
A Hitchhiker's Guide
Konrad Reiche

"In its full generality, multithreading is an incredibly complex and error-prone technique, not to be recommended in any but the smallest programs."
― C. A. R. Hoare: Communicating Sequential Processes
[Slides 3–18: gopher illustrations, adapted from images in Rob Pike's talk "Concurrency is not Parallelism" by Renée French, used under CC BY 4.0.]
[7932676.760082] Out of memory: kill process 23936
[7932676.761528] Killed process 23936 (go)
"Don't panic."

20 Building a Highly Concurrent Cache in Go

"Don't panic."
― Effective Go

21 Building a Highly Concurrent Cache in Go

"Don't panic."
― Effective Go
― Douglas Adams: The Hitchhiker's Guide to the Galaxy

22 Building a Highly Concurrent Cache in Go


Cache

What is a Cache?

Software component that stores data so that future requests for the data can be served faster.

24 Building a Highly Concurrent Cache in Go

What is a Cache?

Software component that stores data so that future requests for the data can be served faster.

Remember the result of an expensive operation, to speed up reads.

25 Building a Highly Concurrent Cache in Go

What is a Cache?

Software component that stores data so that future requests for the data can be served faster.

Remember the result of an expensive operation, to speed up reads.

data, ok := cache.Get(key)
if !ok {
data = doSomething()
cache.Set(key, data)
}

26 Building a Highly Concurrent Cache in Go


Me

27
ca. 2019
My Manager

28
ca. 2019
My Manager
and Me

Where is the cache Lebowski Reiche?


29
Caching Post Data for Ranking

Kubernetes Pods for Ranking Service

30 Building a Highly Concurrent Cache in Go

Caching Post Data for Ranking

Thing Service: get data for filtering posts → Kubernetes Pods for Ranking Service

31 Building a Highly Concurrent Cache in Go

Caching Post Data for Ranking

Thing Service: get data for filtering posts → Redis Cache Cluster → Kubernetes Pods for Ranking Service

32 Building a Highly Concurrent Cache in Go

Caching Post Data for Ranking

Thing Service provides the data for filtering posts. The Ranking Service pods perform cache lookups against the Redis Cache Cluster and update it on a miss.

33 Building a Highly Concurrent Cache in Go

Caching Post Data for Ranking

Thing Service provides the data for filtering posts. The Ranking Service pods perform cache lookups against the Redis Cache Cluster (5-20m op/sec) and update it on a miss.

34 Building a Highly Concurrent Cache in Go

Caching Post Data for Ranking

Thing Service provides the data for filtering posts. The Ranking Service pods perform cache lookups against the Redis Cache Cluster (5-20m op/sec) and update it on a miss. A local in-memory cache (~90k op/sec per pod) is added inside the Ranking Service pods.

35 Building a Highly Concurrent Cache in Go


type cache struct {

data map[string]string
}

36 Building a Highly Concurrent Cache in Go


type cache[K comparable, V any] struct {

data map[K]V
}

37 Building a Highly Concurrent Cache in Go


type cache struct {

data map[string]string
}

To keep it simple, we can introduce generics later, once needed.

38 Building a Highly Concurrent Cache in Go


type cache struct {

data map[string]string
}

func (c *cache) Set(key, value string) {


c.data[key] = value
}

func (c *cache) Get(key string) (string, bool) {


value, ok := c.data[key]
return value, ok
}

39 Building a Highly Concurrent Cache in Go


type cache struct {
mu sync.Mutex
data map[string]string
}

func (c *cache) Set(key, value string) {


c.data[key] = value
}

func (c *cache) Get(key string) (string, bool) {


value, ok := c.data[key]
return value, ok
}

40 Building a Highly Concurrent Cache in Go


type cache struct {
mu sync.Mutex
data map[string]string
}

func (c *cache) Set(key, value string) {

c.data[key] = value
}

func (c *cache) Get(key string) (string, bool) {

value, ok := c.data[key]
return value, ok
}

41 Building a Highly Concurrent Cache in Go
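Why the mutex matters: a minimal sketch (not from the talk) that makes the data race visible. With two goroutines writing the map concurrently, go run -race reports a data race, and the runtime may even abort with "fatal error: concurrent map writes".

func main() {
    c := &cache{data: make(map[string]string)}
    go c.Set("a", "1")
    go c.Set("b", "2")
    time.Sleep(100 * time.Millisecond) // crude wait so both goroutines get to run
}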


type cache struct {
mu sync.Mutex
data map[string]string
}

func (c *cache) Set(key, value string) {


c.mu.Lock()
defer c.mu.Unlock()
c.data[key] = value
}

func (c *cache) Get(key string) (string, bool) {


c.mu.Lock()
defer c.mu.Unlock()
value, ok := c.data[key]
return value, ok
}

42 Building a Highly Concurrent Cache in Go


type cache struct {
mu sync.RWMutex
data map[string]string
}

func (c *cache) Set(key, value string) {


c.mu.Lock()
defer c.mu.Unlock()
c.data[key] = value
}

func (c *cache) Get(key string) (string, bool) {


c.mu.RLock()
defer c.mu.RUnlock()
value, ok := c.data[key]
return value, ok
}

43 Building a Highly Concurrent Cache in Go


type cache struct {
mu sync.RWMutex
data map[string]string
}

func (c *cache) Set(key, value string) {
c.mu.Lock()
defer c.mu.Unlock()
c.data[key] = value
}

func (c *cache) Get(key string) (string, bool) {
c.mu.RLock()
defer c.mu.RUnlock()
value, ok := c.data[key]
return value, ok
}

Don't generalize the cache from the start; pick an API that fits your usage pattern.

44 Building a Highly Concurrent Cache in Go


type cache struct {
mu sync.RWMutex
data map[string]string
}

func (c *cache) Set(items map[string]string) {


c.mu.Lock()
defer c.mu.Unlock()
for key, value := range items {
c.data[key] = value
}
}

func (c *cache) Get(keys []string) map[string]string {


result := make(map[string]string)
c.mu.RLock()
defer c.mu.RUnlock()
for _, key := range keys {
if value, ok := c.data[key]; ok {
result[key] = value
}
}
return result
}
45 Building a Highly Concurrent Cache in Go
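A short usage sketch of the batch API above (keys and values are made up):

c := &cache{data: make(map[string]string)}
c.Set(map[string]string{"a": "1", "b": "2"})
hits := c.Get([]string{"a", "b", "c"}) // map[a:1 b:2]; "c" is simply absent from the result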
[7932676.760082] Out of memory: kill process 23936
[7932676.761528] Killed process 23936 (go)
Cache Replacement Policies

Cache Replacement Policies

Once the cache exceeds its maximum capacity, which data should be evicted to make space for new data?

[Diagram: a cache holding Entry 1, Entry 2, Entry 3, …, Entry n]

48 Building a Highly Concurrent Cache in Go


Cache Replacement Policies

Once the cache exceeds its maximum capacity, which data should be evicted to make space for new data?

Bélády's Optimal Replacement Algorithm: remove the entry whose next use will occur farthest in the future.

[Diagram: a cache holding Entry 1, Entry 2, Entry 3, …, Entry n]

49 Building a Highly Concurrent Cache in Go


Cache Replacement Policies

Once the cache exceeds its maximum capacity, which data should be evicted to make space for new data?

Bélády's Optimal Replacement Algorithm: remove the entry whose next use will occur farthest in the future.

Because we cannot predict the future, we can only try to approximate this behavior.

[Diagram: a cache holding Entry 1, Entry 2, Entry 3, …, Entry n]

50 Building a Highly Concurrent Cache in Go


Taxonomy of Cache Replacement Policies

Coarse-Grained Policies: Recency, Frequency, Hybrid
Fine-Grained Policies: Economic Value, Reuse Distance, Classification

(From left to right: longer history, more access patterns)

51 Building a Highly Concurrent Cache in Go

[1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
Taxonomy of Cache Replacement Policies

Coarse-Grained Policies: Recency (LRU), Frequency (LFU), Hybrid
Fine-Grained Policies: Economic Value, Reuse Distance, Classification

(From left to right: longer history, more access patterns)

52 Building a Highly Concurrent Cache in Go

[1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
LFU

Least Frequently Used (LFU)


Favor entries that are used frequently.

53 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

type cache struct {
size int
mu sync.RWMutex
data map[string]string
}

54 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

type cache struct {
size int
mu sync.Mutex
data map[string]*item
heap *MinHeap
}

type item struct {
key string
value string
frequency int
index int
}

55 Building a Highly Concurrent Cache in Go
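The MinHeap referenced above is not shown in the talk. A minimal sketch of what it could look like, following the container/heap example from the standard library; the Less and update names match the slide code, everything else is an assumption:

// A sketch, assuming the item type above; not the talk's implementation.
type MinHeap []*item

func (h MinHeap) Len() int           { return len(h) }
func (h MinHeap) Less(i, j int) bool { return h[i].frequency < h[j].frequency }
func (h MinHeap) Swap(i, j int) {
    h[i], h[j] = h[j], h[i]
    h[i].index = i
    h[j].index = j
}

func (h *MinHeap) Push(x any) {
    it := x.(*item)
    it.index = len(*h)
    *h = append(*h, it)
}

func (h *MinHeap) Pop() any {
    old := *h
    n := len(old)
    it := old[n-1]
    old[n-1] = nil // don't hold on to the popped item
    *h = old[:n-1]
    return it
}

// update changes an item's frequency and restores the heap ordering in O(log n).
func (h *MinHeap) update(it *item, frequency int) {
    it.frequency = frequency
    heap.Fix(h, it.index)
}

Since heap.Push, heap.Pop, and heap.Fix take a heap.Interface, the cache would either store the heap by value (heap MinHeap, passing &c.heap) or pass the *MinHeap field directly (heap.Push(c.heap, item)).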


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

func (c *cache) Set(items map[string]string) {
c.mu.Lock()
defer c.mu.Unlock()

for key, value := range items {
item := &item{
key: key,
value: value,
frequency: 1,
}
c.data[key] = item
heap.Push(&c.heap, item)
}

for len(c.data) > c.size {
item := heap.Pop(&c.heap).(*item)
delete(c.data, item.key)
}
}

56 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap after Set: a (frequency: 1)]

(Set code as on the previous slide.)

57 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap after Set: a (frequency: 1), b (frequency: 1)]

(Set code as on slide 56.)

58 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap after Set: a (frequency: 1), b (frequency: 1), c (frequency: 1)]

(Set code as on slide 56.)

59 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap: a (frequency: 1), b (frequency: 1), c (frequency: 1)]

func (c *cache) Get(keys []string) (
map[string]string,
[]string,
) {
result := make(map[string]string)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok {
result[key] = item.value
frequency := item.frequency+1
c.heap.update(item, frequency)
} else {
missing = append(missing, key)
}
}
return result, missing
}

60 Building a Highly Concurrent Cache in Go
LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap: a (frequency: 1), b (frequency: 1), c (frequency: 1)]

cache.Get([]string{"a", "b"})

(Get code as on the previous slide.)

61 Building a Highly Concurrent Cache in Go
LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap after the Get: c (frequency: 1), a (frequency: 2), b (frequency: 2)]

(Get code as on slide 60.)

62 Building a Highly Concurrent Cache in Go
LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap: c (frequency: 1), a (frequency: 2), b (frequency: 2)]

cache.Set(map[string]string{
"d": "⌯Go",
})

(Set code as on slide 56.)

63 Building a Highly Concurrent Cache in Go
LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap after the Set: c (frequency: 1), d (frequency: 1), b (frequency: 2), a (frequency: 2)]

(Set code as on slide 56.)

64 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap: c (frequency: 1), d (frequency: 1), b (frequency: 2), a (frequency: 2)]

(Set code as on slide 56.)

65 Building a Highly Concurrent Cache in Go


LFU

Least Frequently Used (LFU): favor entries that are used frequently.

[Heap after eviction: d (frequency: 1), a (frequency: 2), b (frequency: 2)]

(Set code as on slide 56.)

66 Building a Highly Concurrent Cache in Go


Taxonomy of Cache Replacement Policies

Coarse-Grained Policies: Recency (LRU), Frequency (LFU), Hybrid
Fine-Grained Policies: Economic Value, Reuse Distance, Classification

(From left to right: longer history, more access patterns)

67 Building a Highly Concurrent Cache in Go

[1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
Taxonomy of Cache Replacement Policies

Coarse-Grained Policies
Recency: LRU, EELRU, SegLRU, LIP
Frequency: LFU, FBR, 2Q
Hybrid: ARC, LRFU, DIP, DRRIP, SRRIP

Fine-Grained Policies
Economic Value: EVA
Reuse Distance: Timekeeping, AIP, ETA, Leeway
Classification: DBCP, EAF, SDBP, SHiP, Hawkeye

(From left to right: longer history, more access patterns)

68 Building a Highly Concurrent Cache in Go

[1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
Caching Post Data for Ranking

Thing Service provides the data for filtering posts. The Ranking Service pods perform cache lookups against the Redis Cache Cluster (5-20m op/sec) and update it on a miss. A local in-memory cache (~90k op/sec per pod) is added inside the Ranking Service pods.

69 Building a Highly Concurrent Cache in Go


Benchmarks
Benchmarks

Functions of the form

func BenchmarkXxx(*testing.B)

are considered benchmarks, and are executed by the go test command when its
-bench flag is provided.

B is a type passed to Benchmark functions to manage benchmark timing and to


specify the number of iterations to run.
Benchmarks

● Before improving the performance of code, we should measure its current performance

● Create a stable environment
○ Idle machine
○ No shared hardware
○ Don't browse the web
○ Power saving, thermal scaling

● The testing package has built-in support for writing benchmarks

Benchmarks

func BenchmarkGet(b *testing.B) {

for i := 0; i < b.N; i++ {

}
}
Benchmarks

func BenchmarkGet(b *testing.B) {


cache := NewLFUCache(100)

keys := make([]string, 100)


items := make(map[string]string)
for i := 0; i < 100; i++ {
kv := fmt.Sprint(i)
keys[i] = kv
items[kv] = kv
}

for i := 0; i < b.N; i++ {

}
}
Benchmarks

func BenchmarkGet(b *testing.B) {


cache := NewLFUCache(100)

keys := make([]string, 100)


items := make(map[string]string)
for i := 0; i < 100; i++ {
kv := fmt.Sprint(i)
keys[i] = kv
items[kv] = kv
}
cache.Set(items)

b.ResetTimer()
for i := 0; i < b.N; i++ {

}
}
Benchmarks

func BenchmarkGet(b *testing.B) {


cache := NewLFUCache(100)

keys := make([]string, 100)


items := make(map[string]string)
for i := 0; i < 100; i++ {
kv := fmt.Sprint(i)
keys[i] = kv
items[kv] = kv
}
cache.Set(items)

b.ResetTimer()
for i := 0; i < b.N; i++ {
cache.Get(keys)
}
}
Benchmarks

go test -run=^$ -bench=BenchmarkGet


Benchmarks

go test -run=^$ -bench=BenchmarkGet -count=5


Benchmarks

go test -run=^$ -bench=BenchmarkGet -count=5


goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
BenchmarkGet-16 117642 9994 ns/op
BenchmarkGet-16 116402 10018 ns/op
BenchmarkGet-16 121834 9817 ns/op
BenchmarkGet-16 123241 9942 ns/op
BenchmarkGet-16 109621 10022 ns/op
PASS
ok github.com/konradreiche/cache 6.520s
Benchmarks

func BenchmarkGet(b *testing.B) {


cache := NewLFUCache(100)

keys := make([]string, 100)


items := make(map[string]string)
for i := 0; i < 100; i++ {
kv := fmt.Sprint(i)
keys[i] = kv
items[kv] = kv
}
cache.Set(items)

b.ResetTimer()
for i := 0; i < b.N; i++ {
cache.Get(keys)
}
}
Limitations

● We want to analyze and optimize all cache operations: Get, Set, Eviction

● Not all code paths are covered

● What cache hit and miss ratio to benchmark for?

● No concurrency, benchmark executes with one goroutine

● How do different operations behave when interleaving concurrently?


Limitations

● We want to analyze and optimize all cache operations: Get, Set, Eviction

● Not all code paths are covered

● What cache hit and miss ratio to benchmark for?

● No concurrency, benchmark executes with one goroutine

● How do different operations behave when interleaving concurrently?


Real Sample Data

Event log of cache accesses over 30 minutes including:

● timestamp
● post keys

107907533,SA,Lw,OA,Iw,aA,RA,KA,CQ,Ow,Aw,Hg,Kg
111956832,upgb
121807061,upgb
134028958,l3Ir,iPMq,PcUn,T5Ej,ZQs,kTM,/98F,BFwJ,Oik,uYIB,gv8F
137975373,crgb,SCMU,NXUd,EyQI,244Z,DB4H,Tp0H,Kh8b,gREH,g9kG,o34E,wSYI,u+wF,h40M
142509895,iwM,hgM,CQQ,YQI
154850130,jTE,ciU,2U4,GQkB,4xo,U2QC,/7oB,dRIC,M0gB,bwYk
...
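For the benchmark below, each line can be turned into an event. A sketch of how such a line could be parsed, assuming the comma-separated format above (the eventLog type and field names are assumptions, not the talk's code):

type eventLog struct {
    timestamp int64    // first column of the log line
    keys      []string // post keys accessed together
}

func parseEventLog(line string) (eventLog, error) {
    fields := strings.Split(line, ",")
    ts, err := strconv.ParseInt(fields[0], 10, 64)
    if err != nil {
        return eventLog{}, err
    }
    return eventLog{timestamp: ts, keys: fields[1:]}, nil
}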
Limitations

● We want to analyze and optimize all cache operations: Get, Set, Eviction

● Not all code paths are covered

● What cache hit and miss ratio to benchmark for?

● No concurrency, benchmark executes with one goroutine

● How do different operations behave when interleaving concurrently?


b.RunParallel

RunParallel runs a benchmark in parallel. It creates multiple goroutines and


distributes b.N iterations among them. The number of goroutines defaults to
GOMAXPROCS.
b.RunParallel

RunParallel runs a benchmark in parallel. It creates multiple goroutines and


distributes b.N iterations among them. The number of goroutines defaults to
GOMAXPROCS.

func BenchmarkCache(b *testing.B) {

b.RunParallel(func(pb *testing.PB) {
// set up goroutine local state
for pb.Next() {
// execute one iteration of the benchmark
}
})
}
func BenchmarkCache(b *testing.B) {
cb := newBenchmarkCase(b, config{size: 400_000})
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
var br benchmarkResult
for pb.Next() {
log := cb.nextEventLog()

start := time.Now()
cached, missing := cb.cache.Get(log.keys)
br.observeGet(start, cached, missing)

if len(missing) > 0 {
data := lookupData(missing)
start := time.Now()
cb.cache.Set(data)
br.observeSetDuration(start)
}
}
cb.addLocalReports(br)
})
b.ReportMetric(cb.getHitRate(), "hit/op")
b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
}
The same benchmark, annotated:

● newBenchmarkCase: a custom benchmark case type to manage the benchmark and collect data.

● benchmarkResult: collect per-goroutine benchmark measurements.

● cb.cache.Get followed by lookupData and cb.cache.Set for the missing keys: reproduce production behavior, lookup & update.

● start := time.Now() around Get and Set: we can measure the duration of individual operations manually.

● b.ReportMetric: use b.ReportMetric to report custom metrics (hit/op, read-ns/op, write-ns/op).
go test -run=^$ -bench=BenchmarkCache -count=10
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x

Ensure that each benchmark processes exactly 5,000 event logs, to improve comparability of the hit rate metric.
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
BenchmarkCache/policy=lfu-16 5000 0.6141 hit/op 4795215 read-ns/op 2964262 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6082 hit/op 4686778 read-ns/op 2270200 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6159 hit/op 4332358 read-ns/op 1765885 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6153 hit/op 4089562 read-ns/op 2504176 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6152 hit/op 3472677 read-ns/op 1686928 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6107 hit/op 4464410 read-ns/op 2695443 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6155 hit/op 3624802 read-ns/op 1837148 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6133 hit/op 3931610 read-ns/op 2154571 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6151 hit/op 2440746 read-ns/op 1260662 write-ns/op
BenchmarkCache/policy=lfu-16 5000 0.6138 hit/op 3491091 read-ns/op 1944350 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2320270 read-ns/op 1127495 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2212118 read-ns/op 1019305 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6705 hit/op 2150089 read-ns/op 1037654 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2512224 read-ns/op 1134282 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6710 hit/op 2377883 read-ns/op 1079198 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6711 hit/op 2313210 read-ns/op 1120761 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2071632 read-ns/op 980912 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2410096 read-ns/op 1127907 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2226160 read-ns/op 1071007 write-ns/op
BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2383321 read-ns/op 1165734 write-ns/op
PASS
ok github.com/konradreiche/cache 846.442s
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench


go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

│ LFU │ LRU │
│ hit/op │ hit/op vs base │
Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)

│ LFU │ LRU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)

│ LFU │ LRU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

│ LFU │ LRU │
│ hit/op │ hit/op vs base │
Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)

│ LFU │ LRU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)

│ LFU │ LRU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

│ LFU │ LRU │
│ hit/op │ hit/op vs base │
Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)

│ LFU │ LRU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)

│ LFU │ LRU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
Taxonomy of Cache Replacement Policies

Coarse-Grained Policies
Recency: LRU, EELRU, SegLRU, LIP
Frequency: LFU, FBR, 2Q
Hybrid: ARC, LRFU, DIP, DRRIP, SRRIP

Fine-Grained Policies
Economic Value: EVA
Reuse Distance: Timekeeping, AIP, ETA, Leeway
Classification: DBCP, EAF, SDBP, SHiP, Hawkeye

(From left to right: longer history, more access patterns)

100–102 Building a Highly Concurrent Cache in Go

[1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
Combining LFU & LRU

LRFU (Least Recently/Frequently Used)

A paper [2] published in 2001 suggests combining LRU and LFU, named LRFU.

● Similar to LFU: each item holds a value
● CRF: Combined Recency and Frequency
● A parameter λ determines how much weight is given to recent entries

type cache struct {
size int
mu sync.Mutex
data map[string]*item
heap *MinHeap
}

type item struct {
key string
value string
index int
frequency int
}

[2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

104 Building a Highly Concurrent Cache in Go


LRFU (Least Recently/Frequently Used)

A paper [2] published in 2001 suggests combining LRU and LFU, named LRFU.

● Similar to LFU: each item holds a value
● CRF: Combined Recency and Frequency
● A parameter λ determines how much weight is given to recent entries

λ = 1.0 (LRU)
λ = 0.0 (LFU)

type cache struct {
size int
weight float64
mu sync.Mutex
data map[string]*item
heap *MinHeap
}

type item struct {
key string
value string
index int
crf float64
}

[2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

105 Building a Highly Concurrent Cache in Go


LRFU (Least Recently/Frequently Used)

A paper [2] published in 2001 suggests combining LRU and LFU, named LRFU.

● Similar to LFU: each item holds a value
● CRF: Combined Recency and Frequency
● A parameter λ determines how much weight is given to recent entries

λ = 1.0 (LRU)
λ = 0.0 (LFU)
λ = 0.001 (LFU with a pinch of LRU)

(Cache and item types as on the previous slide.)

[2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

106–107 Building a Highly Concurrent Cache in Go


LRFU (Least Recently/Frequently Used)

● Calculate the CRF for every entry whenever entries need to be compared
● math.Pow is not a cheap operation
● 0.5^(λx) is prone to floating-point overflow
● New items are likely to be evicted, starting with CRF = 1.0

(Cache and item types as on slide 105.)

[2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

108 Building a Highly Concurrent Cache in Go
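For reference, the weighing function and the incremental CRF update from the LRFU paper [2], as a sketch (δ is the time since the entry's previous reference):

$$F(x) = \left(\tfrac{1}{2}\right)^{\lambda x}, \qquad \mathrm{CRF}_{\text{new}}(b) = F(0) + F(\delta)\,\mathrm{CRF}_{\text{old}}(b) = 1 + 2^{-\lambda\delta}\,\mathrm{CRF}_{\text{old}}(b)$$

With λ = 0 every reference counts equally (LFU); with λ = 1 old references decay so quickly that only recency matters (LRU).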


DLFU

DLFU (Decaying LFU Cache Expiry)

Donovan Baarda, a Google SRE from Australia, came up with an improved algorithm [3] and a Python reference implementation [4], realizing:

1. LRFU decay is a simple exponential decay

2. Exponential decay can be approximated, which eliminates math.Pow

3. Exponentially grow the reference increment instead of exponentially decaying all entries, thus requiring fewer fields per entry and fewer comparisons (see the sketch below)

[3] https://github.com/dbaarda/DLFUCache
[4] https://minkirri.apana.org.au/wiki/DecayingLFUCacheExpiry

110 Building a Highly Concurrent Cache in Go
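The third point is the key trick: instead of decaying every stored score after each access, the reference increment grows by the decay factor, which preserves the same ordering because only relative scores matter. A sketch of why (my notation, not from [3]): after n accesses the increment is incr_n = g^n with g = (p+1)/p, so normalizing an entry's score by the current increment gives

$$\frac{\mathrm{score}(b)}{\mathrm{incr}_n} = \sum_{i \in \mathrm{refs}(b)} g^{-(n-i)}, \qquad g = \frac{p+1}{p}, \quad p = \mathrm{size} \cdot \mathrm{weight}$$

i.e. each past reference contributes an exponentially decayed amount, yet only one float per entry and one multiplication per access are needed.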


DLFU Cache

func NewDLFUCache[V any](ctx context.Context, config config.Config) *DLFUCache[V] {
cache := &DLFUCache[V]{
data: make(map[string]*Item[V], config.Size),
heap: &MinHeap[V]{},
weight: config.Weight,
size: config.Size,
incr: 1.0,
}

if config.Weight == 0.0 { // there is no decay for LFU policy
cache.decay = 1
return cache
}

p := float64(config.Size) * config.Weight
cache.decay = (p + 1.0) / p

return cache
}

111 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *lfuCache) Set(items map[string]string) {


c.mu.Lock()
defer c.mu.Unlock()

for key, value := range items {


item := &item{
key: key,
value: value,
frequency: 1,
}
c.data[key] = item
heap.Push(&c.heap, item)
}

for len(c.data) > c.size {


item := heap.Pop(&c.heap).(*Item)
delete(c.data, item.key)
}
}

112 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {


c.mu.Lock()
defer c.mu.Unlock()

for key, value := range items {


item := &item{
key: key,
value: value,
frequency: 1,
}
c.data[key] = item
heap.Push(&c.heap, item)
}

c.trim()
}

113 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {


c.mu.Lock()
defer c.mu.Unlock()

for key, value := range items {

item := &item{
key: key,
value: value,
frequency: 1,
}
c.data[key] = item
heap.Push(&c.heap, item)
}
c.trim()
}

114 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {


c.mu.Lock()
defer c.mu.Unlock()

for key, value := range items {

item := &item{
key: key,
value: value,
score: c.incr,

}
c.data[key] = item
heap.Push(&c.heap, item)
}
c.trim()
}
115 Building a Highly Concurrent Cache in Go
DLFU Cache

func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {


c.mu.Lock()
defer c.mu.Unlock()
expiresAt := time.Now().Add(expiry)
for key, value := range items {
if item, ok := c.data[key]; ok {
item.value = value
item.expiresAt = expiresAt
continue
}
item := &item{
key: key,
value: value,
score: c.incr,
expiresAt: expiresAt,
}
c.data[key] = item
heap.Push(&c.heap, item)
}
c.trim()
}
116 Building a Highly Concurrent Cache in Go
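The item.expired check used in the Get code below is not shown in the talk; a minimal sketch, assuming a generic item[V] with the expiresAt field set in Set above (treating a zero expiresAt as "never expires" is an assumption):

func (i *item[V]) expired() bool {
    if i.expiresAt.IsZero() {
        return false // no expiry set
    }
    return time.Now().After(i.expiresAt)
}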
DLFU Cache

func (c *lfuCache) Get(keys []string) (map[string]string, []string) {
result := make(map[string]string)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok {
result[key] = item.value
frequency := item.frequency+1
c.heap.update(item, frequency)
} else {
missing = append(missing, key)
}
}

return result, missing
}

117 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok {
result[key] = item.value
frequency := item.frequency+1
c.heap.update(item, frequency)
} else {
missing = append(missing, key)
}
}

return result, missing
}

118 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
frequency := item.frequency+1
c.heap.update(item, frequency)
} else {
missing = append(missing, key)
}
}

return result, missing
}

119 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
}

return result, missing
}

120 Building a Highly Concurrent Cache in Go


DLFU Cache

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
}
c.incr *= c.decay
return result, missing
}

121 Building a Highly Concurrent Cache in Go


go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench


go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ LRU │ DLFU │
│ hit/op │ hit/op vs base │
Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)

│ LRU │ DLFU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)

│ LRU │ DLFU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /policy bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ LRU │ DLFU │
│ hit/op │ hit/op vs base │
Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)

│ LRU │ DLFU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)

│ LRU │ DLFU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x -cpuprofile=cpu.out > bench

benchstat -col /policy bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ LRU │ DLFU │
│ hit/op │ hit/op vs base │
Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)

│ LRU │ DLFU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)

│ LRU │ DLFU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
Profiling
go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 24, 2023 at 3:04pm (PDT)
Duration: 850.60s, Total samples = 1092.33s (128.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

126 Building a Highly Concurrent Cache in Go


Profiling
go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 24, 2023 at 3:04pm (PDT)
Duration: 850.60s, Total samples = 1092.33s (128.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 567.44s, 51.95% of 1092.33s total
Dropped 470 nodes (cum <= 5.46s)
Showing top 10 nodes out of 104
flat flat% sum% cum cum%
118.69s 10.87% 10.87% 169.36s 15.50% runtime.findObject
88.04s 8.06% 18.93% 88.04s 8.06% github.com/konradreiche/cache/dlfu/v1.MinHeap[go.shape.string].Less
72.38s 6.63% 25.55% 319.24s 29.23% runtime.scanobject
60.74s 5.56% 31.11% 106.72s 9.77% runtime.mapaccess2_faststr
45.03s 4.12% 35.23% 126.25s 11.56% runtime.mapassign_faststr
40.60s 3.72% 38.95% 40.60s 3.72% time.Now
40.20s 3.68% 42.63% 41.13s 3.77% container/list.(*List).move
35.47s 3.25% 45.88% 35.47s 3.25% memeqbody
34.25s 3.14% 49.01% 34.25s 3.14% runtime.memclrNoHeapPointers
32.04s 2.93% 51.95% 44.34s 4.06% runtime.mapdelete_faststr

127 Building a Highly Concurrent Cache in Go


Profiling
go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 24, 2023 at 3:04pm (PDT)
Duration: 850.60s, Total samples = 1092.33s (128.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list DLFU.*Get

128 Building a Highly Concurrent Cache in Go


Total: 1092.33s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
6.96s 218.27s (flat, cum) 19.98% of Total
. . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
. . 75: var missingKeys []string
. 490ms 76: result := make(map[string]V)
. . 77:
40ms 2.56s 78: c.mu.Lock()
30ms 30ms 79: defer c.mu.Unlock()
. . 80:
1.04s 1.04s 81: for _, key := range keys {
1.34s 43.77s 82: item, ok := c.data[key]
290ms 53.66s 83: if ok && !item.expired() {
1.68s 44.35s 84: result[key] = item.value
1.72s 1.72s 85: item.score += c.incr
130ms 65.82s 86: c.heap.update(item, item.score)
. . 87: } else {
530ms 3.55s 88: missingKeys = append(missingKeys, key)
. . 89: }
. . 90: }
20ms 20ms 91: c.incr *= c.decay
140ms 1.26s 92: return result, missingKeys
. . 93:}
. . 94:

129 Building a Highly Concurrent Cache in Go


Total: 1092.33s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
6.96s 218.27s (flat, cum) 19.98% of Total

(Profile listing as on the previous slide.)

Maintaining a heap is more expensive than LRU, which only requires a doubly linked list.

130 Building a Highly Concurrent Cache in Go
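The LRU policy used in this comparison is not shown in the talk. A minimal sketch (names and fields are assumptions) of why its bookkeeping is cheaper, using container/list: moving an element to the front of the list is O(1), while the heap update above is O(log n).

type lruCache struct {
    mu    sync.Mutex
    size  int
    data  map[string]*list.Element // key -> element in the recency list
    order *list.List               // front = most recently used
}

type lruEntry struct {
    key   string
    value string
}

func newLRUCache(size int) *lruCache {
    return &lruCache{
        size:  size,
        data:  make(map[string]*list.Element),
        order: list.New(),
    }
}

func (c *lruCache) get(key string) (string, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    el, ok := c.data[key]
    if !ok {
        return "", false
    }
    c.order.MoveToFront(el) // O(1) recency update
    return el.Value.(*lruEntry).value, true
}

func (c *lruCache) set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if el, ok := c.data[key]; ok {
        el.Value.(*lruEntry).value = value
        c.order.MoveToFront(el)
        return
    }
    c.data[key] = c.order.PushFront(&lruEntry{key: key, value: value})
    for len(c.data) > c.size {
        last := c.order.Back() // least recently used
        c.order.Remove(last)
        delete(c.data, last.Value.(*lruEntry).key)
    }
}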


Total: 1092.33s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
6.96s 218.27s (flat, cum) 19.98% of Total

(Profile listing as on slide 129.)

The CPU profile does not capture the time spent waiting to acquire a lock.

131 Building a Highly Concurrent Cache in Go


go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out

File: cache.test
Type: delay
Time: Sep 24, 2023 at 3:48pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out

File: cache.test
Type: delay
Time: Sep 24, 2023 at 3:48pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list DLFU.*Get
Total: 615.48s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
0 297.55s (flat, cum) 48.34% of Total
. . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
. . 75: var missingKeys []string
. . 76: result := make(map[string]V)
. . 77:
. 297.55s 78: c.mu.Lock()
. . 79: defer c.mu.Unlock()
. . 80:
. . 81: for _, key := range keys {
. . 82: item, ok := c.data[key]
. . 83: if ok && !item.expired() {
go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out

File: cache.test
Type: delay
Time: Sep 24, 2023 at 3:48pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list DLFU.*Get
Total: 615.48s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
0 297.55s (flat, cum) 48.34% of Total
. . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
. . 75: var missingKeys []string
. . 76: result := make(map[string]V)
. . 77:
. 297.55s 78: c.mu.Lock()
. . 79: defer c.mu.Unlock()
. . 80:
. . 81: for _, key := range keys {
. . 82: item, ok := c.data[key]
. . 83: if ok && !item.expired() {
go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out

File: cache.test
Type: delay
Time: Sep 24, 2023 at 3:48pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list DLFU.*Set
Total: 615.48s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Set
0 193.89s (flat, cum) 31.50% of Total
. . 99:func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
. 193.89s 100: c.mu.Lock()
. . 101: defer c.mu.Unlock()
. . 102:
. . 103: now := time.Now()
. . 104: for key, value := range items {
. . 105: if ctx.Err() != nil {
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

// Critical section: everything below runs while holding the lock.
for _, key := range keys {
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
}
c.incr *= c.decay
return result, missing
}

137 Building a Highly Concurrent Cache in Go


func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

c.mu.Lock()
defer c.mu.Unlock()

for _, key := range keys {
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
}
c.incr *= c.decay
return result, missing
}

138 Building a Highly Concurrent Cache in Go


func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

for _, key := range keys {
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
}

c.incr *= c.decay

return result, missing
}

139 Building a Highly Concurrent Cache in Go


func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

for _, key := range keys {
c.mu.Lock()
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
c.mu.Unlock()
}
c.mu.Lock()
c.incr *= c.decay
c.mu.Unlock()
return result, missing
}

140 Building a Highly Concurrent Cache in Go


go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /dlfu bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ V1 │ V2 │
│ hit/op │ hit/op vs base │
Cache-16 70.94% ± 0% 70.30% ± 0% -0.89% (p=0.000 n=10)

│ V1 │ V2 │
│ read-sec/op │ read-sec/op vs base │
Cache-16 4.34ms ± 54% 1.43ms ± 26% -67.13% (p=0.001 n=10)

│ V1 │ V2 │
│ write-sec/op │ write-sec/op vs base │
Cache-16 2.43ms ± 62% 574.3µs ± 25% -76.36% (p=0.000 n=10)
In Production
In Production
In Production

● Spike in number of goroutines, memory usage & timeouts


In Production

● Spike in number of goroutines, memory usage & timeouts

● Latency added to call paths integrating the local in-memory cache
In Production

● Spike in number of goroutines, memory usage & timeouts

● Latency added to call paths integrating the local in-memory cache

● For incremental progress:

○ Feature Flags with Sampling

○ Timeout for Cache Operations

In Production

cached, missingIDs := cache.Get(keys)


In Production: Feature Flags with Sampling

if !liveconfig.Sample("cache.read_rate") {
return
}
cached, missingIDs := cache.Get(keys)
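liveconfig is an internal library; a hypothetical stand-in for what such a sampling check could look like (the rate lookup is elided, the probabilistic comparison is the point):

// sample returns true for roughly rate*100 percent of calls, so the new
// cache code path can be rolled out gradually and dialed up or down.
func sample(rate float64) bool {
    return rand.Float64() < rate
}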
In Production: Timeout for Cache Operations

if !liveconfig.Sample("cache.read_rate") {
return
}

go func() {
cached, missingIDs = localCache.Get(keys)

}()
In Production: Timeout for Cache Operations

if !liveconfig.Sample("cache.read_rate") {
return
}

// perform cache-lookup in goroutine to avoid blocking for too long


ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
go func() {
cached, missingIDs = localCache.Get(keys)
cancel()
}()

<-ctx.Done()

// timeout: return all keys as missing and let remote cache handle it
if ctx.Err() == context.DeadlineExceeded {
return map[string]T{}, keys
}
In Production: Timeout for Cache Operations

if !liveconfig.Sample("cache.read_rate") {
return
}

// perform cache-lookup in goroutine to avoid blocking for too long
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
go func() {
cached, missingIDs = localCache.Get(ctx, keys) // pass the context into the cache operations too
cancel()
}()

<-ctx.Done()

// timeout: return all keys as missing and let remote cache handle it
if ctx.Err() == context.DeadlineExceeded {
return map[string]T{}, keys
}
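Putting the pattern together, a self-contained sketch (the function name and shape are assumptions, not the talk's code). cancel doubles as the completion signal; when the deadline fires first, the abandoned goroutine's results are never read, so nothing races on cached and missing:

func getWithTimeout[V any](c *DLFUCache[V], keys []string, timeout time.Duration) (map[string]V, []string) {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    var (
        cached  map[string]V
        missing []string
    )
    go func() {
        cached, missing = c.Get(ctx, keys)
        cancel() // signal completion
    }()

    <-ctx.Done()
    if ctx.Err() == context.DeadlineExceeded {
        // Timeout: report every key as missing and let the remote cache handle it.
        return map[string]V{}, keys
    }
    return cached, missing
}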
In Production: Timeout for Cache Operations
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

for _, key := range keys {


c.mu.Lock()
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
c.mu.Unlock()
}
c.mu.Lock()
c.incr *= c.decay
c.mu.Unlock()
return result, missing
}
In Production: Timeout for Cache Operations
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

for _, key := range keys {

c.mu.Lock()
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
c.mu.Unlock()
}
c.mu.Lock()
c.incr *= c.decay
c.mu.Unlock()
return result, missing
}
In Production: Timeout for Cache Operations

The goroutine gets abandoned after the timeout. Checking for context cancellation to stop iteration helps to reduce lock contention.

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)

for _, key := range keys {
c.mu.Lock()
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
item.score += c.incr
c.heap.update(item, item.score)
} else {
missing = append(missing, key)
}
c.mu.Unlock()
}
c.mu.Lock()
c.incr *= c.decay
c.mu.Unlock()
return result, missing
}
Beyond
sync.Mutex
sync.Map

156 Building a Highly Concurrent Cache in Go


sync.Map
● I wrongly assumed sync.Map is an untyped map protected by a sync.RWMutex.

● I recommend frequently diving into the standard library source code; for
example, for sync.Map we can see this implementation is much more intricate:
func (m *Map) Load(key any) (value any, ok bool) {
read := m.loadReadOnly()
e, ok := read.m[key]
if !ok && read.amended {
m.mu.Lock()
// Avoid reporting a spurious miss if m.dirty got promoted while we were
// blocked on m.mu. (If further loads of the same key will not miss, it's
// not worth copying the dirty map for this key.)
read = m.loadReadOnly()
e, ok = read.m[key]

157 Building a Highly Concurrent Cache in Go


sync.Map
Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple
goroutines without additional locking or coordination. Loads, stores, and deletes run in
amortized constant time.

The Map type is specialized. Most code should use a plain Go map instead, with separate
locking or coordination, for better type safety and to make it easier to maintain other
invariants along with the map content.

The Map type is optimized for two common use cases:


(1) when the entry for a given key is only ever written once but read many times, as in caches
that only grow, or

(2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In
these two cases, use of a Map may significantly reduce lock contention compared to a Go
map paired with a separate Mutex or RWMutex.
158 Building a Highly Concurrent Cache in Go
sync.Map
Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple
goroutines without additional locking or coordination. Loads, stores, and deletes run in
amortized constant time.

The Map type is specialized. Most code should use a plain Go map instead, with separate
locking or coordination, for better type safety and to make it easier to maintain other
invariants along with the map content.

The Map type is optimized for two common use cases:


(1) when the entry for a given key is only ever written once but read many times, as in caches
that only grow, or

(2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In
these two cases, use of a Map may significantly reduce lock contention compared to a Go
map paired with a separate Mutex or RWMutex.
159 Building a Highly Concurrent Cache in Go
sync.Map
Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple
goroutines without additional locking or coordination. Loads, stores, and deletes run in
amortized constant time.

The Map type is specialized. Most code should use a plain Go map instead, with separate
locking or coordination, for better type safety and to make it easier to maintain other
invariants along with the map content.

The Map type is optimized for two common use cases:


(1) when the entry for a given key is only ever written once but read many times, as in caches
that only grow, or

(2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In
these two cases, use of a Map may significantly reduce lock contention compared to a Go
map paired with a separate Mutex or RWMutex.
160 Building a Highly Concurrent Cache in Go
xsync.Map

● Third-party library providing concurrent data structures for Go:
https://github.com/puzpuzpuz/xsync

● xsync.Map is a concurrent hash-table-based map using a modified version of
the Cache-Line Hash Table (CLHT) data structure.

● CLHT organizes the hash table in cache-line-sized buckets to reduce the
number of cache-line transfers (a usage sketch follows below).

161 Building a Highly Concurrent Cache in Go
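A minimal usage sketch (not from the talk): the generic constructor below
follows the library's v3 API and may differ from the version used here, but
Load, Store, Range, and Size are the calls the later slides rely on.

package main

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

func main() {
	// Generic, typed concurrent map; no type assertions, no explicit locking.
	m := xsync.NewMapOf[string, int]()

	m.Store("posts:42", 128)

	if v, ok := m.Load("posts:42"); ok {
		fmt.Println(v)
	}

	// Range iterates without locking the whole table.
	m.Range(func(key string, value int) bool {
		fmt.Println(key, value)
		return true
	})

	fmt.Println(m.Size())
}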




Symmetric Multiprocessing (SMP)

[Diagram] One CPU package with four cores; each core has its own private L1
and L2 caches, all cores share a single L3 cache, and the L3 connects to main
memory (RAM).

163 Building a Highly Concurrent Cache in Go


Locality of Reference

● Temporal Locality: a processor accessing a particular memory location will
likely access it again in the near future.

● Spatial Locality: a processor accessing a particular memory location will
likely access nearby memory locations as well (see the sketch below).

● Not just one memory location is copied into the CPU cache, but an entire
cache line.

● Cache Line (Cache Block): an adjacent, fixed-size chunk of memory.

164 Building a Highly Concurrent Cache in Go
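A classic way to see spatial locality at work (a sketch, not from the talk):
summing a matrix row by row walks memory sequentially and reuses each cache
line, while summing column by column jumps across memory and wastes most of
every line it loads.

// sumRowMajor walks each row's backing array in memory order: consecutive
// elements share cache lines, so most accesses are cache hits.
func sumRowMajor(m [][]int64) int64 {
	var sum int64
	for i := range m {
		for j := range m[i] {
			sum += m[i][j]
		}
	}
	return sum
}

// sumColMajor touches one element per row before moving on, skipping across
// memory and wasting most of each cache line that was loaded.
func sumColMajor(m [][]int64) int64 {
	var sum int64
	for j := range m[0] {
		for i := range m {
			sum += m[i][j]
		}
	}
	return sum
}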


False Sharing

[Diagram sequence, slides 165–178] Variables x and y sit next to each other in
main memory, so they end up on the same cache line:

● CPU Core 1 reads x: the cache line holding both x and y is loaded into
Core 1’s L1 cache in the Exclusive (E) state.

● CPU Core 2 reads y: the same line now sits in both L1 caches, each copy in
the Shared (S) state.

● Core 1 writes x: its copy becomes Modified (M) and Core 2’s copy is
invalidated, even though Core 2 never touched x.

● Core 2’s next write to y results in a coherence miss: the modified line must
be written back before Core 2 can own it, and Core 1’s copy is invalidated.

● The following read results in another coherence miss and write-back, after
which both copies are back in the Shared (S) state.

The two cores never access the same variable, yet the cache line keeps
ping-ponging between them.

165–178 Building a Highly Concurrent Cache in Go


Cache Coherence Protocol

● Ensures CPU cores have a consistent view of the same data.

● The added coordination between CPU cores impacts application performance.

● Reducing the need for cache coherence traffic makes for faster Go
applications (a small benchmark sketch illustrating false sharing follows).

179 Building a Highly Concurrent Cache in Go
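A small benchmark sketch (not from the talk) makes false sharing measurable:
two goroutines increment independent counters that either share a cache line
or are padded onto separate lines. On typical hardware the padded variant runs
considerably faster; the exact numbers depend on the machine.

package cache_test

import (
	"sync"
	"testing"
)

// sharedCounters places x and y next to each other, so they usually end up
// on the same cache line.
type sharedCounters struct {
	x int64
	y int64
}

// paddedCounters inserts padding so x and y live on different cache lines
// (64 bytes is a common cache-line size).
type paddedCounters struct {
	x int64
	_ [56]byte
	y int64
}

// bench runs two goroutines, each hammering its own counter.
func bench(b *testing.B, incX, incY func()) {
	for i := 0; i < b.N; i++ {
		var wg sync.WaitGroup
		wg.Add(2)
		go func() {
			defer wg.Done()
			for n := 0; n < 1_000_000; n++ {
				incX()
			}
		}()
		go func() {
			defer wg.Done()
			for n := 0; n < 1_000_000; n++ {
				incY()
			}
		}()
		wg.Wait()
	}
}

func BenchmarkFalseSharing(b *testing.B) {
	var c sharedCounters
	bench(b, func() { c.x++ }, func() { c.y++ })
}

func BenchmarkPadded(b *testing.B) {
	var c paddedCounters
	bench(b, func() { c.x++ }, func() { c.y++ })
}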




DLFU Cache: Removing Locking

● Maintaining the heap still requires a mutex.

● To fully leverage xsync.Map we want to eliminate the mutex entirely.

● Perform cache eviction in a background goroutine: collect all entries and
sort them.

● Replace synchronized access to numeric or string fields with atomic
operations (a sketch of an atomic float64 wrapper follows).

● It’s not really lock-free: it moves the locking closer to the CPU.

181 Building a Highly Concurrent Cache in Go
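The later Get slides call Load, Store, and Add on float-valued fields such as
incr and score. Go's sync/atomic has no float64 type, so one way to get there,
shown here as a sketch and not necessarily the talk's implementation, is a
small wrapper over atomic.Uint64 using math.Float64bits:

package cache

import (
	"math"
	"sync/atomic"
)

// atomicFloat64 stores a float64 in an atomic.Uint64 by reinterpreting its
// bits, so it can be read and updated without a mutex.
type atomicFloat64 struct {
	bits atomic.Uint64
}

func (f *atomicFloat64) Load() float64 {
	return math.Float64frombits(f.bits.Load())
}

func (f *atomicFloat64) Store(v float64) {
	f.bits.Store(math.Float64bits(v))
}

// Add retries with compare-and-swap until the addition is applied atomically.
func (f *atomicFloat64) Add(delta float64) float64 {
	for {
		old := f.bits.Load()
		next := math.Float64frombits(old) + delta
		if f.bits.CompareAndSwap(old, math.Float64bits(next)) {
			return next
		}
	}
}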


Perform Cache Eviction Asynchronously
func (c *DLFUCache[V]) trimmer(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(250 * time.Millisecond):
			if ctx.Err() != nil {
				return
			}
			c.trim()
		}
	}
}

182 Building a Highly Concurrent Cache in Go
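An equivalent variant (a sketch, not the talk's code) reuses a single
time.Ticker instead of allocating a new timer with time.After on every
iteration; imports of context and time are as before:

func (c *DLFUCache[V]) trimmer(ctx context.Context) {
	// One ticker for the lifetime of the goroutine instead of a fresh
	// timer per loop iteration.
	ticker := time.NewTicker(250 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			c.trim()
		}
	}
}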


Perform Cache Eviction Asynchronously
func (c *DLFUCache[V]) trim() {
	size := c.data.Size()
	if size <= c.size {
		return
	}

	items := make(items[V], 0, size)
	c.data.Range(func(key string, value *item[V]) bool {
		items = append(items, value)
		return true
	})
	sort.Sort(items)

	for i := 0; i < len(items)-c.size; i++ {
		c.data.Delete(items[i].key.Load())
	}
}

183 Building a Highly Concurrent Cache in Go


Integrate xsync.Map
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)

	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		c.mu.Lock()
		if item, ok := c.data[key]; ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
			c.heap.update(item, item.score)
		} else {
			missing = append(missing, key)
		}
		c.mu.Unlock()
	}
	c.mu.Lock()
	c.incr *= c.decay
	c.mu.Unlock()
	return result, missing
}
Integrate xsync.Map

● First step: drop the per-key locking and the heap maintenance from Get; the
c.mu.Lock/Unlock calls and the c.heap.update line are simply removed, leaving:
Integrate xsync.Map
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)

	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data[key]; ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
		} else {
			missing = append(missing, key)
		}
	}
	c.incr *= c.decay
	return result, missing
}
Integrate xsync.Map
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)

	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data.Load(key); ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
		} else {
			missing = append(missing, key)
		}
	}
	c.incr *= c.decay
	return result, missing
}
Integrate xsync.Map
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	incr := c.incr.Load()
	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data.Load(key); ok && !item.expired() {
			result[key] = item.value
			item.score.Add(incr)
		} else {
			missing = append(missing, key)
		}
	}
	c.incr.Store(incr * c.decay)
	return result, missing
}
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /dlfu bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

│ V2 │ V3 │
│ hit/op │ hit/op vs base │
Cache-16 72.32% ± 5% 76.27% ± 2% +5.45% (p=0.000 n=10)

│ V2 │ V3 │
│ read-sec/op │ read-sec/op vs base │
Cache-16 68.7ms ± 3% 616.7µ ± 49% -99.10% (p=0.000 n=10)

│ V2 │ V3 │
│ trim-sec/op │ trim-sec/op vs base │
Cache-16 246.4µ ± 3% 454.5 ms ± 61% +1843.61% (p=0.000 n=10)

│ V2 │ V3 │
│ write-sec/op │ write-sec/op vs base │
Cache-16 28.47ms ± 13% 243.0µ ± 58% -99.15% (p=0.000 n=10)
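The trade-off is visible in these numbers: reads and writes become roughly two
orders of magnitude faster, but trim time grows by more than 1800%, which the
CPU profile on the next slides investigates.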
go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 26, 2023 at 1:16pm (PDT)
Duration: 42.14s, Total samples = 81.68s (193.83%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list trim
Total: 81.68s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
20ms 37.23s (flat, cum) 45.58% of Total
. . 197:func (c *DLFUCache[V]) Trim() {
. 20ms 198: size := c.data.Size()
. 10ms 199: if c.data.Size() <= c.size {
. . 200: return
. . 201: }
. . 202:
. 80ms 203: items := make(items[V], 0, size)
. 6.82s 204: c.data.Range(func(key string, value *item[V]) bool {
. . 205: items = append(items, value)
. . 206: return true
. . 207: })
. 26.98s 208: sort.Sort(items)
. . 209:
10ms 10ms 210: for i := 0; i < len(items)-c.size; i++ {
10ms 680ms 211: key := items[i].key.Load()
. 2.63s 212: c.data.Delete(key)
. . 213: }
. . 214:}
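The profile shows sort.Sort accounting for roughly 27 of the 37 seconds spent
in trim: sorting every cache entry just to evict a handful is the next thing
to attack.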
Faster Eviction

● sort.Sort uses pattern-defeating quicksort (pdqsort).

● On the Gophers Slack in #performance, Aurélien Rainone suggested using
quickselect instead.

● Quickselect finds the k-th smallest element in average linear time,
partitioning the slice so that the k smallest elements come first (a sketch
follows below).

196 Building a Highly Concurrent Cache in Go
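A minimal quickselect sketch (not the talk's code; Select and partition are
hypothetical helpers over a plain slice of scores): after Select(s, k), the k
smallest values sit in s[:k] in arbitrary order, with no need to sort the rest.

package cache

import "math/rand"

// Select partially orders s so that the k smallest values end up in s[:k],
// in arbitrary order, without sorting the rest. Average cost is linear.
func Select(s []float64, k int) {
	if k <= 0 || k >= len(s) {
		return
	}
	lo, hi := 0, len(s)-1
	for lo < hi {
		p := partition(s, lo, hi)
		switch {
		case p == k:
			return
		case p < k:
			lo = p + 1
		default:
			hi = p - 1
		}
	}
}

// partition picks a random pivot, moves smaller elements in front of it, and
// returns the pivot's final index.
func partition(s []float64, lo, hi int) int {
	p := lo + rand.Intn(hi-lo+1)
	s[p], s[hi] = s[hi], s[p]
	i := lo
	for j := lo; j < hi; j++ {
		if s[j] < s[hi] {
			s[i], s[j] = s[j], s[i]
			i++
		}
	}
	s[i], s[hi] = s[hi], s[i]
	return i
}

Applied to trim, the idea is to select only the len(items)-c.size lowest-scored
entries for deletion instead of sorting the entire cache.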


go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

benchstat -col /dlfu bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

│ V4 │ V5 │
│ hit/op │ hit/op vs base │
Cache-16 76.44% ± 1% 74.84% ± 1% -2.09% (p=0.001 n=10)

│ V4 │ V5 │
│ read-sec/op │ read-sec/op vs base │
Cache-16 477.5µ ± 40% 358.4µ ± 44% ~ (p=0.529 n=10)

│ V4 │ V5 │
│ trim-sec/op │ trim-sec/op vs base │
Cache-16 463.3m ± 54% 129.1m ± 85% -72.14% (p=0.002 n=10)

│ V4 │ V5 │
│ write-sec/op │ write-sec/op vs base │
Cache-16 193.2µ ± 53% 133.6µ ± 40% ~ (p=0.280 n=10)
Summary

● Implementing your own cache in Go makes it possible to optimize by leveraging
properties that are unique to your use case.

● Different cache replacement policies: LRU, LFU, DLFU, etc.

● DLFU (Decaying Least Frequently Used): like LFU but with exponential decay on the
cache entry’s reference count.

● How to write benchmarks and utilize parallel execution for concurrency.

● Using Go’s profiler to optimize for concurrency contention.

198 Building a Highly Concurrent Cache in Go


Summary

● The cache coherence protocol can impact concurrent performance in Go
applications.

● There is no such thing as lock-free when multiple processors are involved.

● Performance can be improved with lock-free data structures and atomic
primitives, but your mileage will differ.

199 Building a Highly Concurrent Cache in Go


“ Don’t generalize from the talk’s
example. Write your own code,
construct your own benchmarks.
You will be surprised. ”

200 Building a Highly Concurrent Cache in Go


Thank you!

Konrad Reiche
@konradreiche
