Building a Highly Concurrent Cache in Go
A Hitchhiker's Guide
Konrad Reiche
"In its full generality, multithreading is an incredibly complex and error-prone technique, not to be recommended in any but the smallest programs."
― C. A. R. Hoare: Communicating Sequential Processes
Illustrations adapted from images in Rob Pike's talk "Concurrency is not Parallelism" by Renée French, used under CC 4.0 BY.
[7932676.760082] Out of memory: kill process 23936
[7932676.761528] Killed process 23936 (go)
"Don't panic."
― Effective Go

"Don't panic."
― Douglas Adams: The Hitchhiker's Guide to the Galaxy
data, ok := cache.Get(key)
if !ok {
	data = doSomething()
	cache.Set(key, data)
}
ca. 2019: My Manager and Me
Diagram: Kubernetes pods running the Ranking Service perform cache lookups against a Redis cache cluster and update it on a miss, at 5-20M op/sec; a local in-memory cache inside each Ranking Service pod handles ~90k op/sec per pod.
A first cut stores the data in a plain map (initially map[string]string, then made generic):

type cache[K comparable, V any] struct {
	data map[K]V
}

func (c *cache[K, V]) Set(key K, value V) {
	c.data[key] = value
}

func (c *cache[K, V]) Get(key K) (V, bool) {
	value, ok := c.data[key]
	return value, ok
}
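The slides go on to protect the cache with a mutex; a minimal sketch of that concurrency-safe variant (my reconstruction, the type and field names are assumptions, not the slide code):

import "sync"

// concurrentCache wraps the map with a sync.RWMutex so that multiple
// goroutines can call Get and Set safely.
type concurrentCache[K comparable, V any] struct {
	mu   sync.RWMutex
	data map[K]V
}

func (c *concurrentCache[K, V]) Set(key K, value V) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

func (c *concurrentCache[K, V]) Get(key K) (V, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	value, ok := c.data[key]
	return value, ok
}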
Because we cannot predict the future, we can only try to approximate this behavior.
Figure: Taxonomy of Cache Replacement Policies [1], dividing them into coarse-grained policies (such as LRU and LFU) and fine-grained policies, along axes of longer history and more access patterns.

[1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
LFU: every access takes the lock (c.mu.Lock(); defer c.mu.Unlock()) and tracks a per-entry frequency, starting at 1 when the entry is inserted.
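A minimal LFU sketch using container/heap, as my own illustration of the idea rather than the slide code (names are assumptions, locking is omitted for brevity, and it assumes a key is not already present on Set):

import "container/heap"

// lfuItem is a cache entry tracked by its access frequency.
type lfuItem[V any] struct {
	key       string
	value     V
	frequency int
	index     int // position in the heap, maintained via heap.Interface
}

// lfuHeap is a min-heap ordered by frequency: the least frequently used
// entry sits at the root and is evicted first.
type lfuHeap[V any] []*lfuItem[V]

func (h lfuHeap[V]) Len() int           { return len(h) }
func (h lfuHeap[V]) Less(i, j int) bool { return h[i].frequency < h[j].frequency }
func (h lfuHeap[V]) Swap(i, j int)      { h[i], h[j] = h[j], h[i]; h[i].index = i; h[j].index = j }
func (h *lfuHeap[V]) Push(x any)        { it := x.(*lfuItem[V]); it.index = len(*h); *h = append(*h, it) }
func (h *lfuHeap[V]) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

type lfuCache[V any] struct {
	size int
	data map[string]*lfuItem[V]
	heap lfuHeap[V]
}

func (c *lfuCache[V]) Get(key string) (V, bool) {
	if it, ok := c.data[key]; ok {
		it.frequency++
		heap.Fix(&c.heap, it.index) // restore heap order after the update
		return it.value, true
	}
	var zero V
	return zero, false
}

func (c *lfuCache[V]) Set(key string, value V) {
	it := &lfuItem[V]{key: key, value: value, frequency: 1}
	c.data[key] = it
	heap.Push(&c.heap, it)
	for len(c.data) > c.size { // evict the least frequently used entries
		evicted := heap.Pop(&c.heap).(*lfuItem[V])
		delete(c.data, evicted.key)
	}
}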
Caching Post Data for Ranking

Diagram (revisited): Thing Service, Redis cache cluster, Kubernetes pods for the Ranking Service, cache lookups with update on miss at 5-20M op/sec, and a local in-memory cache in the Ranking Service at ~90k op/sec per pod.
Benchmarks

Functions of the form

	func BenchmarkXxx(*testing.B)

are considered benchmarks, and are executed by the go test command when its -bench flag is provided.

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		cache.Get(keys)
	}
Limitations
● We want to analyze and optimize all cache operations: Get, Set, Eviction.

Event logs, one line per request:
● timestamp
● post keys

107907533,SA,Lw,OA,Iw,aA,RA,KA,CQ,Ow,Aw,Hg,Kg
111956832,upgb
121807061,upgb
134028958,l3Ir,iPMq,PcUn,T5Ej,ZQs,kTM,/98F,BFwJ,Oik,uYIB,gv8F
137975373,crgb,SCMU,NXUd,EyQI,244Z,DB4H,Tp0H,Kh8b,gREH,g9kG,o34E,wSYI,u+wF,h40M
142509895,iwM,hgM,CQQ,YQI
154850130,jTE,ciU,2U4,GQkB,4xo,U2QC,/7oB,dRIC,M0gB,bwYk
...
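To give a sense of how such logs can be replayed, here is a small parsing sketch; the eventLog type and field names are my assumptions, not the talk's code:

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// eventLog is one line of the recorded traffic: a timestamp followed by
// the post keys that were requested together.
type eventLog struct {
	timestamp int64
	keys      []string
}

// parseEventLogs reads lines of the form "timestamp,key1,key2,..." and
// returns them as eventLog values.
func parseEventLogs(r io.Reader) ([]eventLog, error) {
	var logs []eventLog
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fields := strings.Split(scanner.Text(), ",")
		if len(fields) < 2 {
			continue // skip malformed lines
		}
		ts, err := strconv.ParseInt(fields[0], 10, 64)
		if err != nil {
			return nil, err
		}
		logs = append(logs, eventLog{timestamp: ts, keys: fields[1:]})
	}
	return logs, scanner.Err()
}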
b.RunParallel(func(pb *testing.PB) {
	// set up goroutine-local state
	for pb.Next() {
		// execute one iteration of the benchmark
	}
})
func BenchmarkCache(b *testing.B) {
	// newBenchmarkCase: custom benchmark case type to manage the benchmark
	// and collect data.
	cb := newBenchmarkCase(b, config{size: 400_000})
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		// Collect per-goroutine benchmark measurements.
		var br benchmarkResult
		for pb.Next() {
			log := cb.nextEventLog()
			// We can measure the duration of individual operations manually.
			start := time.Now()
			cached, missing := cb.cache.Get(log.keys)
			br.observeGet(start, cached, missing)
			if len(missing) > 0 {
				data := lookupData(missing)
				start := time.Now()
				cb.cache.Set(data)
				br.observeSetDuration(start)
			}
		}
		cb.addLocalReports(br)
	})
	// Use b.ReportMetric to report custom metrics.
	b.ReportMetric(cb.getHitRate(), "hit/op")
	b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
	b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
}
go test -run=^$ -bench=BenchmarkCache -count=10
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ LFU │ LRU │
│ hit/op │ hit/op vs base │
Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)
│ LFU │ LRU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)
│ LFU │ LRU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
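The comparison tables are in benchstat format. One way to produce them is to save each policy's benchmark output to its own file and run benchstat over the two files; the file names here are my own, not from the slides:

go install golang.org/x/perf/cmd/benchstat@latest
benchstat lfu.bench lru.bench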
Taxonomy of Cache Replacement Policies (revisited) [1]: coarse-grained versus fine-grained policies.
Combining LFU & LRU

LRFU (Least Recently/Frequently Used) [2]: a spectrum of policies between LRU and LFU. DLFU (Decaying LFU) [3][4] applies an exponential decay to reference counts.

[2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352-1361, December 2001.
[3] https://github.com/dbaarda/DLFUCache
[4] https://minkirri.apana.org.au/wiki/DecayingLFUCacheExpiry

type cache struct {
	size int
	// ...
}

p := float64(config.Size) * config.Weight
cache.decay = (p + 1.0) / p
return cache
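To illustrate the decay arithmetic from the slides, here is a small self-contained demo (my own; the size and weight values are made up): every access adds the current increment to an item's score and the increment grows by the decay factor afterwards, so older references count exponentially less than newer ones.

package main

import "fmt"

func main() {
	size, weight := 400_000.0, 1.0
	p := size * weight
	decay := (p + 1.0) / p

	incr := 1.0
	var scoreOld, scoreNew float64
	for i := 0; i < 1_000_000; i++ {
		if i < 500_000 {
			scoreOld += incr // item accessed only during the first half
		} else {
			scoreNew += incr // item accessed only during the second half
		}
		incr *= decay
	}
	// Despite the same number of accesses, the recently accessed item ends
	// up with the higher score.
	fmt.Printf("old=%.2f new=%.2f\n", scoreOld, scoreNew)
}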
Set (LFU):

	item := &item{
		key:       key,
		value:     value,
		frequency: 1,
	}
	c.data[key] = item
	heap.Push(&c.heap, item)
	// ...
	c.trim()

Set (DLFU): the per-item counter becomes a score seeded with the current increment.

	item := &item{
		key:   key,
		value: value,
		score: c.incr,
	}
	c.data[key] = item
	heap.Push(&c.heap, item)
	// ...
	c.trim()
DLFU Cache
Every cache operation acquires the mutex for its full duration; Get, for example, ends with:

	c.mu.Lock()
	defer c.mu.Unlock()
	// ...
	c.incr *= c.decay
	return result, missing
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ LRU │ DLFU │
│ hit/op │ hit/op vs base │
Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)
│ LRU │ DLFU │
│ read-sec/op │ read-sec/op vs base │
Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)
│ LRU │ DLFU │
│ write-sec/op │ write-sec/op vs base │
Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x -cpuprofile=cpu.out > bench
Profiling
go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 24, 2023 at 3:04pm (PDT)
Duration: 850.60s, Total samples = 1092.33s (128.42%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
File: cache.test
Type: delay
Time: Sep 24, 2023 at 3:48pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list DLFU.*Get
Total: 615.48s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
0 297.55s (flat, cum) 48.34% of Total
. . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
. . 75: var missingKeys []string
. . 76: result := make(map[string]V)
. . 77:
. 297.55s 78: c.mu.Lock()
. . 79: defer c.mu.Unlock()
. . 80:
. . 81: for _, key := range keys {
. . 82: item, ok := c.data[key]
. . 83: if ok && !item.expired() {
go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
File: cache.test
Type: delay
Time: Sep 24, 2023 at 3:48pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list DLFU.*Set
Total: 615.48s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Set
0 193.89s (flat, cum) 31.50% of Total
. . 99:func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
. 193.89s 100: c.mu.Lock()
. . 101: defer c.mu.Unlock()
. . 102:
. . 103: now := time.Now()
. . 104: for key, value := range items {
. . 105: if ctx.Err() != nil {
func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)

	c.mu.Lock()         // critical section: the lock is held
	defer c.mu.Unlock() // for the entire Get call

	// ...
	c.incr *= c.decay
	return result, missing
}
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ V1 │ V2 │
│ hit/op │ hit/op vs base │
Cache-16 70.94% ± 0% 70.30% ± 0% -0.89% (p=0.000 n=10)
│ V1 │ V2 │
│ read-sec/op │ read-sec/op vs base │
Cache-16 4.34ms ± 54% 1.43ms ± 26% -67.13% (p=0.001 n=10)
│ V1 │ V2 │
│ write-sec/op │ write-sec/op vs base │
Cache-16 2.43ms ± 62% 574.3µs ± 25% -76.36% (p=0.000 n=10)
In Production

	if !liveconfig.Sample("cache.read_rate") {
		return
	}
	cached, missingIDs := cache.Get(keys)
In Production: Timeout for Cache Operations

	if !liveconfig.Sample("cache.read_rate") {
		return
	}

	go func() {
		cached, missingIDs = localCache.Get(keys)
	}()

	<-ctxCache.Done()
	// timeout: return all keys as missing and let remote cache handle it
	if ctxCache.Err() == context.DeadlineExceeded {
		return map[string]T{}, keys
	}
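Putting these fragments together, one way to implement the timeout wrapper looks like the sketch below; this is my reconstruction under assumptions (the getWithTimeout name, the timeout parameter, and the cancel-on-completion wiring are not from the slides):

import (
	"context"
	"time"
)

// getWithTimeout runs the local cache lookup in its own goroutine and waits
// at most `timeout`. On a timeout every key is reported as missing so the
// caller can fall back to the remote cache.
func getWithTimeout[V any](ctx context.Context, cache *DLFUCache[V], keys []string, timeout time.Duration) (map[string]V, []string) {
	ctxCache, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var (
		cached  map[string]V
		missing []string
	)
	go func() {
		cached, missing = cache.Get(ctxCache, keys)
		cancel() // signal completion by cancelling the context
	}()

	<-ctxCache.Done()
	if ctxCache.Err() == context.DeadlineExceeded {
		// timeout: return all keys as missing and let the remote cache handle it
		return map[string]V{}, keys
	}
	return cached, missing
}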
In production, Get also takes and releases the mutex per key instead of holding it for the entire call:

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	for _, key := range keys {
		c.mu.Lock()
		if item, ok := c.data[key]; ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
			c.heap.update(item, item.score)
		} else {
			missing = append(missing, key)
		}
		c.mu.Unlock()
	}
	c.mu.Lock()
	c.incr *= c.decay
	c.mu.Unlock()
	return result, missing
}
● I recommend frequently diving into the standard library source code; for sync.Map, for example, we can see the implementation is much more intricate:
func (m *Map) Load(key any) (value any, ok bool) {
	read := m.loadReadOnly()
	e, ok := read.m[key]
	if !ok && read.amended {
		m.mu.Lock()
		// Avoid reporting a spurious miss if m.dirty got promoted while we were
		// blocked on m.mu. (If further loads of the same key will not miss, it's
		// not worth copying the dirty map for this key.)
		read = m.loadReadOnly()
		e, ok = read.m[key]
		// ...
sync.Map

Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time.

The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content.

... (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.
xsync.Map
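As a minimal usage sketch, assuming the github.com/puzpuzpuz/xsync/v3 MapOf API (the same Range, Size, and Delete methods the trim listing later relies on); this is my illustration, not slide code:

import (
	"fmt"

	"github.com/puzpuzpuz/xsync/v3"
)

// example shows a concurrent string-keyed map without a cache-wide mutex
// around every access (assumes the xsync/v3 API).
func example() {
	m := xsync.NewMapOf[string, int]()
	m.Store("a", 1)
	m.Store("b", 2)

	if v, ok := m.Load("a"); ok {
		fmt.Println("a =", v)
	}

	// Range iterates without locking out writers.
	m.Range(func(key string, value int) bool {
		fmt.Println(key, value)
		return true
	})

	fmt.Println("size:", m.Size())
	m.Delete("b")
}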
Diagram: one CPU package with per-core L1 and L2 caches and a shared L3 cache.

● It is not a single memory location that gets copied into the CPU cache, but an entire cache line.
Diagram (animation): var x and var y share a cache line in main memory; each core's L1 holds a copy of that line in the Exclusive (E) state. A write to variable x on one core marks its copy Modified (M), and a coherence write-back to main memory is required before the other core can use the line again.
● Reducing the need for cache coherence makes for faster Go applications (see the padding sketch below).
● Perform cache eviction in a separate goroutine: collect all entries and sort them.
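The padding sketch below is my own illustration of the coherence point, not slide code: two counters written by two goroutines. In sharedCounters both fields sit on the same cache line, so every write on one core invalidates the line in the other core's cache; paddedCounters gives each counter its own 64-byte cache line.

package main

import (
	"sync"
	"sync/atomic"
)

type sharedCounters struct {
	x atomic.Int64
	y atomic.Int64 // shares a cache line with x
}

type paddedCounters struct {
	x atomic.Int64
	_ [56]byte // pad x out to a full 64-byte cache line
	y atomic.Int64
	_ [56]byte
}

// bump increments two independent counters from two goroutines.
func bump(n int, incX, incY func()) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			incX()
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			incY()
		}
	}()
	wg.Wait()
}

func main() {
	var s sharedCounters
	var p paddedCounters
	bump(1_000_000, func() { s.x.Add(1) }, func() { s.y.Add(1) })
	bump(1_000_000, func() { p.x.Add(1) }, func() { p.y.Add(1) })
}

Benchmarking the two layouts side by side typically shows the padded version scaling noticeably better across cores.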
	} else {
		missing = append(missing, key)
	}
	c.incr *= c.decay
go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ V2 │ V3 │
│ hit/op │ hit/op vs base │
Cache-16 72.32% ± 5% 76.27% ± 2% +5.45% (p=0.000 n=10)
│ V2 │ V3 │
│ read-sec/op │ read-sec/op vs base │
Cache-16 68.7ms ± 3% 616.7µs ± 49% -99.10% (p=0.000 n=10)
│ V2 │ V3 │
│ trim-sec/op │ trim-sec/op vs base │
Cache-16 246.4µs ± 3% 454.5ms ± 61% +1843.61% (p=0.000 n=10)
│ V2 │ V3 │
│ write-sec/op │ write-sec/op vs base │
Cache-16 28.47ms ± 13% 243.0µs ± 58% -99.15% (p=0.000 n=10)
go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 26, 2023 at 1:16pm (PDT)
Duration: 42.14s, Total samples = 81.68s (193.83%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list trim
Total: 81.68s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
20ms 37.23s (flat, cum) 45.58% of Total
. . 197:func (c *DLFUCache[V]) Trim() {
. 20ms 198: size := c.data.Size()
. 10ms 199: if c.data.Size() <= c.size {
. . 200: return
. . 201: }
. . 202:
. 80ms 203: items := make(items[V], 0, size)
. 6.82s 204: c.data.Range(func(key string, value *item[V]) bool {
. . 205: items = append(items, value)
. . 206: return true
. . 207: })
. 26.98s 208: sort.Sort(items)
. . 209:
10ms 10ms 210: for i := 0; i < len(items)-c.size; i++ {
10ms 680ms 211: key := items[i].key.Load()
. 2.63s 212: c.data.Delete(key)
. . 213: }
. . 214:}
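The listing shows that sorting every entry (sort.Sort) dominates trim. One way to cut that down, as my own illustration rather than what the later cache versions necessarily do, is to keep only the k lowest-scored candidates in a bounded max-heap while ranging over the map, which is O(n log k) instead of O(n log n):

import "container/heap"

// candidate is a key with its current score.
type candidate struct {
	key   string
	score float64
}

// candidateHeap is a max-heap on score: the root is the highest-scored
// candidate and is the first to be displaced when a lower-scored entry is
// found, so after the scan it holds the k lowest-scored keys.
type candidateHeap []candidate

func (h candidateHeap) Len() int           { return len(h) }
func (h candidateHeap) Less(i, j int) bool { return h[i].score > h[j].score }
func (h candidateHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *candidateHeap) Push(x any)        { *h = append(*h, x.(candidate)) }
func (h *candidateHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// evictionCandidates scans all entries once and returns the keys of the k
// entries with the lowest scores.
func evictionCandidates(entries map[string]float64, k int) []string {
	h := make(candidateHeap, 0, k)
	for key, score := range entries {
		if len(h) < k {
			heap.Push(&h, candidate{key, score})
		} else if score < h[0].score {
			h[0] = candidate{key, score}
			heap.Fix(&h, 0)
		}
	}
	keys := make([]string, 0, len(h))
	for _, c := range h {
		keys = append(keys, c.key)
	}
	return keys
}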
Faster Eviction
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ V4 │ V5 │
│ hit/op │ hit/op vs base │
Cache-16 76.44% ± 1% 74.84% ± 1% -2.09% (p=0.001 n=10)
│ V4 │ V5 │
│ read-sec/op │ read-sec/op vs base │
Cache-16 477.5µs ± 40% 358.4µs ± 44% ~ (p=0.529 n=10)
│ V4 │ V5 │
│ trim-sec/op │ trim-sec/op vs base │
Cache-16 463.3ms ± 54% 129.1ms ± 85% -72.14% (p=0.002 n=10)
│ V4 │ V5 │
│ write-sec/op │ write-sec/op vs base │
Cache-16 193.2µs ± 53% 133.6µs ± 40% ~ (p=0.280 n=10)
Summary
● DLFU (Decaying Least Frequently Used): like LFU but with exponential decay on the
cache entry’s reference count.
Konrad Reiche
@konradreiche