perf: investigate wal performance degradation #183

Closed
Ryanfsdf opened this issue Jul 16, 2019 · 1 comment

@Ryanfsdf
Contributor

Enabling the WAL severely impacts the speed at which flushes and compactions occur (from my experimentation, roughly 4-8x). I expected some performance degradation, but not to the extent I saw in my experiments.

This initially came up during my implementation of #179. When writes were stalled (which means the WAL wasn't being written to), background compactions temporarily sped up by ~4-8x.

I further tested this by adding a few lines to compaction.go in runCompaction() after the iteration:

if c.flushing == nil {
	fmt.Printf("bytes compacted: %d\n", c.bytesIterated)
} else {
	fmt.Printf("bytes flushed: %d\n", c.bytesIterated)
}

and updating the sync.go benchmark to pause writes for a few seconds every other minute, by adding the code below at the top of the for loop in sync.go:

if time.Now().Minute()%2 == 0 && time.Now().Second() < 5 {
	time.Sleep(1 * time.Second)
}

and running:

pebble sync -c 1000 -d 10m -w bench -m 20000

I observed that the bytes compacted: x output reported roughly 4-8x more bytes compacted/flushed per second while writes were paused than while writes were running.

Another experiment I ran was to simply run the sync benchmark with the WAL disabled. The throughput difference was also ~4-8x. I found this by turning up the concurrency level until doing so no longer affected throughput: the benchmark with the WAL enabled capped at ~7MB/s and the benchmark with the WAL disabled capped at ~35MB/s (c=1000 for WAL enabled, c=10 for WAL disabled). Once again, flushes and compactions completed much more quickly with the WAL disabled.
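
For context, a minimal sketch of the two setups being compared, assuming the pebble Go API's Options.DisableWAL field and the Sync/NoSync write options (the import path and helper names here are illustrative, not the benchmark's actual code):

// Sketch only: the two configurations compared above.
package main

import "github.com/cockroachdb/pebble"

// openWithWAL opens a DB with the WAL enabled (the default).
func openWithWAL(dir string) (*pebble.DB, error) {
	return pebble.Open(dir, &pebble.Options{})
}

// openWithoutWAL opens a DB with the WAL disabled; writes skip the log
// entirely, so commits never wait on a sync.
func openWithoutWAL(dir string) (*pebble.DB, error) {
	return pebble.Open(dir, &pebble.Options{DisableWAL: true})
}

// write issues a single write, optionally syncing the WAL before the
// commit is acknowledged.
func write(db *pebble.DB, key, val []byte, sync bool) error {
	opts := pebble.NoSync
	if sync {
		opts = pebble.Sync
	}
	return db.Set(key, val, opts)
}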

I have yet to try this on anything but my MacBook; I'll be testing it on other machines as well.

I suspect that the results may be due to:

  1. How my Macbook SSD behaves
  2. Implementation bottleneck somewhere in Pebble
  3. Limitation of writing the WAL to the same disk as the entries
  4. Bad experimental setup

Further investigation will be required.

@Ryanfsdf self-assigned this Jul 16, 2019
@petermattis
Collaborator

I believe what is happening here is due to the MacBook SSD, which has extremely poor performance when syncing frequently. On a c5d.4xlarge AWS instance I see almost no difference in compaction throughput with the WAL enabled vs disabled. In both cases it is 150-200 MB/s, which is what I'd expect from an SSD. On my MacBook, compaction throughput is 50-100 MB/s with the WAL disabled, but only 10 MB/s with the WAL enabled. But if I leave the WAL enabled and disable syncing, throughput rises to 50-100 MB/s again. I don't think there is anything to be done here.

Note that RocksDB does not do syncing properly on Darwin. A special fcntl command (F_FULLFSYNC) is required to actually sync data to the media (fsync does the wrong thing). See golang/go#26650 and https://github.com/golang/go/blob/a3b01440fef3d2833909f6651455924a1c86d192/src/internal/poll/fd_fsync_darwin.go#L12-L23.
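
For illustration, a hedged sketch of what a proper full sync looks like on Darwin using golang.org/x/sys/unix (fullSync is a hypothetical helper, mirroring the linked internal/poll code):

//go:build darwin

package main

import (
	"os"

	"golang.org/x/sys/unix"
)

// fullSync forces data to stable media on Darwin. Plain fsync on macOS only
// flushes to the drive's cache, so fcntl(fd, F_FULLFSYNC, 0) is required.
func fullSync(f *os.File) error {
	_, err := unix.FcntlInt(f.Fd(), unix.F_FULLFSYNC, 0)
	return err
}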

One takeaway is that we shouldn't be using MacBooks for write-performance testing.
