don't os.File.Sync() on mac CAS items #143
Conversation
os.File.Sync() was ineffective on mac in Go prior to 1.12. From 1.12 onwards it is effective, but slow: golang/go#26650. This change disables Sync for CAS uploads on mac, because we can (potentially) verify these items later. Let's see if this helps performance on mac: buchgr#67 (while being less risky than buchgr#68).
IIRC we found that os.File.Sync() was really slow on macs with spinning disks. I believe we found no issues on macs with SSDs. If we merge this patch we'll have different persistence guarantees on mac than on all other platforms. I am not a fan of this change.
I compared performance on linux SSD vs HDD a while ago, and basically concluded that you really should use SSD for bazel-remote. With that in mind maybe we should just document this and leave the current behaviour?
We ran into the performance issue when we used bazel-remote on a Mac Pro (2013) with Arch Linux as the host OS. The Mac Pro used the built-in SSD that is supposedly pretty fast, but had issues with frequent fsync() calls.

I think the answer to why it showed such bad performance lies in the flash technology used: if you fsync() after writing each file, the SSD has to guarantee that even when a power outage happens, these files will not be lost. There are two ways to implement this: 1) the SSD can use some kind of power-loss protection like a battery/capacitor-backed write cache, or 2) the SSD must actually do a full erase/write cycle on each fsync. Professional SSDs usually provide the former, cheap SSDs the latter. With power-loss protection, fsync() shouldn't cause much performance degradation; without it, it will a) kill the SSD quickly (due to write amplification - if you fsync a 60 kByte file, it still has to rewrite an entire flash block, which is often 2 MB large) and b) be super slow. My guess is that the Mac Pro SSD also didn't have power-loss protection and thus had the performance problems.

For your cache, I think you can reasonably do two things: either continue to do fsync() after writing each file and tell users that they should preferably use professional / enterprise SSDs with power-loss protection as storage, or come up with some clever trick that only requires you to fsync() once in a while and allows you to get to a known good state when the server starts after a power-loss event.
I don't think "fsync() once in a while" would help SSD longevity. Since it's per-file, it would result in the same number of calls/operations when deferred as if you did it immediately after each PUT, if I understand correctly. An alternative "don't kill my SSD" mode that might be worth trying would be to write {data, hash} items to disk for the ActionCache without fsync(), and then use that hash to verify the item before serving it to clients.
So in order to do this efficiently we would need to change the on-disk layout to store all logical files in one (or a few) single files on disk. Besides having to offer a migration path, this would come with the downside of being more constrained in possible garbage collection algorithms and having to deal with fragmentation. Do we know how big of a problem this is?
If you store two files per cache entry, then I think you would waste a lot of space unless you use a filesystem that is optimized for small files (the hash file would be much smaller than the common 4k filesystem block size). Alternatively, you could store both data items in a single file, serialized using protobuf or similar. This would still require a migration path, but still only produce one file on disk per ActionCache entry, and I don't think this should limit our future garbage collection options.
Re bazel-remote speed on a Mac Pro running Linux: I assume this is far from a common use case, and possibly one that we can ignore. Re SSD write amplification, I have not had trouble with this personally, using mostly high-end consumer SSDs for about a year on a medium/large codebase (chromium). Either way, writing the ActionResult hash to disk would provide a level of data integrity checking that we currently lack.
Sorry if I wasn't clear. What I was suggesting is to have one large append-only file to store all data. I believe that's a common pattern to avoid the problems with fsync() that you described.
Yes, but that would be a pretty large change that definitely needs data before committing to it. Whereas writing data+hash for the ActionCache would be a much smaller change, potentially faster and still safe. I could put together a test branch, but the only mac I have available for testing is a (fairly old) laptop.
Agreed.
That sounds worth a try.
Closing this PR. I'll follow up with an ActionCache data+hash experiment PR.
https://github.com/envoyproxy/envoy/tree/master/bazel#setup-local-cache
That doc blames docker - I will try to get some performance numbers with docker in my WiP.
My guess is that docker on mac just amplifies the problem.