
don't os.File.Sync() on mac CAS items #143

Closed
mostynb wants to merge 1 commit

Conversation

@mostynb (Collaborator) commented Dec 19, 2019

os.File.Sync() was ineffective on macOS in Go prior to 1.12. From 1.12 onwards it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on macOS, because content-addressed items can (potentially) be verified later. Let's see if this helps performance on macOS: #67 (while being less risky than #68).

@mostynb (Collaborator, Author) commented Dec 19, 2019

@philwo: does this help performance? I don't have a Mac Pro to test with.

@ob: FYI, maybe you would like to benchmark this too.

@buchgr (Owner) commented Jan 13, 2020

IIRC we found that os.File.Sync() was really slow on macs with spinning disks. I believe we found no issues on macs with SSDs.

If we merge this patch we'll have different persistence guarantees on mac than on all other platforms. I am not a fan of this change.

@mostynb (Collaborator, Author) commented Jan 13, 2020

I compared performance on Linux with SSD vs HDD a while ago, and basically concluded that you really should use an SSD for bazel-remote. With that in mind, maybe we should just document this and keep the current behaviour?

@philwo (Contributor) commented Jan 13, 2020

We ran into the performance issue when we used bazel-remote on a Mac Pro (2013) with Arch Linux as the host OS. The Mac Pro used the built-in SSD that is supposedly pretty fast, but had issues with frequent fsync() calls.

I think the answer to why it showed such bad performance lies in the flash technology used: if you fsync() after writing each file, the SSD has to guarantee that these files will not be lost even when a power outage happens. There are two ways to implement this:

  1. The SSD can use some kind of power-loss protection, like a battery- or capacitor-backed write cache.
  2. The SSD must actually do a full erase/write cycle on each fsync().

Professional SSDs usually provide the former, cheap SSDs the latter. With power-loss protection, fsync() shouldn't cause much performance degradation. Without it, it will a) kill the SSD quickly (due to write amplification: if you fsync() a 60 kB file, the drive still has to rewrite an entire flash block, which is often 2 MB large) and b) be super slow.

My guess is that the Mac Pro SSD also didn't have power-loss protection and thus had the performance problems.

For your cache, I think you can reasonably do two things: Either continue to do fsync() after writing each file and tell users that they should preferably use professional / enterprise SSDs with power-loss protection as storage, or come up with some clever trick that only requires you to fsync() once in a while and allows you to get to a known good state when the server starts after a power-loss event.

@mostynb (Collaborator, Author) commented Jan 13, 2020

I don't think "fsync() once in a while" would help SSD longevity. Since it's per-file, deferring it would result in the same number of calls/operations as doing it immediately after each PUT, if I understand correctly.

An alternative "don't kill my SSD" mode that might be worth trying would be to write {data, hash} items to disk for the ActionCache without fsync(), and then use that hash to verify the item before serving it to clients.
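The verify-before-serve idea above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 and hypothetical `hashEntry`/`verifyEntry` names, not bazel-remote's actual API:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashEntry computes the digest to store alongside an ActionCache item
// that was written without fsync().
func hashEntry(data []byte) []byte {
	sum := sha256.Sum256(data)
	return sum[:]
}

// verifyEntry re-hashes the data before serving it to clients, catching
// corruption that a skipped fsync() might otherwise let through.
func verifyEntry(data, storedSum []byte) bool {
	sum := sha256.Sum256(data)
	return bytes.Equal(sum[:], storedSum)
}

func main() {
	data := []byte("serialized ActionResult")
	sum := hashEntry(data)
	fmt.Println(verifyEntry(data, sum))          // true: an intact entry verifies
	fmt.Println(verifyEntry([]byte("bad"), sum)) // false: a corrupted one does not
}
```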

@buchgr (Owner) commented Jan 13, 2020

> I don't think "fsync() once in a while" would help SSD longevity. Since it's per-file, deferring it would result in the same number of calls/operations as doing it immediately after each PUT, if I understand correctly.

So in order to do this efficiently, we would need to change the on-disk layout to store all logical files in one (or a few) single files on disk. Besides having to offer a migration path, this would come with the downside of being more constrained in possible garbage collection algorithms and having to deal with fragmentation.

Do we know how big of a problem this is?

@mostynb (Collaborator, Author) commented Jan 13, 2020

> So in order to do this efficiently, we would need to change the on-disk layout to store all logical files in one (or a few) single files on disk. Besides having to offer a migration path, this would come with the downside of being more constrained in possible garbage collection algorithms and having to deal with fragmentation.

If you store two files per cache entry, then I think you would waste a lot of space unless you use a filesystem that is optimized for small files (the hash file would be much smaller than the common 4k filesystem block size).

Alternatively, you could store both the data and its hash in a single file, serialized using protobuf or similar. This would still require a migration path, but would still only produce one file on disk per ActionCache entry, and I don't think it should limit our future garbage collection options.
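The single-file layout floated above could look like the following sketch: a fixed 32-byte SHA-256 header plus a length-prefixed payload, so each entry stays one file on disk and can be verified on read. The framing here is hypothetical (the comment suggests protobuf or similar); names are illustrative:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// encodeEntry frames one ActionCache entry as: 32-byte SHA-256 digest,
// 8-byte little-endian payload length, then the payload itself.
func encodeEntry(data []byte) []byte {
	sum := sha256.Sum256(data)
	buf := make([]byte, 0, 32+8+len(data))
	buf = append(buf, sum[:]...)
	buf = binary.LittleEndian.AppendUint64(buf, uint64(len(data)))
	return append(buf, data...)
}

// decodeEntry parses the frame and verifies the payload against the
// stored digest, rejecting truncated or corrupted entries.
func decodeEntry(buf []byte) ([]byte, error) {
	if len(buf) < 40 {
		return nil, fmt.Errorf("entry too short")
	}
	n := binary.LittleEndian.Uint64(buf[32:40])
	if uint64(len(buf))-40 < n {
		return nil, fmt.Errorf("truncated entry")
	}
	data := buf[40 : 40+n]
	if sum := sha256.Sum256(data); !bytes.Equal(sum[:], buf[:32]) {
		return nil, fmt.Errorf("hash mismatch")
	}
	return data, nil
}

func main() {
	enc := encodeEntry([]byte("action result"))
	dec, err := decodeEntry(enc)
	fmt.Println(string(dec), err) // round-trips cleanly
}
```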

> Do we know how big of a problem this is?

Re bazel-remote speed on a Mac Pro running Linux: I assume this is far from a common use case, and possibly one that we can ignore.

Re SSD write amplification: I have not had trouble with this personally, using mostly high-end consumer SSDs for about a year on a medium/large codebase (Chromium).

Either way, writing the ActionResult hash to disk would provide a level of data integrity checking that we currently lack.

@buchgr (Owner) commented Jan 13, 2020

> If you store two files per cache entry, then I think you would waste a lot of space unless you use a filesystem that is optimized for small files (the hash file would be much smaller than the common 4k filesystem block size).

Sorry if I wasn't clear. What I was suggesting is to have one large append-only file to store all data. I believe that's a common pattern to avoid the problems with fsync() that you described.

@mostynb (Collaborator, Author) commented Jan 13, 2020

> Sorry if I wasn't clear. What I was suggesting is to have one large append-only file to store all data. I believe that's a common pattern to avoid the problems with fsync() that you described.

Yes, but that would be a pretty large change that definitely needs data before committing to it.

Whereas writing data+hash for the ActionCache would be a much smaller change, potentially faster and still safe. I could put together a test branch, but the only mac I have available for testing is a (fairly old) laptop.

@buchgr (Owner) commented Jan 14, 2020

> Yes, but that would be a pretty large change that definitely needs data before committing to it.

Agreed.

> Whereas writing data+hash for the ActionCache would be a much smaller change, potentially faster and still safe. I could put together a test branch, but the only mac I have available for testing is a (fairly old) laptop.

That sounds worth a try.

@mostynb (Collaborator, Author) commented Jan 14, 2020

Closing this PR. I'll follow up with an ActionCache data+hash experiment PR.

@mostynb mostynb closed this Jan 14, 2020
@buchgr (Owner) commented Jan 15, 2020

  1. bazel-remote is apparently recommended by envoy.
  2. They have a disclaimer that it's slow on macOS, so apparently more people are noticing this.

https://github.com/envoyproxy/envoy/tree/master/bazel#setup-local-cache

@mostynb (Collaborator, Author) commented Jan 15, 2020

That doc blames Docker. I will try to get some performance numbers with Docker in my WiP.

@buchgr (Owner) commented Jan 15, 2020

My guess is that Docker on Mac just amplifies the problem.
