Increase value threshold from 1 KB to 1 MB #1664
Conversation
Reviewed 6 of 6 files at r1.
Reviewable status: complete! all files reviewed, all discussions resolved
The write amplification with value log can be unpredictably high. It's better to use value log only for really big values, and keep as many values as possible within the LSM tree.
Cherry-pick of #1664. The write amplification with value log can be unpredictably high. It's better to use value log only for really big values, and keep as many values as possible within the LSM tree. Co-authored-by: Ibrahim Jarif <[email protected]>
But why?
Hi @manishrjain, I find this change a bit confusing :) Could you please comment on it? Essentially, I would like to know the reason for this change. Also, would I experience any negative consequences if I set this threshold to zero?
Yeah. The original idea for Badger relied on the WiscKey paper: that values can be stored separately from keys. In practice, what we found was that value log garbage collection is a real storage problem in fast-updating datasets. The paper didn't elaborate much on that, but in production it's a major issue, namely disk amplification. If X amount of data takes 3-10X amount of disk space, that's a problem.

So, having a much larger value log threshold allowed us to store only the most expensive values in the value log, while most other things would stay within the LSM tree. The LSM tree has frequent compactions, so space usage can stay bounded (there's a separate concern here about write amplification).

Overall, Outserv, the fork of Dgraph that I run, no longer uses a value log. In fact, Dgraph 21.12 basically stopped using the value log, or even the WAL from Badger, entirely.

If you happen to deal with large values and don't overwrite keys much, then a value log makes a lot of sense (say, long-term file storage, where each key corresponds to a file). But it helps much less when keys are overwritten repeatedly.

P.S. Note that I'm no longer with Dgraph Labs. I maintain a fork of Badger here: https://github.com/outcaste-io/badger
Got it, thank you very much for your reply!
This PR increases the default value of ValueThreshold from 1 KB to 1 MB.
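To illustrate the effect of the new default, here is a minimal, self-contained Go sketch. It is not Badger's actual code: the names `storedInLSM`, `oldThreshold`, and `newThreshold` are illustrative. It only models the size check that decides whether a value stays inline in the LSM tree or is written to the value log.

```go
package main

import "fmt"

// Illustrative constants mirroring the old and new ValueThreshold defaults.
const (
	oldThreshold = 1 << 10 // 1 KB (previous default)
	newThreshold = 1 << 20 // 1 MB (default after this PR)
)

// storedInLSM reports whether a value of the given size would be kept
// inline in the LSM tree under a given threshold; larger values would
// be written to the value log instead.
func storedInLSM(valueSize, threshold int) bool {
	return valueSize <= threshold
}

func main() {
	for _, size := range []int{512, 4 << 10, 2 << 20} {
		fmt.Printf("%8d bytes: old default -> LSM: %v, new default -> LSM: %v\n",
			size,
			storedInLSM(size, oldThreshold),
			storedInLSM(size, newThreshold))
	}
}
```

With the 1 MB default, a 4 KB value that previously spilled to the value log now stays in the LSM tree; only genuinely large values (here, the 2 MB one) still go to the value log. In Badger itself this knob is the `ValueThreshold` option, typically set via `options.WithValueThreshold`; raising it moves more data into the LSM tree, which is the direction this PR takes the default.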