Increase value threshold from 1 KB to 1 MB #1664

Merged 2 commits on Feb 9, 2021

@jarifibrahim (Contributor) commented Feb 9, 2021

This PR increases the default value of ValueThreshold from 1 KB to 1 MB.
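
To illustrate what the new default means for placement, here is a minimal sketch. It assumes the rule is "a value strictly smaller than the threshold stays in the LSM tree, anything else goes to the value log"; `storedInLSM` and the two constants are hypothetical names for illustration, not part of Badger's API.

```go
package main

import "fmt"

// Default thresholds before and after this PR, in bytes.
const (
	oldValueThreshold = 1 << 10 // 1 KB
	newValueThreshold = 1 << 20 // 1 MB
)

// storedInLSM is a hypothetical helper, not a Badger function. It models the
// assumed rule: values strictly smaller than the threshold stay in the LSM
// tree, everything else is written to the value log.
func storedInLSM(valueSize, threshold int) bool {
	return valueSize < threshold
}

func main() {
	// Sample value sizes: 512 B, 4 KB, 200 KB, 2 MB.
	for _, size := range []int{512, 4 << 10, 200 << 10, 2 << 20} {
		fmt.Printf("%8d bytes: in LSM with 1 KB threshold=%t, with 1 MB threshold=%t\n",
			size,
			storedInLSM(size, oldValueThreshold),
			storedInLSM(size, newValueThreshold))
	}
}
```

Under this model, only values of 1 MB or more still reach the value log with the new default.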


This change is Reviewable

@manishrjain (Contributor) left a comment

:lgtm:

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@manishrjain manishrjain merged commit 6c35ad6 into master Feb 9, 2021
@manishrjain manishrjain deleted the ibrahim/badger-1mb branch February 9, 2021 20:11
danielmai pushed a commit that referenced this pull request Feb 10, 2021
The write amplification with value log can be unpredictably high. It's better to use value log only for really big values, and keep as many values as possible within the LSM tree.
danielmai added a commit that referenced this pull request Feb 10, 2021
Cherry-pick of #1664.

The write amplification with value log can be unpredictably high. It's better to use value log only for really big values, and keep as many values as possible within the LSM tree.

Co-authored-by: Ibrahim Jarif <[email protected]>
NamanJain8 pushed a commit that referenced this pull request Feb 16, 2021
This updates the godoc for the following:

ValueThreshold based on #1664
stream.NumGo based on #1593
@funny-falcon

But why?
There is no description in the PR and no description in the commit message.
Why the hell did you abandon Badger's main purpose: storing large values in a log?

@strokovok

Hi @manishrjain, I find this change a bit confusing :)
As @funny-falcon has already mentioned, it kind of contradicts one of the key ideas behind Badger.
Also, as far as I understand, Badger's LSM tree is intended to be fully stored in RAM (correct me if I'm wrong), which unfortunately breaks my use case (50M entries * 200 KB per value = 9.3 TB).

Could you please comment on that? Essentially, I would like to know the reason for this change. Also, would I experience any negative consequences if I set this threshold to zero?
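
As a quick check of the sizing above, here is a small sketch. It assumes "200 KB" means 200 KiB (204,800 bytes) and reports the total in TiB; `totalBytes` and `toTiB` are hypothetical helper names.

```go
package main

import "fmt"

// totalBytes returns the raw size of `entries` values of `valueSize` bytes each.
func totalBytes(entries, valueSize int64) int64 {
	return entries * valueSize
}

// toTiB converts a byte count to tebibytes (2^40 bytes).
func toTiB(b int64) float64 {
	return float64(b) / float64(int64(1)<<40)
}

func main() {
	const entries = 50_000_000  // 50M entries
	const valueSize = 200 << 10 // assumption: 200 KiB per value

	total := totalBytes(entries, valueSize)
	fmt.Printf("%d bytes = %.1f TiB\n", total, toTiB(total))
}
```

The result is about 9.3 TiB, matching the figure in the comment. Each 200 KB value is also smaller than the new 1 MB default, which is why the default matters for this workload.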

@manishrjain (Contributor) commented Oct 10, 2022

Yeah. The original idea for Badger relied on the WiscKey paper: values can be stored separately from keys. In practice, what we found was that value garbage collection is a real storage problem in fast-updating datasets. The paper didn't elaborate on that much, but in production it's a major issue: disk amplification. If X amount of data takes 3-10X that amount of disk space, that's a problem.
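
The disk amplification described above can be illustrated with a toy model. This is only a sketch of an append-only log with no garbage collection, not Badger's value log implementation; `appendOnlyLog` and its methods are hypothetical names.

```go
package main

import "fmt"

// appendOnlyLog is a toy model of a value log: every write is appended, and
// superseded values keep occupying disk space because this model never runs
// garbage collection.
type appendOnlyLog struct {
	live     map[string]int // key -> size of the latest value
	diskSize int            // total bytes appended so far
}

func newLog() *appendOnlyLog {
	return &appendOnlyLog{live: make(map[string]int)}
}

// put appends a value of the given size and marks it as the live version.
func (l *appendOnlyLog) put(key string, size int) {
	l.live[key] = size
	l.diskSize += size
}

// liveSize returns the bytes occupied by the latest version of every key.
func (l *appendOnlyLog) liveSize() int {
	total := 0
	for _, size := range l.live {
		total += size
	}
	return total
}

// amplification returns disk bytes divided by live bytes (0 for an empty log).
func (l *appendOnlyLog) amplification() float64 {
	live := l.liveSize()
	if live == 0 {
		return 0
	}
	return float64(l.diskSize) / float64(live)
}

func main() {
	l := newLog()
	const keys, overwrites, valueSize = 1000, 5, 4096
	for round := 0; round < overwrites; round++ {
		for k := 0; k < keys; k++ {
			l.put(fmt.Sprintf("key-%d", k), valueSize)
		}
	}
	fmt.Printf("live=%d bytes, disk=%d bytes, amplification=%.1fx\n",
		l.liveSize(), l.diskSize, l.amplification())
}
```

In this model the amplification equals the number of times each key is written, so it grows with the overwrite rate until stale entries are reclaimed.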

So, having a much larger value log threshold allowed us to store only the most expensive values in the value log, while most other things stay within the LSM tree. The LSM tree has frequent compactions, so things stay bounded (there's a separate concern here about write amp).

Overall, Outserv, the fork of Dgraph that I run, no longer uses a value log. In fact, Dgraph 21.12 basically stopped using the value log, or even the WAL, from Badger entirely.

If you happen to deal with large values and don't overwrite keys much, then a value log makes a lot of sense (say long-term file storage, where each key corresponds to a file). But, not so much when keys are being overwritten repeatedly.

P.S. Note that I'm no longer with Dgraph Labs. I maintain a fork of Badger here: https://github.com/outcaste-io/badger

@strokovok

Got it, thank you very much for your reply!
My use case doesn't involve overwriting entries at all, so perhaps I'll just lower this setting :)
