Support fully disabling the bloom filter #1319
Thank you for the PR, @damz.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @damz, @jarifibrahim, and @manishrjain)
table/table.go, line 384 at r1 (raw file):
t.blockIndex = index.Offsets
if t.indexLen > 0 && t.opt.LoadBloomsOnOpen {
This won't work because the index len stores the combined length of the block index (used for binary searching the block) and the bloom filter. This length will never be zero.
Instead, we could check the length of index.Bloomfilter. Something like:
// Read table index
data := t.Read(...)
index := proto.Unmarshal(data)
if len(index.Bloomfilter) > 0 {
bf, err := z.JSONUnmarshal(index.Bloomfilter)
}
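The check suggested above can be sketched in a self-contained form. This is only an illustration under stated assumptions: `tableIndex`, `table`, `loadBloom` and the one-hash bitset lookup are hypothetical stand-ins, not badger's actual types or its real bloom filter; only the `Bloomfilter` field name and the `DoesNotHave` method name come from this thread.

```go
package main

import "fmt"

// tableIndex is a hypothetical stand-in for the decoded table index.
// Only the Bloomfilter field name is taken from the review comment.
type tableIndex struct {
	Offsets     []uint32
	Bloomfilter []byte
}

// table is a hypothetical, heavily simplified table handle.
type table struct {
	bloom    []byte
	hasBloom bool
}

// loadBloom keeps the filter only when the index actually carries one,
// instead of relying on the total index length (which is never zero).
func (t *table) loadBloom(idx tableIndex) {
	if len(idx.Bloomfilter) > 0 {
		t.bloom = idx.Bloomfilter
		t.hasBloom = true
	}
}

// DoesNotHave reports whether the key is definitely absent.
// Without a filter the table cannot rule anything out, so it returns false.
// The lookup below is a toy one-hash bitset, assumed for illustration only.
func (t *table) DoesNotHave(hash uint64) bool {
	if !t.hasBloom {
		return false
	}
	bit := hash % uint64(len(t.bloom)*8)
	return t.bloom[bit/8]&(1<<(bit%8)) == 0
}

func main() {
	var without table
	without.loadBloom(tableIndex{}) // index written with the filter disabled
	fmt.Println(without.hasBloom, without.DoesNotHave(42))

	var with table
	with.loadBloom(tableIndex{Bloomfilter: []byte{0x01}}) // only bit 0 is set
	fmt.Println(with.hasBloom, with.DoesNotHave(0), with.DoesNotHave(1))
}
```

The point of the sketch is that the presence of the filter is decided by the filter bytes themselves, so a table written with the filter disabled simply never answers "definitely absent".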
Force-pushed from 297382f to 4c8105d.
Yes, I somehow pushed an older (broken) version of this patch from my stack. Apologies. The new version computes a …
Force-pushed from 4c8105d to 3ce8b1d.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)
table/table.go, line 384 at r1 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
This won't work because the index len stores the combined length of the block index (used for binary searching the block) and the bloom filter. This length will never be zero.
Instead, we could check the length of index.Bloomfilter. Something like:
// Read table index
data := t.Read(...)
index := proto.Unmarshal(data)
if len(index.Bloomfilter) > 0 {
bf, err := z.JSONUnmarshal(index.Bloomfilter)
}
That was fixed in the new version.
Hi @damz, the PR was missing tests and I've added some.
Thank you for the PR @damz 🎉 . I have one minor comment. I'll get this reviewed by @manishrjain
Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @damz, and @manishrjain)
table/builder.go, line 377 at r3 (raw file):
if b.opt.BloomFalsePositive > 0 {
bf := z.NewBloomFilter(float64(len(b.keyHashes)), b.opt.BloomFalsePositive)
for _, h := range b.keyHashes {
I see these keyHashes are used only while creating bloom filters. When the bloom filter is disabled, we can skip creating this slice and adding key hashes to it.
They're added here
Line 193 in af22dfd
b.keyHashes = append(b.keyHashes, farm.Fingerprint64(y.ParseKey(key)))
We could also avoid this
Line 101 in af22dfd
keyHashes: make([]uint64, 0, 1024), // Avoid some malloc calls.
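A minimal sketch of that suggestion follows. It is an assumption-laden illustration, not the code that was merged: `options`, `builder`, `newBuilder`, `add` and `hashKey` are hypothetical names, and the standard library's FNV-1a hash stands in for `farm.Fingerprint64` so the example has no external dependencies.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// options is a hypothetical subset of the table options.
type options struct {
	BloomFalsePositive float64 // zero means the bloom filter is disabled
}

// builder is a hypothetical, simplified table builder.
type builder struct {
	opt       options
	keyHashes []uint64
}

// newBuilder preallocates the hash slice only when a filter will be built.
func newBuilder(opt options) *builder {
	b := &builder{opt: opt}
	if opt.BloomFalsePositive > 0 {
		b.keyHashes = make([]uint64, 0, 1024) // avoid some malloc calls
	}
	return b
}

// hashKey uses FNV-1a as a stand-in for farm.Fingerprint64 (assumption).
func hashKey(key []byte) uint64 {
	h := fnv.New64a()
	h.Write(key)
	return h.Sum64()
}

// add records the key hash only when the bloom filter is enabled.
func (b *builder) add(key []byte) {
	if b.opt.BloomFalsePositive > 0 {
		b.keyHashes = append(b.keyHashes, hashKey(key))
	}
}

func main() {
	on := newBuilder(options{BloomFalsePositive: 0.01})
	off := newBuilder(options{}) // filter disabled
	for _, k := range []string{"a", "b", "c"} {
		on.add([]byte(k))
		off.add([]byte(k))
	}
	fmt.Println(len(on.keyHashes), len(off.keyHashes), cap(off.keyHashes))
}
```

With the filter disabled, the builder never allocates or grows the slice, which is the saving the comment is pointing at.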
@jarifibrahim Can you check if there's any existing mechanism to achieve a similar effect? I'd like to decrease the number of code changes here.
Reviewed 2 of 3 files at r1, 1 of 1 files at r2, 2 of 2 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @damz)
This is still relevant (but needs to be rebased).
A bloom filter with 100% false-positive rate will still take up 1 MB (1048576 bytes). This PR is good to go. I'll resolve the conflicts and merge this PR.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @damz)
Conflicts: table/table.go
@NamanJain8 review, please.
Looks good.
This PR allows the bloom filter to be fully disabled, by setting `BloomFalsePositive` to zero. It also fixes the description of the `Table.DoesNotHave` which is manifestly wrong. Co-authored-by: Ibrahim Jarif <[email protected]>
This PR allows the bloom filter to be fully disabled by setting BloomFalsePositive to zero.
It also fixes the description of Table.DoesNotHave, which is manifestly wrong.
We have a use case where we use badger as an append-only database with values fully stored in the LSM tree. In this use case, we do not benefit from the bloom filter because we do not do any single-key queries, and it makes sense to just disable it. (But note that purging the value log with reasonable performance in this configuration requires #1206.)