Support fully disabling the bloom filter #1319

Merged
merged 3 commits into from
Jul 27, 2020


@damz damz commented Apr 28, 2020

This PR allows the bloom filter to be fully disabled, by setting BloomFalsePositive to zero.

It also fixes the description of Table.DoesNotHave, which is manifestly wrong.

We have a use case where we use badger as an append-only database with values fully stored in the LSM tree. In this use case, we do not benefit from the bloom filter because we never do single-key queries, so it makes sense to just disable it. (But note that purging the value log with reasonable performance in this configuration requires #1206.)
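To make the intended semantics concrete, here is a minimal, self-contained sketch. It is not badger's code: the `bloom`, `table`, `newTable`, and `hashKey` names are hypothetical, the filter has a fixed size instead of being sized from the false-positive rate, and FNV stands in for the real key hash. It only illustrates that a table built with a rate of zero carries no filter, and that a `doesNotHave` check must then answer false ("the key may be present").

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a deliberately tiny bloom filter, for illustration only.
type bloom struct {
	bits   []bool
	probes uint64
}

func newBloom(nbits int, probes uint64) *bloom {
	return &bloom{bits: make([]bool, nbits), probes: probes}
}

// positions derives the probe positions from one 64-bit hash (double hashing).
func (b *bloom) positions(h uint64) []uint64 {
	h1, h2 := h&0xffffffff, (h>>32)|1
	out := make([]uint64, 0, b.probes)
	for i := uint64(0); i < b.probes; i++ {
		out = append(out, (h1+i*h2)%uint64(len(b.bits)))
	}
	return out
}

func (b *bloom) add(h uint64) {
	for _, p := range b.positions(h) {
		b.bits[p] = true
	}
}

func (b *bloom) has(h uint64) bool {
	for _, p := range b.positions(h) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

// table models a table whose filter is optional (nil when disabled).
type table struct {
	filter *bloom
}

// doesNotHave returns true only when the filter proves the hash is absent.
// Without a filter there is no proof, so it returns false.
func (t *table) doesNotHave(h uint64) bool {
	if t.filter == nil {
		return false
	}
	return !t.filter.has(h)
}

// hashKey uses FNV-1a as a stand-in hash (an assumption of this sketch).
func hashKey(key string) uint64 {
	f := fnv.New64a()
	f.Write([]byte(key))
	return f.Sum64()
}

// newTable builds a filter only when falsePositive is greater than zero.
func newTable(keys []string, falsePositive float64) *table {
	t := &table{}
	if falsePositive > 0 {
		t.filter = newBloom(1024, 4) // fixed size; real sizing is omitted
		for _, k := range keys {
			t.filter.add(hashKey(k))
		}
	}
	return t
}

func main() {
	keys := []string{"a", "b", "c"}
	with := newTable(keys, 0.01)
	without := newTable(keys, 0)
	fmt.Println(with.doesNotHave(hashKey("a")))      // false: an added key is never reported absent
	fmt.Println(without.doesNotHave(hashKey("a")))   // false: no filter, so no proof of absence
	fmt.Println(without.doesNotHave(hashKey("zzz"))) // false: same reason, even for a missing key
}
```

With the filter disabled, every lookup falls through to the table itself, which is acceptable for a workload that never does single-key queries.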




@jarifibrahim jarifibrahim left a comment


Thank you for the PR @damz .

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @damz, @jarifibrahim, and @manishrjain)


table/table.go, line 384 at r1 (raw file):

	t.blockIndex = index.Offsets

	if t.indexLen > 0 && t.opt.LoadBloomsOnOpen {

This won't work because the index len stores the length of block index (used for binary searching the block) and the bloom filter. This length will never be zero.

Instead, we could check for the length of index.Bloomfilter. Something like

// Read table index
data := t.Read(...)
index := proto.Unmarshal(data)

if len(index.Bloomfilter) > 0 {
    bf, err := z.JSONUnmarshal(index.Bloomfilter)
}


damz commented Apr 30, 2020

This won't work because the index len stores the length of block index (used for binary searching the block) and the bloom filter. This length will never be zero.

Yes, I somehow pushed an older (broken) version of this patch from my stack. Apologies. The new version computes a hasBloomFilter property in readIndex() based on len(index.Bloomfilter).
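As a rough illustration of that approach (a sketch, not the code in this PR), the flag can be derived from the decoded filter bytes rather than from the combined index length. The `tableIndex` and `table` types and their fields below are hypothetical stand-ins; only the `hasBloomFilter` name and the `len(index.Bloomfilter)` check come from the discussion.

```go
package main

import "fmt"

// tableIndex is a hypothetical stand-in for the decoded table index.
type tableIndex struct {
	Offsets     []uint32 // block offsets used for binary search
	Bloomfilter []byte   // serialized filter; empty when the filter is disabled
}

// table is a hypothetical, stripped-down table.
type table struct {
	blockOffsets   []uint32
	bloomBytes     []byte
	hasBloomFilter bool
}

// readIndex copies the decoded index into the table and records whether a
// filter is present. The check uses the filter bytes themselves, because the
// combined index length is non-zero even when no filter was written.
func (t *table) readIndex(index tableIndex) {
	t.blockOffsets = index.Offsets
	t.hasBloomFilter = len(index.Bloomfilter) > 0
	if t.hasBloomFilter {
		t.bloomBytes = index.Bloomfilter
	}
}

func main() {
	var withFilter, withoutFilter table
	withFilter.readIndex(tableIndex{Offsets: []uint32{0, 4096}, Bloomfilter: []byte{0x01, 0x02}})
	withoutFilter.readIndex(tableIndex{Offsets: []uint32{0, 4096}})
	fmt.Println(withFilter.hasBloomFilter)    // true
	fmt.Println(withoutFilter.hasBloomFilter) // false
}
```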


@damz damz left a comment


Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


table/table.go, line 384 at r1 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

This won't work because the index len stores the length of block index (used for binary searching the block) and the bloom filter. This length will never be zero.

Instead, we could check for the length of index.Bloomfilter. Something like

// Read table index
data := t.Read(...)
index := proto.Unmarshal(data)

if len(index.Bloomfilter) > 0 {
    bf, err := z.JSONUnmarshal(index.Bloomfilter)
}

That was fixed in the new version.

@jarifibrahim

Hi @damz, the PR was missing tests and I've added some.


@jarifibrahim jarifibrahim left a comment


Thank you for the PR @damz 🎉 . I have one minor comment. I'll get this reviewed by @manishrjain

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @damz, and @manishrjain)


table/builder.go, line 377 at r3 (raw file):

	if b.opt.BloomFalsePositive > 0 {
		bf := z.NewBloomFilter(float64(len(b.keyHashes)), b.opt.BloomFalsePositive)
		for _, h := range b.keyHashes {

I see these keyHashes are being used only while creating bloom filters. We can actually skip creating this slice and adding key hashes to it.
They're added here

b.keyHashes = append(b.keyHashes, farm.Fingerprint64(y.ParseKey(key)))

We could also avoid this

keyHashes: make([]uint64, 0, 1024), // Avoid some malloc calls.
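A minimal sketch of that suggestion, under stated assumptions: the `options` and `builder` types below are hypothetical simplifications, and FNV-1a from the standard library stands in for `farm.Fingerprint64` so the example has no external dependencies. It shows one way to skip both the preallocation and the per-key hashing when `BloomFalsePositive` is zero.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// options is a hypothetical subset of the builder options.
type options struct {
	BloomFalsePositive float64
}

// builder is a hypothetical, stripped-down table builder.
type builder struct {
	opt       options
	keyHashes []uint64 // only populated when a bloom filter will be built
	keyCount  int
}

// newBuilder preallocates the hash slice only if a filter is wanted.
func newBuilder(opt options) *builder {
	b := &builder{opt: opt}
	if opt.BloomFalsePositive > 0 {
		b.keyHashes = make([]uint64, 0, 1024) // avoid some malloc calls
	}
	return b
}

// add records a key and hashes it only when the filter is enabled.
func (b *builder) add(key []byte) {
	b.keyCount++
	if b.opt.BloomFalsePositive > 0 {
		h := fnv.New64a() // stand-in for the real fingerprint function
		h.Write(key)
		b.keyHashes = append(b.keyHashes, h.Sum64())
	}
}

func main() {
	enabled := newBuilder(options{BloomFalsePositive: 0.01})
	disabled := newBuilder(options{BloomFalsePositive: 0})
	for _, k := range []string{"k1", "k2"} {
		enabled.add([]byte(k))
		disabled.add([]byte(k))
	}
	fmt.Println(len(enabled.keyHashes), len(disabled.keyHashes)) // 2 0
}
```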


@manishrjain manishrjain left a comment


@jarifibrahim Can you check if there's any existing mechanism to achieve a similar effect? I'd like to decrease the number of code changes here.

If not, :lgtm: Thanks for the PR!

Reviewed 2 of 3 files at r1, 1 of 1 files at r2, 2 of 2 files at r3.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @damz)

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Jun 13, 2020
@jarifibrahim jarifibrahim removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Jun 15, 2020
@dgraph-io dgraph-io deleted a comment from stale bot Jun 15, 2020
@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Jul 15, 2020

damz commented Jul 15, 2020

This is still relevant (but needs to be rebased).

@stale stale bot removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Jul 15, 2020

@jarifibrahim jarifibrahim left a comment


A bloom filter with a 100% false-positive rate will still take up 1 MB (1048576 bytes). This PR is good to go. I'll resolve the conflicts and merge this PR.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @damz)

@dgraph-io dgraph-io deleted a comment from stale bot Jul 16, 2020
@jarifibrahim

@NamanJain8 review, please.


@NamanJain8 NamanJain8 left a comment


Looks good.

@jarifibrahim jarifibrahim merged commit 1eaec57 into dgraph-io:master Jul 27, 2020
jarifibrahim pushed a commit that referenced this pull request Jul 29, 2020
This PR allows the bloom filter to be fully disabled, by setting
`BloomFalsePositive` to zero.

It also fixes the description of the `Table.DoesNotHave` which is manifestly
wrong.

Co-authored-by: Ibrahim Jarif <[email protected]>
jarifibrahim pushed a commit that referenced this pull request Oct 2, 2020
This PR allows the bloom filter to be fully disabled, by setting
`BloomFalsePositive` to zero.

It also fixes the description of the `Table.DoesNotHave` which is manifestly
wrong.

Co-authored-by: Ibrahim Jarif <[email protected]>