Improve Iterator Performance of Seeking with Prefix #1719

Closed


zzyalbert
Contributor

@zzyalbert zzyalbert commented Jul 2, 2021

While switching the DB engine to BadgerDB in some of my projects, I found that iterator Seek with a prefix was quite slow in the following situations:

  • many of the prefixes we seek do not match any existing key
  • many keys have a large number of versions

Using pprof, I found that the iterator was still running parseItem even when the current key did not match the prefix.

So I fixed this by skipping the parseItem step when the current key does not match the prefix.
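
A minimal sketch of the idea, using a simplified model rather than Badger's actual iterator code. The `entry` type, the `seekPrefix` function, and the `checkFirst` flag are hypothetical names introduced only for illustration. The `parsed` counter stands in for the work done by parseItem on each version of the key the seek lands on.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical stand-in for one version of a key.
type entry struct {
	key     string
	version uint64
}

// seekPrefix positions at the first key >= prefix. It reports whether that
// key matches the prefix and how many entries went through the costly parse
// step. entries must be sorted by key, with newer versions first.
func seekPrefix(entries []entry, prefix string, checkFirst bool) (found bool, parsed int) {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].key >= prefix
	})
	if i == len(entries) {
		return false, 0
	}
	// The cheap check: if the landed key does not carry the prefix,
	// return before doing any per-version work.
	if checkFirst && !strings.HasPrefix(entries[i].key, prefix) {
		return false, 0
	}
	// Costly path (assumed model): every version of the landed key is visited.
	landed := entries[i].key
	for i < len(entries) && entries[i].key == landed {
		parsed++
		i++
	}
	return strings.HasPrefix(landed, prefix), parsed
}

func main() {
	entries := []entry{{"user/1", 3}, {"user/1", 2}, {"user/1", 1}}
	_, before := seekPrefix(entries, "order/", false)
	_, after := seekPrefix(entries, "order/", true)
	fmt.Println("parsed without early check:", before)
	fmt.Println("parsed with early check:", after)
}
```

In this model the saving grows with the number of versions of the non-matching key, which is the combination described in the two bullets above.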



darkn3rd and others added 23 commits April 15, 2021 08:28
zstd is not set by default even when cgo is enabled.
Add a Builder type in skiplist package which can be used to insert
sorted keys efficiently. Add a test and benchmark for it.
This change makes the skiplist grow for the case of sorted 
skiplist builder. The normal skiplist still cannot grow. 
Note: The growing skiplist is not thread safe.

Co-authored-by: Ahsan Barkati <[email protected]>
…aph-io#1696)

In Dgraph, we already use a Raft write-ahead log. Also, when we commit transactions, we update tens of thousands of keys in one go. To optimize this write path, this PR introduces a way to hand a Skiplist directly over to Badger, short-circuiting Badger's Value Log and WAL.

This feature allows Dgraph to generate Skiplists while processing mutations and just hand them over to Badger during commits. It also accepts a callback which can be run when Skiplist is written to disk. This is useful for determining when to create a snapshot in Dgraph.
…isher (dgraph-io#1697)

When a skip-list is handed over to badger we should also send the
entries in skiplist to the publisher so that all the subscribers get notified.
This PR adds DropPrefixNonBlocking and DropPrefixBlocking API that can be used to logically delete the data for specified prefixes.
DropPrefix now chooses between them based on the Badger option AllowStopTheWorld, whose default is to use DropPrefixBlocking.
With DropPrefixNonBlocking the data would not be cleared from the LSM tree immediately. It would be deleted eventually through compactions.

Co-authored-by: Rohan Prasad <[email protected]>
Add benchmark tool for picktable benchmarking.
dgraph-io#1700)

This PR adds FullCopy option in Stream. This allows sending the table entirely to the writer. If this option is set to true we directly copy over the tables from the last 2 levels. This option increases the stream speed while also lowering the memory consumption on the DB that is streaming the KVs.
For a 71GB compressed and encrypted DB, we observed a 3x improvement in speed. The DB contained ~65GB in the last 2 levels, with the remainder in the levels above.

To use this option, the following options should be set in Stream.

stream.KeyToList = nil
stream.ChooseKey = nil
stream.SinceTs = 0
db.managedTxns = true

If we use stream writer for receiving the KVs, the encryption mode has to be the same in sender and receiver. This will restrict db.StreamDB() to use the same encryption mode in both input and output DB. Added TODO for allowing different encryption modes.
Remove "GitHub issues" reference. (we use discuss now)
)

Remove Datadog's ZSTD that requires CGO
Make Klauspost's ZSTD the default
@CLAassistant

CLAassistant commented Jul 2, 2021

CLA assistant check
All committers have signed the CLA.

NamanJain8 and others added 5 commits July 6, 2021 21:42
This PR adds support for stream writing incrementally to the DB.
Adds an API: StreamWriter.PrepareIncremental

Co-authored-by: Manish R Jain <[email protected]>
…h-io#1723)

While doing an incremental stream write, we should look at the first level on which there is no data. Earlier, due to a bug, we were writing to a level that already had some tables.
I propose this simple fix for detecting conflicts in managed mode. Addresses https://discuss.dgraph.io/t/fatal-error-when-writing-conflicting-keys-in-managed-mode/14784.

When a write conflict exists for a managed DB, an internal assert can fail.
This occurs because a detected conflict is indicated with commitTs of 0, but handling the error is skipped for managed DB instances.

Rather than conflate conflict detection with a timestamp of 0, it can be indicated with another return value from hasConflict.
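
As an illustration of separating the two signals, here is a minimal sketch. The function `commitTimestamp` and its parameters are hypothetical and are not taken from Badger's code. It returns the conflict as an explicit boolean, so a commit timestamp of 0 remains distinguishable from a detected conflict.

```go
package main

import "fmt"

// commitTimestamp is a hypothetical helper. lastCommit maps each key to the
// timestamp of its most recent commit. A conflict exists if any key read by
// the transaction was committed after the transaction's readTs.
func commitTimestamp(lastCommit map[string]uint64, reads []string, readTs, proposedTs uint64) (ts uint64, conflict bool) {
	for _, k := range reads {
		if committed, ok := lastCommit[k]; ok && committed > readTs {
			// Conflict is reported explicitly, not encoded in the timestamp.
			return 0, true
		}
	}
	return proposedTs, false
}

func main() {
	lastCommit := map[string]uint64{"k": 5}

	ts, conflict := commitTimestamp(lastCommit, []string{"k"}, 3, 7)
	fmt.Println(ts, conflict)

	// A managed-mode caller may legitimately propose timestamp 0.
	ts, conflict = commitTimestamp(lastCommit, []string{"other"}, 3, 0)
	fmt.Println(ts, conflict)
}
```

With a single return value, the second call would be indistinguishable from a conflict.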
…raph-io#1721)

With the introduction of SinceTs (dgraph-io#1653), a bug was introduced that skips the pending entries.
The default value of SinceTs is zero, and for a transaction made at readTs 0 the pending entries have their version set to 0, so they were also being skipped.
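
The failure mode can be sketched as follows. The function `skipVersion` is a hypothetical name, and the `sinceTs > 0` guard is an assumption about one way to avoid the problem, not a quotation of the patch. The point is that a plain `version <= sinceTs` check drops version 0 entries when SinceTs is left at its default of zero.

```go
package main

import "fmt"

// skipVersion reports whether an entry at the given version should be
// skipped by an iterator reading at readTs with a SinceTs lower bound.
// The sinceTs > 0 guard (an assumed fix) disables the lower bound when
// SinceTs is left at its zero default.
func skipVersion(version, readTs, sinceTs uint64) bool {
	return version > readTs || (sinceTs > 0 && version <= sinceTs)
}

func main() {
	// Pending entry at version 0, transaction at readTs 0, default SinceTs.
	fmt.Println(skipVersion(0, 0, 0))
	// Entry at or below an explicit SinceTs is still filtered out.
	fmt.Println(skipVersion(5, 10, 5))
}
```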
This PR adds CD steps for Badger releases. Artifacts (badger binary and
checksum) will be uploaded automatically to Github. Final step will be
to add artifacts to release. This reflects the process we already have
in place for Dgraph.

Badger build flags were taken from the [Dgraph release
script](https://github.com/dgraph-io/dgraph/blob/main/contrib/release.sh).
We add a Makefile to streamline the build process.

(cherry picked from commit 11c81e3)

## Remark

PR is duplicate (cherry-pick) because we have two branches running in
parallel (main and release/v3.2103).
@joshua-goldstein joshua-goldstein added area/performance Performance related issues. and removed skip/stale Skip stalebot labels Nov 4, 2022
joshua-goldstein and others added 4 commits December 8, 2022 10:21
Latest runner tag now uses ubuntu-22.04.  We pin to ubuntu 20.04.
Currently [appveyor
tests](https://ci.appveyor.com/project/manishrjain/badger/builds/42502297)
are failing in multiple places on Windows.

## Problem

Currently we only deploy amd64 badger CLI tool builds. We would like
arm64 builds too.

## Solution

Use an arm64 self-hosted runner to build arm64 badger CLI tool.
mYmNeo added a commit to mYmNeo/badger that referenced this pull request Jan 18, 2023
@joshua-goldstein joshua-goldstein force-pushed the feature/improve_iterator_prefix_seek branch from 2ee9c97 to 2ec98c3 Compare February 6, 2023 22:12
@joshua-goldstein joshua-goldstein changed the base branch from master to main February 6, 2023 22:12
@joshua-goldstein joshua-goldstein changed the base branch from main to master February 6, 2023 22:13
mYmNeo added a commit to mYmNeo/badger that referenced this pull request Feb 13, 2023

This PR has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

@github-actions github-actions bot added the Stale label Jul 19, 2024
@github-actions github-actions bot closed this Jul 27, 2024
@harshil-goel harshil-goel reopened this Jul 27, 2024
@github-actions github-actions bot removed the Stale label Jul 27, 2024
@harshil-goel harshil-goel changed the base branch from master to main August 13, 2024 14:23
@harshil-goel harshil-goel requested a review from a team as a code owner August 13, 2024 14:23
@harshil-goel
Contributor

Hey, I am trying to test this diff. Would it be possible for you to rebase it again? Otherwise I will have to create a new diff.

harshil-goel added a commit that referenced this pull request Aug 14, 2024
Copy of #1719

Co-authored-by: Ziyuan Zhong <[email protected]>
@harshil-goel
Contributor

This has been merged.
