opt(stream): add option to directly copy over tables from lower levels #1700

Merged (11 commits) on May 21, 2021

@manishrjain manishrjain commented May 10, 2021

This PR adds a FullCopy option to Stream, which allows sending whole tables to the writer. When this option is set to true, tables from the last 2 levels are copied over directly instead of being iterated key by key. This increases the stream speed and also lowers memory consumption on the DB that is streaming the KVs.
For a 71GB compressed and encrypted DB, we observed a 3x improvement in speed. The DB held ~65GB in the last 2 levels, with the remainder in the levels above.
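
As a rough illustration of the split described above, the sketch below divides an LSM layout into levels whose tables would be shipped whole and levels that would still be streamed key by key. `planFullCopy` and its parameters are hypothetical names, not Badger's API; the only detail taken from the description is that the last 2 levels are copied directly.

```go
package main

import "fmt"

// planFullCopy is a hypothetical helper. Given the number of LSM levels, it
// returns the levels whose tables would be copied whole (the last copyLevels
// levels) and the levels that would still be iterated key by key.
func planFullCopy(numLevels, copyLevels int) (copyWhole, iterate []int) {
	if copyLevels > numLevels {
		copyLevels = numLevels
	}
	firstCopied := numLevels - copyLevels
	for lev := 0; lev < numLevels; lev++ {
		if lev >= firstCopied {
			copyWhole = append(copyWhole, lev)
		} else {
			iterate = append(iterate, lev)
		}
	}
	return copyWhole, iterate
}

func main() {
	copyWhole, iterate := planFullCopy(7, 2)
	fmt.Println("copy whole tables from levels:", copyWhole)
	fmt.Println("iterate KVs from levels:", iterate)
}
```

The upper levels are small in the benchmark DB, so iterating them costs little, while the bulk of the data avoids per-key decoding on the sender.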

| Run | Time taken | CPU usage (% user, system, io, idle) |
| --- | --- | --- |
| master | 22m15s | 37.2, 5.4, 5.9, 53.4 |
| PR | 8m20s | 1.2, 2.6, 7.8, 88.8 |
| master-wo-write | 7m30s | 51.45, 4.1, 0.1, 44.34 |
| PR-wo-write | 2m50s | 0.5, 3, 4, 92 |
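
To make the "3x" figure concrete, the timings above can be turned into ratios. This is only arithmetic on the numbers reported in the table; the `speedup` helper is a name made up for this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times faster `after` is than `before`.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", before, err)
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, fmt.Errorf("parsing %q: %w", after, err)
	}
	if a <= 0 {
		return 0, fmt.Errorf("non-positive duration %q", after)
	}
	return b.Seconds() / a.Seconds(), nil
}

func main() {
	runs := [][3]string{
		{"master vs PR", "22m15s", "8m20s"},
		{"master-wo-write vs PR-wo-write", "7m30s", "2m50s"},
	}
	for _, run := range runs {
		s, err := speedup(run[1], run[2])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %.2fx\n", run[0], s)
	}
}
```

Both ratios come out a little under 2.7x, which the description rounds to 3x.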

To use this option, the following must be set on the Stream and the DB:

stream.KeyToList = nil
stream.ChooseKey = nil
stream.SinceTs = 0
db.managedTxns = true

If a stream writer is used to receive the KVs, the encryption mode has to be the same on the sender and the receiver. This restricts db.StreamDB() to using the same encryption mode in both the input and output DB. Added a TODO for allowing different encryption modes.

@manishrjain manishrjain left a comment

Added comments.

Reviewed 3 of 8 files at r1, 1 of 6 files at r2, 8 of 8 files at r3.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @manishrjain)


key_registry.go, line 390 at r3 (raw file):

	if kr.lastCreated < dk.CreatedAt {
		kr.lastCreated = dk.CreatedAt

No need to update this. We didn't create this.


key_registry.go, line 393 at r3 (raw file):

	}

	kr.dataKeys[dk.KeyId] = &dk

Just check whether you already have this key ID. If not, then set it. If you do, then just change the key ID?

For that matter, you can just always use the nextKeyId. So, basically never use the KeyId that was passed. It's just safer.
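
A minimal sketch of that suggestion, assuming a much-simplified registry: every key received from a sender is stored under a freshly allocated local ID, and the ID it arrived with is ignored. The `registry` and `dataKey` types here are stand-ins written for this example, not Badger's `KeyRegistry` or `pb.DataKey`.

```go
package main

import "fmt"

// dataKey is a stand-in for a data key record; it is not Badger's pb.DataKey.
type dataKey struct {
	KeyID uint64
	Data  []byte
}

// registry is a hypothetical, simplified key registry.
type registry struct {
	nextKeyID uint64
	dataKeys  map[uint64]*dataKey
}

func newRegistry() *registry {
	return &registry{dataKeys: make(map[uint64]*dataKey)}
}

// addImported stores a received key under a freshly allocated local ID and
// returns that ID. The sender's ID is never used as a map key, so it cannot
// overwrite a key the receiver already holds.
func (r *registry) addImported(dk dataKey) uint64 {
	r.nextKeyID++
	dk.KeyID = r.nextKeyID
	r.dataKeys[dk.KeyID] = &dk
	return dk.KeyID
}

func main() {
	r := newRegistry()
	first := r.addImported(dataKey{KeyID: 1, Data: []byte("first")})
	// A key from another DB that happens to carry the same ID.
	second := r.addImported(dataKey{KeyID: 1, Data: []byte("second")})
	fmt.Println(first, second, len(r.dataKeys))
}
```

The receiver then needs a mapping from the sender's key ID to the local key so that incoming tables can be matched to the right key.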


levels.go, line 1772 at r3 (raw file):

	var change pb.ManifestChange
	if err := proto.Unmarshal(kv.Key, &change); err != nil {
		return err

errors.Wrapf


levels.go, line 1777 at r3 (raw file):

	dk, err := lc.kv.registry.DataKey(change.KeyId)
	if err != nil {
		return err

errors.Wrapf
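
The point of the two `errors.Wrapf` comments is to attach context before returning. The sketch below shows the same idea with the standard library's `fmt.Errorf` and `%w` so that it stays self-contained; it is an analogue, not the `Wrapf` call itself, and `lookupDataKey` and `openReceivedTable` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

var errKeyNotFound = errors.New("data key not found")

// lookupDataKey is a hypothetical stand-in for a registry lookup that fails.
func lookupDataKey(keyID uint64) ([]byte, error) {
	return nil, errKeyNotFound
}

// openReceivedTable adds context to the failure instead of returning it bare,
// so the caller can tell which step and which key ID failed.
func openReceivedTable(keyID uint64) error {
	if _, err := lookupDataKey(keyID); err != nil {
		return fmt.Errorf("while looking up data key %d: %w", keyID, err)
	}
	return nil
}

func main() {
	err := openReceivedTable(42)
	fmt.Println(err)
	// The original error is still reachable through the wrapper.
	fmt.Println(errors.Is(err, errKeyNotFound))
}
```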


levels.go, line 1788 at r3 (raw file):

	// Create a copy of the kv.Value because it is owned by the z.buffer.
	tbl, err := table.CreateTableFromBuffer(fname, y.Copy(kv.Value), opts)

error?


levels.go, line 1792 at r3 (raw file):

	// TODO(ibrahim): Check all the increment refs are done correctly.
	// Tables are sent in the sorted order, so no need to sort them here.
	lc.levels[lev].addTable(tbl)

Can you add a TODO that encryption / decryption might be required for the table? In fact, it shouldn't be too hard -- just do in-place encrypt / decrypt.

Just ensure you're not doing that from within a level controller lock.


stream.go, line 90 at r3 (raw file):

	// CopyTables should be set to true only when the and encryption is enabled/disabled in both
	// the sender and receiver.
	CopyTables   bool

FullCopy bool

If FullCopy is set, then SinceTs == 0, and ChooseKey == nil, and KeyToList == nil. If any of them are set, do a panic.
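
A sketch of the check described in this comment, under the assumption that it runs once before streaming starts. `streamOpts` mirrors only the field names mentioned here; it is not Badger's `Stream` type, and the callback signatures are simplified for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// streamOpts mirrors only the fields named in the review comment. It is a
// hypothetical stand-in, not Badger's Stream type.
type streamOpts struct {
	FullCopy  bool
	SinceTs   uint64
	ChooseKey func(key []byte) bool
	KeyToList func(key []byte) ([][]byte, error)
}

// validate reports an error when FullCopy is combined with an option that
// requires per-key processing.
func (s *streamOpts) validate() error {
	if !s.FullCopy {
		return nil
	}
	switch {
	case s.SinceTs != 0:
		return fmt.Errorf("FullCopy requires SinceTs == 0, got %d", s.SinceTs)
	case s.ChooseKey != nil:
		return errors.New("FullCopy requires ChooseKey == nil")
	case s.KeyToList != nil:
		return errors.New("FullCopy requires KeyToList == nil")
	}
	return nil
}

// mustValidate panics on an invalid combination, as the review suggests.
func (s *streamOpts) mustValidate() {
	if err := s.validate(); err != nil {
		panic(err)
	}
}

func main() {
	ok := &streamOpts{FullCopy: true}
	ok.mustValidate() // valid combination, does not panic

	bad := &streamOpts{FullCopy: true, SinceTs: 10}
	fmt.Println(bad.validate())
}
```

Splitting the check into `validate` and `mustValidate` keeps the rule testable without recovering from a panic.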


stream.go, line 377 at r3 (raw file):

		level := i
		tables := tableMatrix[i]
		for _, t := range tables {

Add a TODO here to see if making this concurrent would be helpful. Most likely it won't. But, if it does work, then most likely <3 goroutines might be sufficient.
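
If that TODO were ever picked up, one generic way to cap the number of goroutines is a buffered channel used as a semaphore. The sketch below is not taken from the PR; `sendAll` and its `send` callback are hypothetical, and it ignores ordering, which matters here because the tables are expected to arrive sorted.

```go
package main

import (
	"fmt"
	"sync"
)

// sendAll calls send for every table ID using at most maxWorkers goroutines
// and returns the first error encountered, if any.
func sendAll(tableIDs []uint64, maxWorkers int, send func(id uint64) error) error {
	if maxWorkers < 1 {
		maxWorkers = 1
	}
	sem := make(chan struct{}, maxWorkers) // counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, id := range tableIDs {
		sem <- struct{}{} // blocks while maxWorkers sends are in flight
		wg.Add(1)
		go func(id uint64) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := send(id); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return firstErr
}

func main() {
	ids := []uint64{1, 2, 3, 4, 5, 6}
	err := sendAll(ids, 3, func(id uint64) error {
		fmt.Println("sent table", id)
		return nil
	})
	fmt.Println("error:", err)
}
```

Because completion order is not deterministic, a real change would also need to restore the sorted order on the receiving side or send each level from a single goroutine.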


stream.go, line 459 at r3 (raw file):

	// Picks up ranges from Badger, and sends them to rangeCh.
	go st.produceRanges(ctx)

Just put a comment here saying that, for simplicity, we'd still consider all the tables for range production.


value.go, line 791 at r3 (raw file):

// write is thread-unsafe by design and should not be called concurrently.
func (vlog *valueLog) write(reqs []*request) error {
	if vlog.db.opt.InMemory || vlog.db.opt.managedTxns {

If Badger starts in managed mode, can we not create a new value log?

@NamanJain8 NamanJain8 marked this pull request as ready for review May 20, 2021 15:47
@NamanJain8 NamanJain8 changed the title from "Mrjn/optimize stream" to "opt(stream): add option to directly copy over tables from lower levels" May 20, 2021

@manishrjain manishrjain left a comment

:lgtm: Some comments. Nice work!

Reviewed 1 of 8 files at r1, 7 of 7 files at r4.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @jarifibrahim)


level_handler.go, line 355 at r4 (raw file):

		}
		return out
	}

Vertical space after this. To represent that we returned above.


levels.go, line 1795 at r4 (raw file):

	lc.levels[lev].addTable(tbl)
	// Release the ref held by OpenTable.

Mention in comment: addTable would add a reference.


stream_writer.go, line 54 at r4 (raw file):

	senderPrevLevel int
	keyId           map[uint64]*pb.DataKey // stores reader's keyId to data key map.
	processingKeys  bool                   // true if we have started processing keys.

Add a comment that the writer might receive tables first, and then receive keys. So, this would tell us which stage we are in.
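
A minimal sketch of what such a stage flag could look like. The `writer` type below is a stripped-down stand-in for the stream writer, and rejecting a table once key processing has started is an assumption of this example, not behaviour confirmed by the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// writer is a hypothetical, stripped-down receiver that tracks which stage of
// the stream it is in.
type writer struct {
	processingKeys bool // true once the first key batch has arrived
	tables, keys   int
}

var errTableAfterKeys = errors.New("received a table after key processing started")

// receiveTable accepts whole tables only during the first stage.
// Rejecting late tables is an assumption of this sketch.
func (w *writer) receiveTable() error {
	if w.processingKeys {
		return errTableAfterKeys
	}
	w.tables++
	return nil
}

// receiveKeys switches the writer to the key stage on first use.
func (w *writer) receiveKeys(n int) {
	w.processingKeys = true
	w.keys += n
}

func main() {
	w := &writer{}
	fmt.Println(w.receiveTable()) // stage 1: tables are accepted
	w.receiveKeys(100)            // stage 2 begins
	fmt.Println(w.receiveTable()) // a late table is rejected
}
```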


value.go, line 791 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If Badger starts in managed mode, can we not create a new value log?

Add a TODO.

@jarifibrahim jarifibrahim left a comment

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @manishrjain)

@NamanJain8 NamanJain8 merged commit 74ade98 into master May 21, 2021
@NamanJain8 NamanJain8 deleted the mrjn/optimize-stream branch May 21, 2021 06:20
NamanJain8 added a commit to dgraph-io/dgraph that referenced this pull request May 28, 2021
)

When streaming the entire data in the snapshot (snap.SinceTs=0), we can copy entire tables instead of iterating over the KVs. This brings about a 3x performance improvement and leaves the sender's CPU mostly idle. Refer to dgraph-io/badger#1700 for more details.
mangalaman93 pushed a commit that referenced this pull request Feb 14, 2023
mangalaman93 pushed a commit that referenced this pull request Feb 14, 2023
mangalaman93 pushed a commit that referenced this pull request Feb 14, 2023
mangalaman93 added a commit that referenced this pull request Feb 14, 2023
mangalaman93 added a commit that referenced this pull request Feb 15, 2023
mangalaman93 added a commit that referenced this pull request Feb 18, 2023
mangalaman93 added a commit that referenced this pull request Feb 19, 2023
#1700)

Also takes a bug fix from PR #1712, commit 58d0674

mangalaman93 added a commit that referenced this pull request Feb 20, 2023
mangalaman93 added a commit that referenced this pull request Feb 22, 2023
mangalaman93 added a commit that referenced this pull request Feb 23, 2023
mangalaman93 added a commit that referenced this pull request Feb 24, 2023
mangalaman93 added a commit that referenced this pull request Mar 1, 2023
mangalaman93 added a commit that referenced this pull request Mar 2, 2023
mangalaman93 added a commit that referenced this pull request Mar 6, 2023
mangalaman93 added a commit that referenced this pull request Mar 7, 2023
mangalaman93 pushed a commit to dgraph-io/dgraph that referenced this pull request Mar 7, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Mar 7, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Mar 7, 2023
mangalaman93 added a commit that referenced this pull request Mar 8, 2023
mangalaman93 added a commit that referenced this pull request Mar 8, 2023
mangalaman93 added a commit that referenced this pull request Mar 9, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Mar 15, 2023
mangalaman93 added a commit that referenced this pull request Mar 15, 2023
mangalaman93 added a commit that referenced this pull request May 17, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request May 17, 2023
mangalaman93 added a commit that referenced this pull request Jun 7, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Jun 7, 2023
mangalaman93 added a commit that referenced this pull request Jun 12, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Jun 12, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Jul 19, 2023
mangalaman93 added a commit that referenced this pull request Jul 19, 2023
mangalaman93 added a commit to dgraph-io/dgraph that referenced this pull request Jul 19, 2023
fredcarle pushed a commit to fredcarle/badger that referenced this pull request Aug 1, 2023