Data: Always use delete index for position deletes #9117
    List<CloseableIterable<Record>> deletes = Lists.transform(posDeletes, this::openPosDeletes);

    // if there are fewer deletes than a reasonable number to keep in memory, use a set
When this logic was added a few years ago, we added position deletes into a set (see the comment). We have been using bitmaps for a while now. In fact, vectorized reads always build bitmaps and have no threshold on the number of deletes. This has proven to work really well. Position deletes represented as bitmaps should always fit in memory.
Position deletes compress really well both on disk and in memory. We have seen this 100K threshold causing degradation in jobs without any good reason.
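To illustrate why a bitmap stays small where a set of boxed positions does not, here is a minimal sketch. It is not the code in this PR: `BitmapDeleteIndexSketch` and its methods are hypothetical names, and `java.util.BitSet` is used only as a dependency-free stand-in for a compressed bitmap, so it handles positions up to `Integer.MAX_VALUE` only.

```java
import java.util.BitSet;

// Hypothetical stand-in for a bitmap-backed position delete index.
// java.util.BitSet keeps the sketch dependency-free; it is indexed by int,
// so positions above Integer.MAX_VALUE are rejected here.
final class BitmapDeleteIndexSketch {
  private final BitSet deleted = new BitSet();

  // Mark a single row position as deleted.
  void delete(long pos) {
    deleted.set(Math.toIntExact(pos));
  }

  // Mark every position in [start, end) as deleted.
  void delete(long start, long end) {
    deleted.set(Math.toIntExact(start), Math.toIntExact(end));
  }

  // One bit lookup per row, no hashing or boxing.
  boolean isDeleted(long pos) {
    return pos >= 0 && pos <= Integer.MAX_VALUE && deleted.get((int) pos);
  }

  // Number of deleted positions tracked by the index.
  int cardinality() {
    return deleted.cardinality();
  }

  // Rough size of the backing storage in bytes (one bit per position).
  long approxBytes() {
    return deleted.size() / 8L;
  }

  public static void main(String[] args) {
    BitmapDeleteIndexSketch index = new BitmapDeleteIndexSketch();
    index.delete(0L, 200_000L); // twice the old 100K threshold
    System.out.println("deleted positions: " + index.cardinality());
    System.out.println("approx bytes: " + index.approxBytes());
  }
}
```

With one bit per position, 200K contiguous deletes need on the order of tens of kilobytes in this sketch, whereas a hash set of boxed `Long` values needs several bytes per entry at minimum. A compressed bitmap should do better still on sparse or clustered positions.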
Sweet, this makes sense and appreciate the simplification. I remember seeing this threshold in the code and was wondering how we got to the 100000 value.
nastra
left a comment
LGTM, thanks @aokolnychyi for improving this
singhpk234
left a comment
LGTM, Thanks @aokolnychyi !
Thank you for reviewing, @amogh-jahagirdar @nastra @singhpk234!
This PR removes the outdated logic in `DeleteFilter` that determined whether we have to stream through position deletes or load them in memory. Instead, we should always rely on the bitmap-based approach like in vectorized reads.
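As a rough illustration of the "always build an index" path, the sketch below merges positions from several delete files into one bitmap and filters rows against it, with no size threshold and no streaming fallback. The class and method names are hypothetical and are not the `DeleteFilter` API. Each `long[]` stands in for the positions read from one position delete file, and `java.util.BitSet` again stands in for the real bitmap.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of index-only position delete filtering.
final class AlwaysIndexSketch {

  // Merge the positions from every delete file into a single bitmap.
  // Duplicate positions across files collapse into the same bit.
  static BitSet buildIndex(List<long[]> deleteFiles) {
    BitSet index = new BitSet();
    for (long[] positions : deleteFiles) {
      for (long pos : positions) {
        index.set(Math.toIntExact(pos)); // BitSet is int-indexed (sketch limitation)
      }
    }
    return index;
  }

  // Keep only the rows whose ordinal position is not marked as deleted.
  static <T> List<T> filter(List<T> rows, BitSet index) {
    List<T> kept = new ArrayList<>();
    for (int pos = 0; pos < rows.size(); pos++) {
      if (!index.get(pos)) {
        kept.add(rows.get(pos));
      }
    }
    return kept;
  }
}
```

The point of the design is that there is a single code path regardless of how many deletes apply to a data file, which is what removing the threshold buys.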