Flink: add an option to set monitoring snapshot number #4943
Conversation
kbendick left a comment:
Thanks for this contribution @chenjunjiedada! I know that especially the case of streaming a table from the start is difficult and can cause high memory pressure.
I would also love to get @stevenzwu's input here.
private static final ConfigOption<Integer> MONITOR_SNAPSHOT_NUMBER =
    ConfigOptions.key("monitor-snapshot-number").intType().defaultValue(Integer.MAX_VALUE);
Nit: Is there perhaps a more descriptive name for this? This is the number of snapshots to consider within each monitor interval loop, correct?
Kafka has more or less the same concept in its max.poll.interval and other consumer-related configuration properties around polling.
Maybe we can take some inspiration from that naming. Thinking off the top of my head, but maybe monitor-max-snapshots-per-interval or something like that would be more instructive to the user? Given we already have monitor-interval as a configuration property as well.
cc @stevenzwu for your thoughts as well
+1 to use "max" word. How about max-snapshots-per-monitor-interval?
Yeah I’m good with that. That’s much more clear to me what that is, especially as we have a monitor-interval in the configs already.
So users will go to look up what that is - which we should ensure is documented as a follow up.
If you could please rename the code instances from monitorNumber that would be great. That name is much more descriptive.
List<List<Record>> recordsList = generateRecordsAndCommitTxn(10);

for (int monitorNumber = 1; monitorNumber < 11; monitorNumber = monitorNumber + 1) {
  ScanContext scanContext = ScanContext.builder()
      .monitorInterval(Duration.ofMillis(100))
      .monitorSnapshotNumber(monitorNumber)
      .build();
Are there any assertions we can apply (that wouldn't be too flakey) for this whole outer loop? Seems like we should have 10 splits total, correct?
Also nit on starting the for loop at 0 vs 1 if possible.
Let me try to tune the input split size and add more assertions.
Since using 0 as the max monitor number makes no sense, I plan to add a condition check in the ScanContext constructor to reject invalid configurations. How about adding that check plus an assertion that it throws a check exception? Does that make sense to you?
Yes definitely. A Precondition on an invalid configuration is much preferred. 👍
It’s always better in my opinion to throw on an invalid configuration vs try to adjust for the user’s behavior (unless we introduced the bug in which case we should take that case by case).
We should check that it’s non-negative in a Precondition check.
We should also be testing that at most 10 * max-snapshots-per-monitor-interval snapshots are processed at the end of the large loop.
That should be true, right?
Instead of asserting outside the loop, I compare the exact number of splits to the planning result inside the loop. Does that make sense to you?
Yes that makes sense. Thank you @chenjunjiedada
} else {
  List<Long> snapshotIds = SnapshotUtil.snapshotIdsBetween(table, lastSnapshotId, snapshot.snapshotId());
  if (snapshotIds.size() < scanContext.monitorSnapshotNumber()) {
    snapshotId = snapshot.snapshotId();
  } else {
    snapshotId = snapshotIds.get(snapshotIds.size() - scanContext.monitorSnapshotNumber());
  }
Can you elaborate on this logic here / walk me through an example case where snapshotId needs to be determined because it's equal to (or possibly greater than?) the monitorSnapshotNumber?
The snapshotIdsBetween function returns a list of snapshot IDs in the range lastSnapshotId (exclusive) to currentSnapshotId (inclusive), ordered by commit time descending, so the latest snapshot is the first item in the list.
Consider the following two cases:
- When monitorSnapshotNumber > snapshotIds.size(), snapshotId should be the ID of the latest snapshot.
- When monitorSnapshotNumber < snapshotIds.size(), snapshotId is computed with a reversed index because of the descending order of the list.
When monitorSnapshotNumber is equal to snapshotIds.size(), the snapshotId value is the same in the if and else blocks.
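To make the reversed-index arithmetic concrete, here is a small standalone illustration (the snapshot IDs are invented; only the indexing mirrors the logic described above):

import java.util.Arrays;
import java.util.List;

public class ReverseIndexExample {
  public static void main(String[] args) {
    // Pretend snapshotIdsBetween(...) returned these IDs, newest first (index 0 = current snapshot).
    List<Long> snapshotIds = Arrays.asList(105L, 104L, 103L, 102L, 101L);
    int monitorSnapshotNumber = 2;

    // size - monitorSnapshotNumber = 5 - 2 = 3, which selects 102L; consuming up to 102
    // covers exactly the two oldest unconsumed snapshots (101 and 102).
    long snapshotId = snapshotIds.get(snapshotIds.size() - monitorSnapshotNumber);
    System.out.println(snapshotId); // prints 102
  }
}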
Instead, I suggest using lastSnapshotId and monitorSnapshotNumber directly to get to the destination snapshot. Specifically, there are two ways:
- Use lastSnapshotId and monitorSnapshotNumber to get snapshotId, instead of getting currentSnapshotId and comparing.
- Use lastSnapshotId and monitorSnapshotNumber directly to get the snapshot list.
The above may require modifications to the core module, but I think it will make the logic more intuitive.
I’m thinking through these situations now.
But users would never enter this block if they don’t opt into the new behavior, is that correct?
If so, can we add a conditional so that this block won't be entered unless the user has a non-INT_MAX value (e.g. we add a Precondition whose message starts with [bug] this shouldn't happen <because>)? We have one or two other places that use the same [bug] syntax, and this new logic change would ideally not apply to users who keep the default behavior unless they encounter a bug.
Just to be extra cautious. Or for my own understanding while I review these scenarios 🙂
I do like the idea of using the snapshot ID though generally speaking (in this case I need to review).
However, it's possible to turn off snapshot ID inheritance. So we'd need to consider that.
EDIT - We have assertions on snapshot ID inheritance within this class already so that’s fair to consider imo 👍
Also, can we make this its own method? It should be skipped entirely if the user doesn't have a configured value (e.g. they have INT_MAX). They don't need any of this processing.
Since we already consider the inheritance in state initialization, do we need another check?
@hililiwei I agree computing using lastConsumedSnapshotId and maxSnapshotsPerMonitorInterval is more accurate. But I haven't found direct methods or utils to compute that; maybe we need more utils in SnapshotUtil as you said. Can we have cases in which the last consumed snapshot is not an ancestor of the latest snapshot?
I wonder if that would be the case after supporting Branch and Tag feature?
I'm not sure whether it can route to different branches without changing table states or properties. Like git, if we don't explicitly checkout a branch we should be able to traverse the history, right?
The numbers in the photos do look really good, but I'd be interested in seeing the same screenshot with monitor snapshot number applied vs not applied at around the same time into the job.
}

Builder monitorSnapshotNumber(int newMonitorSnapshotNumber) {
  this.monitorSnapshotNumber = newMonitorSnapshotNumber;
Should we add a check here? It should be greater than 0, right?
+1. A precondition check. And INT_MAX can be the value that disables this behavior.
Or a negative value (eg -1). A negative value is probably more in-line with what we typically do and more cross-language friendly (as Iceberg table format is a specification first and foremost… it should be able to be rewritten in a language that doesn’t have JVM INT_MAX).
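A minimal sketch of the precondition being discussed for this builder method (it assumes the relocated Guava Preconditions class already used across Iceberg is importable here; the message text is illustrative):

Builder monitorSnapshotNumber(int newMonitorSnapshotNumber) {
  // Fail fast on non-positive values instead of silently adjusting the user's configuration.
  Preconditions.checkArgument(newMonitorSnapshotNumber > 0,
      "The max monitor snapshot number must be positive: %s", newMonitorSnapshotNumber);
  this.monitorSnapshotNumber = newMonitorSnapshotNumber;
  return this;
}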
public void run(SourceContext<FlinkInputSplit> ctx) throws Exception {
  this.sourceContext = ctx;
  while (isRunning) {
    LOG.info("Start polling snapshots from snapshot id: {}, monitor snapshot number {}", lastSnapshotId,
Use debug to prevent too many logs?
Yeah this could get very excessive very quickly.
This should be a debug log as users can opt in via their logging configuration if need be.
It might be a good idea (in a follow up) to make a metric that monitors how many snapshots are processed per monitor interval. Also, since this grabs the checkpoint lock, will it possibly not be an even multiple when the checkpoint happens? If so, we should add a metric that tracks that as well (again, in a follow up). Does that interest you @hililiwei? Or @chenjunjiedada? If so, feel free to make the ticket and we’ll deal with that as soon as we can 🙂
  return latestSnapshotId;
} else {
  // This doesn't consider snapshot inheritance since it is already checked in state initialization.
  return snapshotIds.get(snapshotIds.size() - maxSnapshotsPerMonitorInterval);
This logic seems incorrect to me. SnapshotUtil.snapshotIdsBetween returns the snapshot ids in the reverse order (most recent snapshot first). I think we need to use the reversed list.
I got it now; it is actually correct. We should improve the comment: because it is a reversed list, (list size - maxSnapshotsPerMonitorInterval) will actually point to the snapshot that results in a (fromSnapshotId, toSnapshotId] range containing maxSnapshotsPerMonitorInterval snapshots.
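One possible rewording of that comment, shown in place (a sketch mirroring the snippet above, not the merged code):

} else {
  // snapshotIdsBetween returns (lastSnapshotId, latestSnapshotId] ordered newest-first, so the
  // element at (size - maxSnapshotsPerMonitorInterval) is the end snapshot of an incremental
  // scan covering exactly maxSnapshotsPerMonitorInterval of the oldest unconsumed snapshots.
  return snapshotIds.get(snapshotIds.size() - maxSnapshotsPerMonitorInterval);
}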
    monitorAndForwardSplits();
  }
}
LOG.debug("Forwarded splits from {}(exclusive) to {}(inclusive), time elapsed {}ms",
I think it is better to log the durations for the two steps separately if we want to have better understanding of the bottleneck
The logging seems wrong to me. both startSnapshotId and lastSnapshotId are pointing to the same snapshotId. Moving the logging inside maxReachableSnapshotId, we can correctly apply the endSnapshotId calculated from maxReachableSnapshotId.
lastSnapshotId is updated in monitorAndForwardSplits. Does that make sense to you?
Yes, I missed that lastSnapshotId was updated by monitorAndForwardSplits. It also means the code is difficult to read: we are relying on the side effect of the monitorAndForwardSplits method, and it is clearer to avoid such side effects. Plus, it is better to measure the latency separately: planning vs emitting.
@chenjunjiedada I prefer we address the above comment regarding logging
Will do.
 */
public class FlinkConfigOptions {

  public static final int MAX_SNAPSHOTS_PER_MONITOR_INTERVAL_DEFAULT = -1;
We can default it to Integer.MAX_VALUE; then there is no need to check the default value.
@stevenzwu , Kyle suggested using -1 here: #4943 (comment). I think that makes sense to me as well.
I see ScanSummary also uses Integer.MAX_VALUE. A -1 default seems to be used mostly for invalid IDs (like snapshotId, fieldId, etc.):
private int limit = Integer.MAX_VALUE;
@kbendick What do you think for this?
I’m ok with INT_MAX if that’s what’s used elsewhere in ScanSummary. I suggested -1 as we use -1 to disable cache expiration in the caching catalog and because it’s easier to use in things like Python where INT_MAX is less commonly used. -1 is also easier to use as in-line SQL option imo.
But if INT_MAX is already used in the ScanSummary, I suggest we keep that consistent.
We can then later on consider using -1 in both places within Flink (or more places). But consistency is better imo.
In general, I think this is the right direction. It is not mutually exclusive with PR #4911 (reduce the checkpoint lock scope); we should have both. This is focused on making the plan smaller and faster, while PR #4911 avoids holding the lock beyond what is actually necessary. For the new FLIP-27 source, I have been thinking about something very similar. There is no point in eagerly discovering all splits/snapshots if the Flink job is falling behind too much. We need to throttle the split discovery. In addition to limiting the number of snapshots per discovery cycle, I am also thinking that we should pause/skip the split discovery if the number of pending splits is over a certain threshold. It is like a backpressure mechanism and can help control the memory footprint. This won't be in the MVP version of the FLIP-27 source; we can follow up on the optimization after the MVP version is merged.
    .maxSnapshotsPerMonitorInterval(maxSnapshotsNum)
    .build();

FlinkInputSplit[] expectedSplits = FlinkSplitPlanner
I think we should directly define the number of expected splits and avoid using FlinkSplitPlanner. If FlinkSplitPlanner is not honoring maxSnapshotsPerMonitorInterval, the assertion later will still pass.
FlinkSplitPlanner here is using the same scanContext as StreamingMonitorFunction, will it still produce different splits?
If FlinkSplitPlanner didn't honor the maxSnapshotsPerMonitorInterval option correctly, this unit test won't be able to detect it. Both the expected and actual splits use the same planner and will generate the same planning result
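A sketch of the kind of independent assertion being suggested; the variable names (maxSnapshotsNum, sourceContext.splits) are illustrative, and it assumes each of the 10 commits yields exactly one split:

// Derive the expectation from the test setup itself rather than from FlinkSplitPlanner,
// so a planner that ignores the option would make this assertion fail.
int expectedSplitCount = Math.min(maxSnapshotsNum, 10);
Assert.assertEquals("Unexpected number of discovered splits",
    expectedSplitCount, sourceContext.splits.size());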
    ConfigOptions.key("include-column-stats").booleanType().defaultValue(false);

private static final ConfigOption<Integer> MAX_SNAPSHOTS_PER_MONITOR_INTERVAL =
    ConfigOptions.key("max-snapshot-per-monitor-interval").intType()
it will be more clear if the config is defined as max-snapshot-count-per-monitor-interval or max-snapshot-count-per-incremental-scan
Sounds interesting. If you don't mind, I'll try to follow up.
@kbendick @stevenzwu @hililiwei I rebased and addressed comments, PTAL.
public void testConsumeWithMaxSnapshotCountPerMonitorInterval() throws Exception {
  List<List<Record>> recordsList = generateRecordsAndCommitTxn(10);

  final ScanContext scanContext1 = ScanContext.builder()
it is better to move the invalid config to a separate test method
Done.
TestSourceContext sourceContext = new TestSourceContext(latch);
runSourceFunctionInTask(sourceContext, function);
// Ensure the first loop in monitoring finished
Thread.sleep(100);
Depending on sleep can lead to flaky test. We can probably wait and check the condition of expected splits on sourceContext.splits
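A sketch of polling for the condition instead of a fixed sleep (sourceContext.splits and expectedSplits are illustrative names; a library such as Awaitility could be used instead of the manual loop):

// Wait until the expected splits show up or a generous deadline passes, then assert once.
long deadline = System.currentTimeMillis() + 30_000L;
while (sourceContext.splits.size() < expectedSplits && System.currentTimeMillis() < deadline) {
  Thread.sleep(10);
}
Assert.assertEquals(expectedSplits, sourceContext.splits.size());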
  }
}

Assert.assertTrue("Should have expected elements.",
I feel it is unnecessary to wait for all the snapshots. Should we make monitorAndForwardSplits package-private so that it can be called directly in this test class? We can use AbstractStreamOperatorTestHarness#getOperator.
We just need to call monitorAndForwardSplits once and verify it limits the number of discovered snapshots/splits to maxSnapshotsNum. We can just take a few values of maxSnapshotsNum (like 3, 9, 12); there is no need to loop from 1 to 15.
@stevenzwu I removed lock/wait/sleep logic and kept the loop from 1 to 15 since the total execution time of the unit test is just a few seconds.
A few seconds is long for a unit test. We just need to test the three scenarios of maxSnapshotsNum: less, equal, greater.
Updated.
  newScanContext = scanContext.copyWithSnapshotId(snapshotId);
  LOG.debug("Start generating splits for {}", snapshotId);
} else {
  if (scanContext.maxSnapshotCountPerMonitorInterval() ==
Related to an earlier comment: if the default value is Integer.MAX_VALUE, we won't need the if-else.
Updated.
        scanContext.maxSnapshotCountPerMonitorInterval());
  }
  newScanContext = scanContext.copyWithAppendsBetween(lastSnapshotId, snapshotId);
  LOG.debug("Start generating splits from {}(exclusive) to {}(inclusive),", lastSnapshotId, snapshotId);
Nit: why not move the debug log outside the if-else block? To me, it is ok to use the same log line for the if case too.
Also maybe is "discover" more accurate than "generate" in the log lines?
Do you mean it is ok to show from -1 to $snapshotId?
Done.
// Use the oldest snapshot as starting to avoid the initial case.
long oldestSnapshotId = SnapshotUtil.oldestAncestor(table).snapshotId();

ScanContext scanContext3 = ScanContext.builder()
Nit / non-blocking: As these are in different functions, do they need to be named scanContext1 ... scanContext3? They aren't in the same scope, are they?
This doesn't need to be changed, but for my own understanding.
Not necessary actually, updated.
@Test
public void testInvalidMaxSnapshotCountPerMonitorInterval() {
  final ScanContext scanContext1 = ScanContext.builder()
Nit: We normally don't use final for variables inside of methods. Is this usage of final necessary?
Not necessary, removed.
kbendick left a comment:
A few non-blocking nits but overall this looks good to me. Thanks @chenjunjiedada!
private void monitorAndForwardSplits() {
private long maxReachableSnapshotId(long lastConsumedSnapshotId, long latestSnapshotId,
    int maxSnapshotCountPerMonitorInterval) {
  // This doesn't consider snapshot inheritance since it is already checked in state initialization.
What do you mean by "snapshot inheritance" here?
The latest table snapshot id might not be the ancestor of the last consumed one. Let me delete this comment to avoid confusion.
private void monitorAndForwardSplits() {
private long maxReachableSnapshotId(long lastConsumedSnapshotId, long latestSnapshotId,
    int maxSnapshotCountPerMonitorInterval) {
I think this should be maxSnapshots
Changed to snapshotLimit.
  newScanContext = scanContext.copyWithAppendsBetween(lastSnapshotId, snapshotId);
}

LOG.debug("Start discovering splits from {}(exclusive) to {}(inclusive),", lastSnapshotId, snapshotId);
This is missing spaces between the snapshot IDs and the inclusive or exclusive clarification.
    ConfigOptions.key("include-column-stats").booleanType().defaultValue(false);

private static final ConfigOption<Integer> MAX_SNAPSHOT_COUNT_PER_MONITOR_INTERVAL =
    ConfigOptions.key("max-snapshot-count-per-monitor-interval").intType().defaultValue(Integer.MAX_VALUE);
I don't think it is very clear what this is setting from the name. Not many people know Flink internals well enough to understand what the "monitor interval" is. Is there a simpler name?
What about "max-planning-group-size" or "snapshot-group-limit"?
max-planning-group-size lacks information about the group items, snapshot-group-limit looks better to me. "group" is more concise than "per-monitor-interval".
I agree with @chenjunjiedada that group is vague. what about max-planning-snapshot-count?
I meant group is better than per-monitor-interval. I'm OK with both since I think we definitely need a doc to describe what it is and how it impacts the planning and the backpressure.
I find snapshot-group-limit less intuitive; what does "group" mean in this context? monitor-interval is an existing config that users are already familiar with.
I do agree that we should make the config name clear. Just brainstorming more names. what about max-snapshot-count-per-incremental-scan or max-snapshot-count-per-planning?
    ConfigOptions.key("include-column-stats").booleanType().defaultValue(false);

private static final ConfigOption<Integer> SNAPSHOT_GROUP_LIMIT =
    ConfigOptions.key("snapshot-group-limit").intType().defaultValue(Integer.MAX_VALUE);
I talked with @stevenzwu about this and the best name we could come up with was max-planning-snapshot-count. What do you think, @chenjunjiedada? Could you rename this?
+1 to this name. Config keys should ideally be concise as well as being as short as possible and max-planning-snapshot-count achieves that.
The documentation can potentially use the language Maximum number of snapshots to consume and plan per group in each iteration of an incremental scan or something similar (might need to work on that language too but the language from the other ideas can be used in the docs possibly).
Done.
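For reference, the resulting option definition might look roughly like this (a sketch; the description text is illustrative, not taken from the PR):

private static final ConfigOption<Integer> MAX_PLANNING_SNAPSHOT_COUNT =
    ConfigOptions.key("max-planning-snapshot-count")
        .intType()
        .defaultValue(Integer.MAX_VALUE)
        .withDescription("Maximum number of snapshots to plan in a single incremental scan per monitor interval.");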
}

private void monitorAndForwardSplits() {
private long maxReachableSnapshotId(long lastConsumedSnapshotId, long latestSnapshotId, int maxSnapshotCount) {
nit: maybe call the args fromSnapshotIdExclusive and toSnapshotIdInclusive to be consistent with recent incremental API naming
actually maybe call this method as toSnapshotIdInclusive(long lastConsumedSnapshotId, long currentSnapshotId, int maxPlanningSnapshotCount)
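Putting the naming suggestions together, the helper could end up looking roughly like the sketch below (illustrative, not the merged code):

private long toSnapshotIdInclusive(long fromSnapshotIdExclusive, long currentSnapshotId,
                                   int maxPlanningSnapshotCount) {
  List<Long> snapshotIds = SnapshotUtil.snapshotIdsBetween(table, fromSnapshotIdExclusive, currentSnapshotId);
  if (snapshotIds.size() <= maxPlanningSnapshotCount) {
    // Few enough new snapshots: consume all of them in this cycle.
    return currentSnapshotId;
  } else {
    // The list is ordered newest-first, so this index caps the incremental scan at
    // maxPlanningSnapshotCount of the oldest unconsumed snapshots.
    return snapshotIds.get(snapshotIds.size() - maxPlanningSnapshotCount);
  }
}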
private void monitorAndForwardSplits() {
private long maxReachableSnapshotId(long lastConsumedSnapshotId, long latestSnapshotId, int maxSnapshotCount) {
  List<Long> snapshotIds = SnapshotUtil.snapshotIdsBetween(table, lastConsumedSnapshotId, latestSnapshotId);
nit: we add empty line after control block (not before)
Thanks, @chenjunjiedada! Nice work.
This adds an option to control how many snapshots to monitor at once when using iceberg table as a Flink source.
Currently, the monitor operator generates file splits from the last consumed snapshot to the latest snapshot, which may lead to backpressure when the consumer lags behind, as the following image shows. We can reduce the checkpoint lock scope (#4911) or increase the network buffers to mitigate the situation, but the problem still cannot be completely avoided since the number of splits is unknown, especially when starting a consumer for the first time.

With the option, the user can tune the monitoring flow according to backpressure and busy metrics.
