Use particular worker pool for flink jobs #4177

Conversation
Force-pushed from 93762c7 to 82ea9c1
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
}

@Override
public void open(Configuration parameters) throws Exception {
Should the pool size be configured by parameters?
Also, is there a way to share pools if there are multiple Iceberg operators in the same Flink job?
> Should the pool size be configured by parameters?

It is configured from the scan context.

> Also, is there a way to share pools if there are multiple Iceberg operators in the same Flink job?

I think sharing is hard, and it would easily become meaningless across distributed nodes.
What do you think, @rdblue?
@rdblue, do you think that sharing a pool within one job is a blocking issue? If so, we can provide a pool keyed by job ID; it is somewhat reasonable as an equivalent replacement for the original pool :)
Hey, sorry about this. I think my comment here is probably what caused the confusion about sharing pools by job ID. There are use cases for this (Steven has one at least), but let's focus on fixing the problem here and leave resource sharing for later.
Thanks for your patience, @yittg!
Force-pushed from 8279039 to 6b7666a
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
Force-pushed from 1289e7c to 6af99ed
Force-pushed from 45f3de2 to 8f3733a
  return WORKER_POOL;
}

public static ExecutorService newWorkerPool(String namePrefix, Integer parallelism) {
nit: is poolSize more intuitive than parallelism?
public static ExecutorService newWorkerPool(String namePrefix, Integer parallelism) {
  return MoreExecutors.getExitingExecutorService(
      (ThreadPoolExecutor) Executors.newFixedThreadPool(
          Optional.ofNullable(parallelism).orElse(WORKER_THREAD_POOL_SIZE),
Should we make the param a primitive type and provide an overloaded method without the poolSize param?
I'm okay either way.
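The overload suggestion above can be sketched like this. This is a minimal, hypothetical version (it is not the actual Iceberg ThreadPools code and it omits Guava's exiting-executor wrapper): a primitive poolSize parameter plus an overload that falls back to a default, instead of a nullable Integer.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the suggested API shape; class name, default
// constant, and thread naming are assumptions, not the real implementation.
class WorkerPools {
  // Assumed default; the real default lives in Iceberg's ThreadPools.
  static final int DEFAULT_POOL_SIZE = Runtime.getRuntime().availableProcessors();

  private static final AtomicInteger COUNTER = new AtomicInteger(0);

  // Overload without the pool size: callers that don't care get the default.
  static ExecutorService newWorkerPool(String namePrefix) {
    return newWorkerPool(namePrefix, DEFAULT_POOL_SIZE);
  }

  // Primitive int parameter: no null check or Optional unwrapping needed.
  static ExecutorService newWorkerPool(String namePrefix, int poolSize) {
    return Executors.newFixedThreadPool(poolSize, runnable -> {
      Thread thread = new Thread(runnable, namePrefix + "-" + COUNTER.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    });
  }
}
```

With this shape, the Optional.ofNullable(parallelism).orElse(...) dance in the current patch disappears, at the cost of one extra method.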
super.open(parameters);

final String jobId = getRuntimeContext().getJobId().toString();
this.workerPool = ThreadPools.newKeyedWorkerPool(jobId, "flink-worker-pool", scanContext.planParallelism());
I see this shares the same key as IcebergFilesCommitter, but not FlinkInputFormat. Trying to understand the reason.
I agree here. Since this is creating a different thread pool per job ID, the thread name prefix should also include the job ID to get unique names.
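The naming point above can be sketched as follows. This is an illustrative, self-contained version (class name and prefix format are assumptions): since a pool is created per job ID, folding the job ID into the thread name prefix keeps thread names unique when several jobs run in the same JVM.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: fold the job ID into the thread name prefix so a
// per-job pool also produces per-job thread names. Not the real API.
class JobScopedPools {
  static ExecutorService newWorkerPool(String namePrefix, String jobId, int poolSize) {
    String prefix = namePrefix + "-" + jobId;
    AtomicInteger counter = new AtomicInteger(0);
    ThreadFactory factory = runnable -> {
      Thread thread = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    };
    return Executors.newFixedThreadPool(poolSize, factory);
  }
}
```

Threads then show up in thread dumps as, e.g., flink-worker-pool-<jobId>-0, which makes it obvious which job owns which pool.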
flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/source/SplitHelpers.java
final String jobId = getRuntimeContext().getJobId().toString();
this.workerPool = ThreadPools.newKeyedWorkerPool(jobId, "flink-worker-pool", scanContext.planParallelism());
getRuntimeContext().registerUserCodeClassLoaderReleaseHookIfAbsent(
    "release-flink-worker-pool", () -> ThreadPools.shutdownKeyedWorkerPool(jobId));
Is the key here also going to be a problem? Or is this a description?
        .build()));
}

public static ExecutorService newKeyedWorkerPool(String key, String namePrefix, Integer parallelism) {
I don't think that we need to keep worker pools here in a static map.
The two places where this is called immediately set up a callback that calls shutdown, but could easily keep a reference to the worker pool locally instead of storing it here by name.
I think it would be better to avoid keeping track of pools here.
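The alternative described above can be sketched in a few lines. This is an illustrative version, not the actual patch: the caller keeps the pool in a local field and the release callback closes over that reference, so no static map keyed by job ID is needed in ThreadPools.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: hold the pool locally and let the shutdown hook
// capture it, instead of registering it in a static map and later calling
// something like shutdownKeyedWorkerPool(jobId). Names are illustrative.
class LocalPoolReference {
  final ExecutorService workerPool;
  final Runnable releaseHook;

  LocalPoolReference(int poolSize) {
    this.workerPool = Executors.newFixedThreadPool(poolSize);
    // The hook captures the pool directly; no lookup by key at shutdown.
    this.releaseHook = workerPool::shutdown;
  }
}
```

The releaseHook here plays the role of the user-code classloader release hook in the patch: it is the only thing that needs access to the pool at teardown, and a closure gives it that access without any global state.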
If the intention is to reuse the job-specific thread pool in Flink, then we do need the static cache, since the same keyed pool may be requested from multiple code paths.
Is this a Flink-only problem regarding the classloader issue with thread pools? If so, maybe we can move the keyed cache into the Flink module.
Oh, so the job can share between the monitor and the sink? I don't really mind having two pools for that.
@rdblue, sorry, I don't get your point exactly. Let me guess: what you really mean is sharing pools for all sources or all sinks, not for all sources and sinks?
To be clear, suppose a job consists of:
Source: Iceberg A (parallelism: 3), Source: Iceberg B, Sink: Iceberg C, Sink: Iceberg D.
Which do you prefer?
- share between all parallel subtasks of one operator, e.g. the 3 subtasks of Iceberg A (which may run in different slots of one TaskManager, or in different TaskManagers);
- share between all sources or all sinks, e.g. one pool for A and B, and another for C and D;
- share between all operators, e.g. one pool shared by A, B, C, and D; all subtasks in the same TaskManager can share it.
What I was thinking was a pool per operator in a job, rather than a pool per job. That avoids the need to track thread pools by some key in static state. I think it is probably fine to have more pools since these are primarily for IO. Does that sound reasonable?
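The pool-per-operator idea can be sketched as a lifecycle: each operator instance creates its own pool in open() and shuts it down in close(). The class and method names below mirror the Flink operator lifecycle but are illustrative only, not the real Flink API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of pool-per-operator: the pool's lifetime is tied to
// one operator instance, so no static registry keyed by job ID is needed.
class PoolPerOperator {
  private ExecutorService workerPool;

  // Called once when the operator starts; creates a private pool.
  void open(int poolSize) {
    this.workerPool = Executors.newFixedThreadPool(poolSize);
  }

  ExecutorService workerPool() {
    return workerPool;
  }

  // Called once when the operator stops; tears the pool down with it.
  void close() throws InterruptedException {
    if (workerPool != null) {
      workerPool.shutdown();
      workerPool.awaitTermination(1, TimeUnit.MINUTES);
    }
  }
}
```

Since these pools are primarily doing IO, having a few more of them (one per operator rather than one per job) is usually an acceptable trade for the simpler lifecycle.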
After reviewing the usage of the thread pools, I am also in favor of not sharing thread pools, so that we can avoid the static cache. None of the usages are on parallel tasks:
- source: split planning (running on the jobmanager or the single-parallelism StreamingMonitorFunction)
- sink: single-parallelism committer
But we do need to add some user docs to clarify the behavior change regarding the I/O thread pool. Previously there was one global thread pool shared per JVM; now there is one per source/sink. E.g., internally we have a rather unique setup where a single Flink job (running on many taskmanagers) can ingest data into dozens or hundreds of Iceberg tables. For such setups, users would need to tune the pool size down, probably to 1, to avoid creating an excessive number of threads in the JVM.
@stevenzwu, for that use case, maybe we should follow up to this PR with one that allows you to configure a named threadpool? I think that's probably the use case that @yittg had in mind when he set up sharing.
In addition to the documentation change, we should also make sure this behavior change is captured in the release notes of the next minor version, 0.14.0. @rdblue, where do we track future release notes?
@stevenzwu, I added the "release notes" tag to this PR and added it to the 0.14.0 release milestone so we add this to release notes. If you want, you can add a comment with the suggested release notes at the end.
Yeah, although I didn't think about sharing at the beginning, some kind of sharing or a global limit sounds good to me after some consideration. I think we can provide a reasonable solution next.
Thanks, @rdblue and @stevenzwu .
rdblue left a comment:
Thanks, @yittg! I really appreciate how patient you've been with me getting back to this review.
There are two main things to fix now. First, I don't think we need to keep track of open pools in ThreadPools. Second, I agree with @stevenzwu's comment about passing the same name prefix for all of the pools created by the monitor and the sink. We should make sure the prefix is also unique by job ID.
Thanks!
Force-pushed from 8f3733a to 62b04d9
  final ExecutorService workerPool = ThreadPools.newWorkerPool("iceberg-plan-worker-pool", context.planParallelism());
  try (TableLoader loader = tableLoader) {
    Table table = loader.loadTable();
-   return FlinkSplitPlanner.planInputSplits(table, context);
+   return FlinkSplitPlanner.planInputSplits(table, context, workerPool);
  } finally {
    workerPool.shutdown();
This function is called in the client and the job manager, so there is no runtime context here. Given that it's an ad-hoc pool that will be shut down after planning, I think it's OK to name it this way.
yittg left a comment:
Given that StreamingMonitorFunction and IcebergFilesCommitter both run with parallelism 1, we can create a new worker pool in the subtask's open(), which also guarantees one worker pool per operator.
Force-pushed from 62b04d9 to efcd332
Fixes #3776