Make metrics reporting asynchronous #13507
Right now the implementation of RESTMetricsReporter is synchronous and the REST call can take a significant amount of time. Change the reporting to be async instead.
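As a rough illustration of the change described above, the sketch below hands a report to an executor and returns without waiting for the send. It uses only JDK classes; `AsyncReporterSketch`, the `Consumer<String>` sender, and the demo method are hypothetical stand-ins and are not the actual RESTMetricsReporter code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for a reporter; not the actual RESTMetricsReporter. */
final class AsyncReporterSketch {
  private final ExecutorService executor;
  // Assumption: the sender stands in for the blocking REST call.
  private final Consumer<String> sender;

  AsyncReporterSketch(ExecutorService executor, Consumer<String> sender) {
    this.executor = executor;
    this.sender = sender;
  }

  /** Hands the report to the executor and returns without waiting for the send. */
  void report(String report) {
    if (report == null) {
      return;
    }
    executor.submit(() -> sender.accept(report));
  }

  /** Demo: true if report() returned while the simulated slow send was still blocked. */
  static boolean returnsBeforeSendCompletes() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch sent = new CountDownLatch(1);
    AsyncReporterSketch reporter =
        new AsyncReporterSketch(
            pool,
            r -> {
              try {
                release.await(); // simulate a slow REST call
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
              sent.countDown();
            });
    try {
      reporter.report("scan-report");
      // The send cannot have finished yet because it is still waiting on `release`.
      boolean returnedEarly = sent.getCount() == 1;
      release.countDown();
      boolean finished = sent.await(5, TimeUnit.SECONDS);
      return returnedEarly && finished;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdown();
    }
  }
}
```

The caller pays only the cost of enqueueing the task, so a slow endpoint no longer delays the operation that produced the report.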
    private final RESTClient client;
    private final String metricsEndpoint;
    private final Supplier<Map<String, String>> headers;
    private final ExecutorService executor;
You might want to take a look at the ThreadPools utility class and its usages. We probably want an exiting pool, and I'm not sure we should just use the worker pool.
Done
      return;
    }

    executor.submit(() -> {
I would suggest that we use the Tasks utility class here. I believe the REST client already handles a number of retry scenarios, so I don't know how much retrying we want to do at this layer. @nastra might have thoughts on it.
My gut feeling is that we shouldn't be retrying here. We don't make any guarantees about metrics reports, and we don't want to cause or surface any issues if metrics can't be sent.
Sounds good. Not adding any retry.
      LOG.warn("Failed to report metrics to REST endpoint {}", metricsEndpoint, e);
    }
    Tasks.range(1)
        .executeWith(ThreadPools.getWorkerPool())
I don't think we should be using the worker pool here, since that's used for planning operations. I wouldn't want metrics reporting, which is basically best effort, to interfere with planning. Should we just define an exiting thread pool named metrics-publisher-pool or something like that, and use that?
@amogh-jahagirdar Done.
    /**
     * Sets the size of the metrics reporting thread pool. This limits the number of concurrent
     * metrics reporting operations.
     */
    public static final ConfigEntry<Integer> METRICS_THREAD_POOL_SIZE =
        new ConfigEntry<>(
            "iceberg.metrics.num-threads",
            "ICEBERG_METRICS_NUM_THREADS",
            2,
            Integer::parseUnsignedInt);
Sorry, I should've been more specific in my last review. While I think the pool should be separate, I don't think the metrics thread pool needs to be configurable, at least at this point. (I also think we could probably get away with a thread pool size of 1 for this, but that's a minor point.) It doesn't really generalize beyond the REST case at the moment, so until it does, my opinion is that we should keep it isolated.
TL;DR: I think we should just have a static field ThreadPools.newExitingWorkerPool("rest-metrics-reporter", someFixedNumberOfThreads) inside RESTMetricsReporter.
Makes sense. Done.
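To make the suggestion above concrete, here is a JDK-only approximation of a small, fixed-size pool that is dedicated to metrics and held in a static field. It is not Iceberg's `ThreadPools.newExitingWorkerPool`; the class `MetricsPoolSketch`, the `daemonFactory` helper, and the pool size of 1 are assumptions made for illustration. Daemon threads are used here so that pending best-effort work does not keep the JVM alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for a dedicated metrics pool; not Iceberg's ThreadPools. */
final class MetricsPoolSketch {
  // Assumption: a single thread is enough for best-effort reporting.
  private static final int POOL_SIZE = 1;

  // Fixed, non-configurable pool kept separate from any shared worker pool.
  static final ExecutorService METRICS_EXECUTOR =
      Executors.newFixedThreadPool(POOL_SIZE, daemonFactory("rest-metrics-reporter"));

  private MetricsPoolSketch() {}

  /** Creates named daemon threads so the pool never blocks JVM exit. */
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    };
  }

  /** Demo helper: runs a task on the pool and describes the thread that executed it. */
  static String describeWorker() {
    try {
      return METRICS_EXECUTOR
          .submit(
              () ->
                  Thread.currentThread().getName()
                      + " daemon="
                      + Thread.currentThread().isDaemon())
          .get(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException | TimeoutException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because the pool is private to the reporter, a backlog of metrics reports can only delay other metrics reports and never competes with planning work for threads.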
amogh-jahagirdar
left a comment
Thanks @anoopj, things look good to me! I'll give it a day before merging in case the others want to take another pass.
      LOG.warn("Failed to report metrics to REST endpoint {}", metricsEndpoint, e);
    }
    Tasks.range(1)
        .executeWith(METRICS_EXECUTOR)
We should also set .suppressFailureWhenFinished(); otherwise the exception is going to be propagated.
@nastra great callout! Done.
Good catch @nastra
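The intent behind suppressing failures can be sketched without Iceberg at all: a best-effort task records its own failure instead of letting it escape to the caller. The wrapper below is a JDK-only analogue of that intent, not how the Tasks utility implements `.suppressFailureWhenFinished()`; `BestEffortTask` and its failure counter are hypothetical names, and real code would log a warning where the counter is incremented.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that swallows runtime failures of a best-effort task. */
final class BestEffortTask implements Runnable {
  private final Runnable delegate;
  private final AtomicInteger failures = new AtomicInteger();

  BestEffortTask(Runnable delegate) {
    this.delegate = delegate;
  }

  @Override
  public void run() {
    try {
      delegate.run();
    } catch (RuntimeException e) {
      // Assumption: real code would log a warning here; the exception is not rethrown.
      failures.incrementAndGet();
    }
  }

  /** Number of runs that failed and were suppressed. */
  int failureCount() {
    return failures.get();
  }
}
```

A caller can run such a task directly or submit it to an executor; in either case a failed report is recorded but never propagated to the caller.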