Contributor

@anoopj anoopj commented Jul 9, 2025

Right now the implementation of RESTMetricsReporter is synchronous and the REST call can take a significant amount of time. Change the reporting to be async instead.
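To illustrate the change described above, here is a minimal sketch of fire-and-forget reporting using only the JDK. `AsyncReporterSketch`, its `sender` callback, and the `demo` helper are hypothetical names made up for this example; they are not the actual `RESTMetricsReporter` code, and the `Consumer<String>` merely stands in for the blocking REST call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for the reporter; not the actual Iceberg class.
class AsyncReporterSketch {
  private final ExecutorService executor;
  private final Consumer<String> sender; // assumption: stands in for the blocking REST call

  AsyncReporterSketch(ExecutorService executor, Consumer<String> sender) {
    this.executor = executor;
    this.sender = sender;
  }

  // Hands the report to the executor and returns without waiting for the send.
  void report(String report) {
    if (report == null) {
      return;
    }
    executor.submit(() -> {
      try {
        sender.accept(report);
      } catch (RuntimeException e) {
        // Best effort: log the failure, never surface it to the caller.
        System.err.println("Failed to report metrics: " + e);
      }
    });
  }

  // Demo: returns {sends completed when report() returned, sends after draining the pool}.
  static int[] demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch gate = new CountDownLatch(1);
    AtomicInteger sent = new AtomicInteger();
    AsyncReporterSketch reporter = new AsyncReporterSketch(pool, r -> {
      try {
        gate.await(); // simulate a slow endpoint
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      sent.incrementAndGet();
    });
    reporter.report("scan-report");
    int before = sent.get(); // the send is still blocked on the gate here
    gate.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new int[] {before, sent.get()};
  }

  public static void main(String[] args) {
    int[] counts = demo();
    System.out.println(counts[0] + " -> " + counts[1]);
  }
}
```

In this sketch the caller never waits on the simulated endpoint: `report` returns while the send is still blocked, and the send completes later on the executor thread.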

@github-actions github-actions bot added the core label Jul 9, 2025
private final RESTClient client;
private final String metricsEndpoint;
private final Supplier<Map<String, String>> headers;
private final ExecutorService executor;
Contributor

You might want to take a look at the ThreadPools utility class/usages. We probably want an exiting pool and I'm not sure if we should just use the worker pool.

Contributor Author

Done

return;
}

executor.submit(() -> {
Contributor

I would suggest that we use Tasks utility class here. I believe the rest client will handle a number of retry scenarios, so I don't know how much we want to do retries at this layer. @nastra might have thoughts on it.

Contributor

@nastra nastra Jul 10, 2025

My gut feeling is that we shouldn't be retrying here. We don't make any guarantees about metrics reports, and we don't want to cause or surface any issues in case metrics can't be sent.

Contributor Author

Sounds good. Not adding any retry.

LOG.warn("Failed to report metrics to REST endpoint {}", metricsEndpoint, e);
}
Tasks.range(1)
.executeWith(ThreadPools.getWorkerPool())
Contributor

I don't think we should be using the worker pool here, since that's used for planning operations. I wouldn't want metrics reporting, which is basically best effort, to interfere with that. Should we just define an exiting thread pool, metrics-publisher-pool or something like that, and use that?

Contributor Author

Comment on lines 56 to 66
/**
* Sets the size of the metrics reporting thread pool. This limits the number of concurrent
* metrics reporting operations.
*/
public static final ConfigEntry<Integer> METRICS_THREAD_POOL_SIZE =
new ConfigEntry<>(
"iceberg.metrics.num-threads",
"ICEBERG_METRICS_NUM_THREADS",
2,
Integer::parseUnsignedInt);

Contributor

Sorry, I should've been more specific in my last review. While I think the pool should be separate, I don't think we need the metrics thread pool to be configurable, at least at this point (I also think we could probably get away with a thread pool size of 1 for this, but that's a minor point). It doesn't really generalize beyond the REST case at the moment, so until it does, my opinion is that we should keep it isolated.

TLDR: I think we should just have a static field initialized with ThreadPools.newExitingWorkerPool("rest-metrics-reporter", someFixedNumberOfThreads) inside RESTMetricsReporter.
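As a rough illustration of the suggestion above, the following sketch builds a small, fixed-size, dedicated pool as a static field using only the JDK. It approximates an "exiting" pool with daemon threads so that the pool does not keep the JVM alive; this is an assumption made for the example and not how `ThreadPools.newExitingWorkerPool` is implemented. `MetricsPoolSketch`, `daemonFactory`, and `probe` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the real change uses Iceberg's ThreadPools utility instead.
class MetricsPoolSketch {
  // Fixed, non-configurable size, as suggested in the review.
  private static final int POOL_SIZE = 1;

  // Dedicated pool so that metrics reporting never competes with planning work.
  static final ExecutorService METRICS_EXECUTOR =
      Executors.newFixedThreadPool(POOL_SIZE, daemonFactory("rest-metrics-reporter"));

  // Names threads "<prefix>-<n>" and marks them as daemon threads so they
  // do not block JVM shutdown (an approximation of an exiting pool).
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    };
  }

  // Runs a probe task on the pool and reports "<thread name>:<is daemon>".
  static String probe() {
    try {
      return METRICS_EXECUTOR
          .submit(() -> Thread.currentThread().getName() + ":" + Thread.currentThread().isDaemon())
          .get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(probe());
  }
}
```

Keeping the pool as a private static field of the reporter, rather than a shared or configurable pool, matches the point made above that the pool does not yet generalize beyond the REST case.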

Contributor Author

Makes sense. Done.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Thanks @anoopj, things look good to me! I'll give it a day before merging in case the others want to take another pass.

LOG.warn("Failed to report metrics to REST endpoint {}", metricsEndpoint, e);
}
Tasks.range(1)
.executeWith(METRICS_EXECUTOR)
Contributor

We should also set .suppressFailureWhenFinished(); otherwise the exception is going to be propagated to the caller.
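The difference this flag makes can be sketched with a small, hypothetical task runner. `SuppressFailureSketch.runAll` below is not Iceberg's `Tasks` implementation; it is a synchronous stand-in, written for this example, that mimics the behavior discussed here: failures are always handed to a failure callback, and they are rethrown to the caller after all items have run only when suppression is off.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical sketch of suppress-versus-propagate semantics; not the Iceberg Tasks API.
class SuppressFailureSketch {
  // Runs the task for every item and passes each failure to onFailure.
  // If suppress is false, the first failure is rethrown once all items have run.
  // Returns true only if every item succeeded.
  static <T> boolean runAll(
      List<T> items, Consumer<T> task, BiConsumer<T, Exception> onFailure, boolean suppress) {
    RuntimeException first = null;
    for (T item : items) {
      try {
        task.accept(item);
      } catch (RuntimeException e) {
        onFailure.accept(item, e);
        if (first == null) {
          first = e;
        }
      }
    }
    if (first != null && !suppress) {
      throw first;
    }
    return first == null;
  }

  public static void main(String[] args) {
    List<String> reports = Collections.singletonList("scan-report");
    Consumer<String> failingSend = r -> {
      throw new IllegalStateException("endpoint down");
    };
    // With suppression on, the failure is only logged by the callback.
    boolean ok = runAll(reports, failingSend,
        (r, e) -> System.err.println("Failed to report " + r + ": " + e.getMessage()), true);
    System.out.println("all succeeded: " + ok);
  }
}
```

For best-effort metrics reporting, the suppressed variant is the one that keeps a failed send from surfacing in the code path that triggered the report.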

Contributor Author

@nastra great callout! Done.

Contributor

Good catch @nastra

@nastra nastra merged commit 20b2179 into apache:main Jul 11, 2025
42 checks passed