
Conversation

@Heltman Heltman commented Jun 15, 2023

fixes #7843

@github-actions github-actions bot added the core label Jun 15, 2023
@Heltman Heltman force-pushed the manifest-read-limit branch 2 times, most recently from 982cf30 to 45493ed Compare June 15, 2023 07:27
import org.apache.iceberg.relocated.com.google.common.collect.Iterables;

public class ParallelIterable<T> extends CloseableGroup implements CloseableIterable<T> {
public static final int MANIFEST_READER_QUEUE_SIZE =

Contributor:

Why do we need this, and why expose it as public?

* The size of the queue in ParallelIterable. This queue limits the memory usage of manifest
* reader.
*/
public static final ConfigEntry<Integer> MANIFESTS_READER_QUEUE_SIZE =

Contributor:

ParallelIterable is a common utility; is this config meant to target the manifest reader only?

Contributor Author:

I only found it used for reading manifests.

Contributor:

It doesn't matter that it is only used for one purpose right now. The iterable should be kept generic by passing in configuration.

private final Future<?>[] taskFutures;
private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
private final LinkedBlockingQueue<T> queue =
new LinkedBlockingQueue<>(MANIFEST_READER_QUEUE_SIZE);

Contributor:

This config seems to affect all the instances.

Contributor Author:

Iceberg does not seem to provide a more effective configuration management mechanism other than environment variables. Can we turn this into a table-level configuration?
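
For context, a table-level setting would look roughly like the sketch below; the property name used here is made up for illustration, not an existing Iceberg table property.

import java.util.Map;
import org.apache.iceberg.util.PropertyUtil;

// hypothetical: read a per-table bound with a default, instead of a JVM-wide env var
static int manifestQueueSize(Map<String, String> tableProperties) {
  return PropertyUtil.propertyAsInt(tableProperties, "read.manifest.queue-size", 10_000);
}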

Contributor:

@Heltman, we typically avoid system or environment config. A more appropriate place for this is in the engine's config. Then it can pass that configuration down. For example, Flink passes threadpools into the scan API rather than using the common worker pool.
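
For illustration, a minimal sketch of what passing an engine-managed pool into planning could look like, assuming the planWith API on TableScan; the pool size and the hand-off comment are purely illustrative:

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.io.CloseableIterable;

static void planWithEnginePool(Table table) throws IOException {
  // engine-owned pool instead of Iceberg's shared worker pool; size 4 is illustrative
  ExecutorService planningPool = Executors.newFixedThreadPool(4);
  try {
    TableScan scan = table.newScan().planWith(planningPool);
    try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
      for (FileScanTask task : tasks) {
        // hand each task to the engine's split source
      }
    }
  } finally {
    planningPool.shutdown();
  }
}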

Contributor Author:

How would I pass a thread pool to Iceberg from Trino? Trino only uses plan() to get FileScanTasks.

@Heltman Heltman force-pushed the manifest-read-limit branch from 45493ed to ac85817 Compare July 14, 2023 09:20
(iterable instanceof Closeable) ? (Closeable) iterable : () -> {}) {
for (T item : iterable) {
queue.add(item);
queue.put(item);

Contributor Author:

@ConeyLiu I found that future.cancel can't exit this loop, because ConcurrentLinkedQueue doesn't check for InterruptedException. So we need to check for close here to avoid a memory leak.

Actually, I found that when Trino kills the query, this loop keeps adding to the queue until it finishes.

We need to use LinkedBlockingQueue, or check whether the iterator is closed, or both, like below:

for (T item : iterable) {
  if (closed) {
    queue.clear();
    return;
  }
  queue.put(item);
}

Contributor:

@Heltman can you please add a few more tests to TestParallelIterable that would exercise this exact condition you're describing?
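
For reference, a rough sketch of the kind of test being asked for; the constructor usage assumes ParallelIterable's (iterable of iterables, worker pool) signature, and the final assertion is left as a comment because proving the producer stopped would likely require inspecting the internal queue, for example via reflection:

import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;
import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
import org.apache.iceberg.util.ParallelIterable;
import org.junit.jupiter.api.Test;

public class TestParallelIterableClose {
  @Test
  public void closeStopsProducersEarly() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    try {
      // an effectively unbounded source: without the fix, the producer keeps
      // filling the queue after the consumer goes away
      Iterable<Integer> endless = () -> Stream.iterate(0, i -> i + 1).iterator();
      ParallelIterable<Integer> parallel =
          new ParallelIterable<>(ImmutableList.of(endless), pool);
      Iterator<Integer> iterator = parallel.iterator();
      iterator.next(); // consume one element so the producer has started
      parallel.close(); // should cancel the producer task and clear the queue
      // assert here (e.g. via reflection on the queue) that buffering has stopped
    } finally {
      pool.shutdownNow();
    }
  }
}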

Contributor:

Close already clears the queue and cancels tasks. I don't think that we need to modify this.

Contributor Author:

Cancel will not work if we use ConcurrentLinkedQueue. If we don't change to LinkedBlockingQueue, we need to check closed every time before adding to the queue.

@Heltman Heltman force-pushed the manifest-read-limit branch from ac85817 to 9014674 Compare December 1, 2023 06:59

findepi commented Dec 15, 2023

cc @danielcweeks

} catch (IOException e) {
throw new RuntimeIOException(e, "Failed to close iterable");
} catch (InterruptedException e) {
throw new RuntimeException(

Contributor:

Do we actually want to throw here or just log? It seems like you'll have to deal with an unnecessary/non-actionable exception when you're really just trying to cancel the iteration. Maybe just turn this into an info log message?

Contributor Author:

@danielcweeks If we have already closed this iterator, we don't care whether it throws or logs. I only avoided logging because this class doesn't have a logger; I just followed the existing style. Maybe we can add a logger to this class.
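
For illustration, a minimal sketch of the log-instead-of-throw option discussed above, using a hypothetical helper (the class name, method, and timeout are illustrative, not the PR's actual code): restore the interrupt flag and log, since interruption while shutting down iteration is expected during cancellation.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class IterationShutdown {
  private static final Logger LOG = LoggerFactory.getLogger(IterationShutdown.class);

  static void awaitQuietly(ExecutorService pool) {
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        pool.shutdownNow(); // give up waiting and interrupt remaining tasks
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      LOG.info("Interrupted while waiting for iteration tasks to stop", e);
    }
  }
}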

@danielcweeks (Contributor):

One minor comment but otherwise looks good to me. @nastra thoughts?

@nastra nastra left a comment

My biggest concern right now is that there aren't enough tests in TestParallelIterable that would raise confidence in the changes being proposed.

import org.apache.iceberg.relocated.com.google.common.collect.Iterables;

public class ParallelIterable<T> extends CloseableGroup implements CloseableIterable<T> {
public static final int ITERATOR_QUEUE_SIZE = SystemConfigs.ITERATOR_QUEUE_SIZE.value();

Contributor:

any particular reason to make this public? This doesn't seem to be used anywhere outside of this class. Also you probably could just use SystemConfigs.ITERATOR_QUEUE_SIZE.value() directly in L59

Contributor Author:

It's a mistake.

*/
public static final ConfigEntry<Integer> ITERATOR_QUEUE_SIZE =
new ConfigEntry<>(
"iceberg.iterator.queue-size",

Contributor:

the config naming implies (at least to me) that this is being applied to all iterators being used in Iceberg, which isn't the case


rdblue commented Dec 18, 2023

This class cannot use a blocking queue with the worker pool, so I'm -1 on this change.

The problem is that planning uses a shared threadpool. Using a blocking queue would cause tasks to stall, which would then tie up the threads in the shared pool and cause all planning to halt.

If you want to limit memory consumption here, then you need to do the following:

  1. Add a new BlockingParallelIterable
  2. Use a blocking queue in BlockingParallelIterable
  3. Use an isolated threadpool in BlockingParallelIterable that is not shared with other planning tasks

Once that new variant is in, we can look at how to use it from the scan API. Feel free to contact me to review this, since this is a part of the code where bad changes can cause a lot of trouble!
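
To make the suggestion above concrete, here is a rough sketch of the producer side of such a BlockingParallelIterable: a bounded blocking queue plus a dedicated pool, so a blocked put() can only stall the isolated threads. All names here are hypothetical, the consumer-side iterator is omitted for brevity, and this is not the PR's code.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingParallelProducer<T> implements AutoCloseable {
  private final BlockingQueue<T> queue;
  private final ExecutorService isolatedPool; // dedicated pool, never the shared planning pool
  private volatile boolean closed = false;

  BlockingParallelProducer(int queueSize, int threads) {
    this.queue = new LinkedBlockingQueue<>(queueSize); // bounded: caps buffered items
    this.isolatedPool = Executors.newFixedThreadPool(threads);
  }

  void submit(List<? extends Iterable<T>> sources) {
    for (Iterable<T> source : sources) {
      isolatedPool.submit(() -> {
        try {
          for (T item : source) {
            if (closed) {
              return; // stop producing once the consumer has gone away
            }
            queue.put(item); // blocks when full, bounding memory; stalls only this pool
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // interrupted by close(); just exit
        }
      });
    }
  }

  BlockingQueue<T> queue() {
    return queue; // consumer drains from here
  }

  @Override
  public void close() {
    closed = true;
    isolatedPool.shutdownNow(); // interrupts producers blocked on put()
    queue.clear();
  }
}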

@rdblue rdblue left a comment

See my comment above.

@lirui-apache (Contributor):

We have been using a blocking queue and a shared thread pool for a while. We did hit a "deadlock" issue when running multi-stage queries with Trino, because the Iceberg split source generated splits synchronously. We fixed it by making split generation asynchronous, and so far everything works fine. I think using an isolated pool would be safer, but it may hurt performance.
Since we already have an API to specify the thread pool for a table scan, how about we let users decide which pool to use? IMO whether to use a blocking queue and whether to use a shared pool can be configured orthogonally.


Heltman commented Dec 19, 2023

I will add some changes to fix the memory leak, and think about creating a BlockingParallelIterable instead of changing ParallelIterable.


findepi commented Dec 21, 2023

The problem is that planning uses a shared threadpool. Using a blocking queue would cause tasks to stall, which would then tie up the threads in the shared pool and cause all planning to halt.

good point

3. Use an isolated threadpool in BlockingParallelIterable that is not shared with other planning tasks

From Trino's perspective it would be most convenient to be able to provide an Executor.
(Trino uses a shared thread pool, plus a "BoundedExecutor" which limits the number of threads available to a given task while still being able to reuse threads.)
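
For illustration, a rough sketch of that bounded-executor idea (not Trino's actual BoundedExecutor, which lives in Airlift): tasks queue up and at most maxThreads of them run on the shared delegate at any time, so the submitter never blocks and the shared pool is never monopolized.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedExecutorSketch implements Executor {
  private final Executor delegate; // the shared pool
  private final int maxThreads;    // concurrency granted to this consumer
  private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
  private final AtomicInteger active = new AtomicInteger();

  BoundedExecutorSketch(Executor delegate, int maxThreads) {
    this.delegate = delegate;
    this.maxThreads = maxThreads;
  }

  @Override
  public void execute(Runnable task) {
    tasks.add(task);
    schedule();
  }

  private void schedule() {
    int current = active.get();
    // claim a worker slot only while under the limit and work remains
    while (current < maxThreads && !tasks.isEmpty()) {
      if (active.compareAndSet(current, current + 1)) {
        delegate.execute(this::drain);
        return;
      }
      current = active.get();
    }
  }

  private void drain() {
    try {
      Runnable task;
      while ((task = tasks.poll()) != null) {
        task.run();
      }
    } finally {
      active.decrementAndGet();
      if (!tasks.isEmpty()) {
        schedule(); // a task may have arrived after poll() returned null
      }
    }
  }
}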


Heltman commented Jan 3, 2024

I will add some changes to fix the memory leak, and think about creating a BlockingParallelIterable instead of changing ParallelIterable.

I added a new PR that just fixes the memory leak. See #9402. @rdblue


findepi commented Jul 12, 2024

I created a PR aiming to make the queue bounded, but without requiring a separate executor pool. The change is effectively transparent to consumers of the class. Please see #10691 and let me know what you think of that approach.


github-actions bot commented Sep 3, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 3, 2024
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Sep 10, 2024

Successfully merging this pull request may close these issues.

Manifests reader queue in ParallelIterable is unlimited caused OOM
