[vllm, rollout] fix: implement deferred ZMQ futures for non-blocking executor calls #4875

jreiml · 2026-01-10T17:43:05Z

What does this PR do?

Fixes the non_block=True behavior in ExternalZeroMQDistributedExecutor to properly implement the vLLM executor contract.

The previous implementation immediately called recv() and wrapped the result in an already-resolved Future. This PR adds a _DeferredZmqFuture that defers recv() until result() is called, allowing vLLM's EngineCore to overlap work (e.g., grammar bitmask computation for structured output) with remote model execution.
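To illustrate the difference, here is a minimal sketch of a future that defers the receive until `result()` is called. It is not the PR's actual `_DeferredZmqFuture`: the class name `DeferredRecvFuture` and the `FakeSocket` stand-in are hypothetical, and a real implementation would hold ZMQ sockets instead.

```python
import pickle
from concurrent.futures import Future


class DeferredRecvFuture(Future):
    """Hypothetical sketch: recv() runs only when result() is first called."""

    def __init__(self, sockets):
        super().__init__()
        self._sockets = sockets
        self._received = False

    def result(self, timeout=None):
        if not self._received:
            self._received = True
            try:
                # The blocking receive happens here, not at submission time.
                outputs = [pickle.loads(s.recv()) for s in self._sockets]
                self.set_result(outputs)
            except Exception as exc:
                self.set_exception(exc)
        return super().result(timeout=timeout)


class FakeSocket:
    """Stand-in for a socket (assumption for this sketch); counts recv() calls."""

    def __init__(self, payload):
        self._payload = pickle.dumps(payload)
        self.recv_calls = 0

    def recv(self):
        self.recv_calls += 1
        return self._payload


sockets = [FakeSocket({"rank": 0}), FakeSocket({"rank": 1})]
future = DeferredRecvFuture(sockets)  # returns immediately, nothing received yet
overlapped = sum(range(1000))         # caller can do other work in the meantime
outputs = future.result()             # replies are collected only now
```

The eager variant described above would call `recv()` before returning the future, so any work the caller does afterwards could not overlap with the remote execution.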

Related: #3934 added non_block parameter compatibility but didn't implement actual non-blocking behavior.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+non_block
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)

Test

The non_block=True code path is called by vLLM v1's EngineCore (see vllm/v1/engine/core.py). Existing CI tests that run vLLM v1 (tests/experimental/agent_loop, test_vllm_abort.py) exercise this code path through the full inference stack.

API and Usage Example

No API changes. Internal behavior change only.

Design & Code Changes

- Add _DeferredZmqFuture class that stores sockets and defers recv() until result() is called
- Add non_block and unique_reply_rank parameters to collective_rpc()
- Add assertion enforcing max_concurrent_batches=1 (required for thread-safe ZMQ REQ/REP)
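The reason for the max_concurrent_batches=1 assertion can be sketched with a toy model of a REQ socket, which must strictly alternate send and recv. The names `LockstepReqSocket` and `check_executor_config` are hypothetical and only model the constraint. A real ZMQ REQ socket rejects an out-of-order call with an error of its own.

```python
class LockstepReqSocket:
    """Toy model of a REQ socket: send() and recv() must strictly alternate."""

    def __init__(self):
        self._awaiting_reply = False
        self._last = b""

    def send(self, data: bytes) -> None:
        if self._awaiting_reply:
            raise RuntimeError("send() while a reply is still outstanding")
        self._awaiting_reply = True
        self._last = data

    def recv(self) -> bytes:
        if not self._awaiting_reply:
            raise RuntimeError("recv() without a pending request")
        self._awaiting_reply = False
        return b"reply:" + self._last


def check_executor_config(max_concurrent_batches: int) -> None:
    # With a deferred recv(), a second batch would try to send() before the
    # first reply has been read, so only one batch may be in flight.
    assert max_concurrent_batches == 1, (
        "deferred recv on REQ/REP sockets requires max_concurrent_batches=1"
    )


sock = LockstepReqSocket()
sock.send(b"batch-0")
try:
    sock.send(b"batch-1")  # second request before reading the first reply
    second_send_failed = False
except RuntimeError:
    second_send_failed = True
first_reply = sock.recv()
```

With a single batch in flight, each send is always followed by its matching recv inside `result()`, so the lockstep is never violated.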

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation. N/A - internal change
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: Covered by existing vLLM v1 tests (tests/experimental/agent_loop, test_vllm_abort.py)
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

gemini-code-assist

Code Review

This pull request correctly implements the deferred execution for non-blocking calls in ExternalZeroMQDistributedExecutor by introducing _DeferredZmqFuture. This is a good fix that aligns with the vLLM executor contract and allows for overlapping computation. The addition of the assertion for max_concurrent_batches=1 is also a great defensive measure to ensure thread safety with ZMQ REQ/REP sockets. However, I've identified a critical security vulnerability. The implementation uses pickle.loads() to deserialize data received over the network. This is unsafe and can lead to remote code execution if the network is not completely secure. My review includes a comment with details on this issue.

gemini-code-assist · 2026-01-10T17:44:24Z

verl/workers/rollout/vllm_rollout/vllm_async_server.py

+            try:
+                outputs = []
+                for socket in self._sockets:
+                    outputs.append(pickle.loads(socket.recv()))


The use of pickle.loads() on data received from a network socket introduces a critical security vulnerability. Deserializing data with pickle can lead to arbitrary code execution if the data is crafted maliciously. While this communication is likely between trusted internal workers, it's a significant security risk if the network is not completely isolated and secure. An attacker who can intercept or inject traffic on this ZMQ channel could compromise the worker process.

It is strongly recommended to replace pickle with a safer serialization format, such as JSON. If complex Python objects must be transferred, consider using a library that provides cryptographically signed serialization to ensure data integrity and authenticity.
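As a rough illustration of the signed-serialization suggestion, the sketch below authenticates a JSON payload with an HMAC tag before decoding it. This is not part of the PR. The helper names and `SHARED_KEY` are hypothetical, and it assumes both ends share a secret key and that the payload is JSON-serializable, which may not hold for the executor's actual outputs.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice it would be distributed out of band.
SHARED_KEY = b"example-key-distributed-out-of-band"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign_message(obj, key: bytes = SHARED_KEY) -> bytes:
    """Serialize obj as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(obj, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def verify_message(frame: bytes, key: bytes = SHARED_KEY):
    """Check the tag in constant time, then decode the JSON body."""
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message authentication failed")
    return json.loads(body.decode("utf-8"))


frame = sign_message({"rank": 0, "status": "ok"})
decoded = verify_message(frame)
```

The tag is verified before any decoding takes place, so a frame modified in transit is rejected rather than deserialized.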

jreiml · 2026-01-10

The pickle usage here follows the existing logic. Changing the serialization format would have to be done in another PR.

- Add `_DeferredZmqFuture` class that defers ZMQ `recv()` until `result()` is called
- This properly implements the vLLM executor contract for `non_block=True`, allowing EngineCore to overlap work (e.g., grammar bitmask computation for structured output) with remote model execution
- Add assertion to enforce `max_concurrent_batches=1`, required for thread-safe ZMQ REQ/REP operation

jreiml requested review from PeterSH6, chenhaiq and wuxibin89 as code owners January 10, 2026 17:43

gemini-code-assist bot reviewed Jan 10, 2026


jreiml force-pushed the vllm-deferred-zmq-future branch from 7447b1f to 013a7a9 January 10, 2026 17:47
