Hello everyone! Thank you for your attention to ROLL.
ROLL has recently been updated with many new features, centered on ROLL Flash. ROLL Flash achieves significant improvements in training efficiency through its innovative asynchronous training architecture.
Below is a summary of the code updates. We will continue to iterate on and improve ROLL, and we welcome you to join the ROLL community.
🚀 Core Highlights
- Asynchronous Training Architecture: a brand-new asynchronous generation scheduler enables efficient pipeline overlap between generation, reward calculation, and model training (see the sketch after this list)
- Significant Performance Improvements: up to 2.24× speedup on RLVR tasks and up to 2.72× speedup on Agentic tasks
- Near-linear Scaling: maintains near-linear throughput scaling at hundred-GPU scale, with 8× the GPU resources yielding a 7.6× efficiency improvement
- Off-policy Algorithm Support: integrates multiple off-policy algorithms (Decoupled PPO, TOPR, CISPO, etc.) with performance comparable to synchronous training
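To make the pipeline overlap concrete, here is a minimal, self-contained sketch of generation and training overlapping through a bounded queue. This is an illustration only, not ROLL's implementation (the real scheduler lives in roll/distributed/scheduler/async_generate_scheduler.py); all names, timings, and the queue depth are assumptions.

```python
# Minimal sketch of generation/training overlap via a bounded queue.
# Illustrative only: ROLL Flash's actual scheduler is
# roll/distributed/scheduler/async_generate_scheduler.py.
import queue
import threading
import time

rollout_queue: queue.Queue = queue.Queue(maxsize=4)  # bound limits rollout staleness

def generate_rollouts(num_batches: int) -> None:
    """Producer: keeps generating while the trainer consumes earlier batches."""
    for step in range(num_batches):
        time.sleep(0.10)  # stand-in for LLM generation + reward calculation
        rollout_queue.put({"step": step, "trajectories": f"batch-{step}"})
    rollout_queue.put(None)  # sentinel: generation finished

def train() -> None:
    """Consumer: updates the model on whichever batch finishes next."""
    while (batch := rollout_queue.get()) is not None:
        time.sleep(0.15)  # stand-in for a gradient update
        print(f"trained on {batch['trajectories']}")

producer = threading.Thread(target=generate_rollouts, args=(8,))
producer.start()
train()
producer.join()
```

The bounded queue is the key design choice: it lets generation run ahead of training, but only by a fixed number of batches, which caps how stale the rollouts consumed by the trainer can become.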
🔧 Major New Features
ROLL Flash
- Asynchronous training: docs_roll/docs/English/UserGuide/async_training.md
- Queue Scheduling mechanism for independent task scheduling to maximize GPU utilization: roll/distributed/scheduler/async_generate_scheduler.py
- Environment-Level Async Rollout to avoid GPUs waiting on environment interactions: docs_roll/docs/English/UserGuide/async_parallel_rollout.md
- Redundant Environment Rollout capability to improve training robustness: roll/pipeline/agentic/agentic_config.py:37
- Off-policy algorithms (see the sketch after this list): docs_roll/docs/English/UserGuide/algorithms/offpolicy_setting.md
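For intuition on why off-policy corrections are needed here: asynchronous rollouts are generated by a slightly stale policy, so the update must reweight them toward the current policy. Below is a minimal sketch of a CISPO-style loss, where a clipped, gradient-stopped importance ratio scales the policy-gradient term. This is a simplification for illustration, not ROLL's implementation; the function name and the clip range `eps` are assumptions.

```python
# Sketch of a CISPO-style clipped importance-sampling loss (illustrative,
# not ROLL's implementation). Inputs are per-token log-probs and advantages.
import torch

def cispo_style_loss(logp_new: torch.Tensor,
                     logp_old: torch.Tensor,
                     advantages: torch.Tensor,
                     eps: float = 0.2) -> torch.Tensor:
    # Importance ratio between the current policy and the stale behavior
    # policy that produced the asynchronous rollouts.
    ratio = torch.exp(logp_new - logp_old)
    # Clip the ratio and stop its gradient, so it acts purely as a bounded
    # reweighting of the policy-gradient term rather than a learned quantity.
    weight = torch.clamp(ratio, 1.0 - eps, 1.0 + eps).detach()
    return -(weight * advantages * logp_new).mean()
```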
Agentic
- Adjusted the RolloutScheduler implementation for better control over EnvManager interactions: docs_roll/docs/English/UserGuide/agentic/agentic_engineer_practice.md
- GlobalDataset component for custom envs, avoiding the network/memory bottlenecks of each env reading data individually (see the sketch after this list)
  - Code: roll/datasets/global_dataset.py
  - Documentation: docs_roll/docs/English/UserGuide/agentic/agentic_engineer_practice.md
- Support for val dataset traversal configuration: docs_roll/docs/English/UserGuide/agentic/agentic_engineer_practice.md
- Support for trajectory synthesis dump capability: docs_roll/docs/English/UserGuide/agentic/agentic_engineer_practice.md
- Support for stateful trajectory filtering capability: docs_roll/docs/English/UserGuide/agentic/agentic_engineer_practice.md
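The GlobalDataset idea can be pictured as one shared dataset actor that env workers query remotely instead of each loading the full dataset themselves. The sketch below is a hypothetical Ray-based illustration only; the class, method names, and the file path `train_prompts.jsonl` are all assumptions, and the real component is roll/datasets/global_dataset.py.

```python
# Hypothetical sketch of a shared-dataset actor: the data is loaded once,
# and env workers fetch samples through an actor handle, avoiding per-env
# file reads and per-env in-memory copies. Names are illustrative only.
import ray

@ray.remote
class SharedDataset:
    def __init__(self, path: str) -> None:
        # One process loads the data; workers hold only a lightweight handle.
        with open(path, encoding="utf-8") as f:
            self.samples = [line.strip() for line in f]

    def get(self, index: int) -> str:
        return self.samples[index % len(self.samples)]

ray.init(ignore_reinit_error=True)
dataset = SharedDataset.remote("train_prompts.jsonl")  # hypothetical file
# Inside an env worker: fetch one sample by index instead of opening the file.
sample = ray.get(dataset.get.remote(42))
```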
Performance Optimization & Backend
- Dynamic batching optimization (see the sketch after this list): roll/utils/dynamic_batching.py
- Optimized DistillPipeline to improve teacher-student logits transmission efficiency
- Added complete support for vLLM 0.11.0
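To illustrate what dynamic batching buys over a fixed batch size, the toy sketch below packs variable-length sequences into micro-batches under a total token budget, so each micro-batch does a similar amount of work. The function and its `max_tokens` parameter are assumptions for illustration, not the API of roll/utils/dynamic_batching.py.

```python
# Toy token-budget batching: group sequence indices so that each group's
# total token count stays under a budget (illustrative, not ROLL's API).
from typing import Iterable, Iterator, List

def dynamic_batches(lengths: Iterable[int],
                    max_tokens: int = 4096) -> Iterator[List[int]]:
    batch: List[int] = []
    used = 0
    for i, n in enumerate(lengths):
        if batch and used + n > max_tokens:
            yield batch          # budget would be exceeded: flush the batch
            batch, used = [], 0
        batch.append(i)
        used += n
    if batch:
        yield batch

# Sequence lengths in tokens -> index groups that respect the budget.
print(list(dynamic_batches([1000, 3000, 2000, 500, 4000], max_tokens=4096)))
# [[0, 1], [2, 3], [4]]
```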
Documentation
- FP8 rollout configuration documentation (see the sketch below): docs_roll/docs/English/UserGuide/backend/fp8_rollout.md
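As a hedged example of what an FP8 rollout looks like at the inference-engine level, the snippet below turns on FP8 quantization in vLLM's offline API; the model name is a placeholder, and ROLL's actual configuration keys are the ones described in the linked doc.

```python
# Enable FP8 quantization for rollout generation with vLLM's offline API.
# The model name is a placeholder; see the linked doc for ROLL's config keys.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", quantization="fp8")
params = SamplingParams(temperature=0.8, max_tokens=256)
outputs = llm.generate(["Prove that sqrt(2) is irrational."], params)
print(outputs[0].outputs[0].text)
```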
Whether you are working on mathematical reasoning, code generation, or building real-world interactive LLM agents, ROLL Flash can help you train stronger models faster, more stably, and more cost-effectively.
The ROLL team will continue to pursue the co-design of systems and algorithms for RL on LLMs, and is committed to building an easy-to-use, efficient, and scalable open-source ecosystem.
We welcome you to star the repo, try it out, and contribute code to help advance LLM reinforcement learning toward practical, large-scale deployment! 🌟