Frameworks and Systems for Distributed RL
This page surveys the major open-source frameworks for distributed RL, helping you choose the right tool for your research or application.
Framework Landscape
| Framework | Maintainer | Architecture | Best For |
|---|---|---|---|
| RLlib | Anyscale (Ray) | Ray-based distributed | General purpose, multi-agent |
| Acme | Google DeepMind | Actor-learner (Launchpad) | Research, modular components |
| CleanRL | Individual | Single-file implementations | Learning, prototyping |
| Stable-Baselines3 | Community | Vectorized envs | Simple use, benchmarking |
| Sample Factory | Individual | Async vectorized | High throughput, single machine |
| EnvPool | Sea AI Lab | C++ vectorized envs | Ultra-fast environment stepping |
| TorchRL | Meta (PyTorch) | Composable primitives | PyTorch ecosystem |
| Isaac Lab | NVIDIA | GPU simulation + RL | Robotics, locomotion |
RLlib
RLlib is one of the most comprehensive distributed RL frameworks. It is built on Ray, a general-purpose distributed computing framework.
Key Features
- Supports most major RL algorithms (PPO, SAC, DQN, IMPALA, etc.)
- Transparent distribution via Ray (scale from laptop to cluster)
- Multi-agent RL support (independent, centralized, parameter sharing)
- Custom models, environments, and algorithms
- Integration with Ray Tune for hyperparameter search
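In multi-agent RL, the main design decision is which policy each agent uses. RLlib lets users supply a policy mapping function for this. The stdlib-only sketch below imitates the idea with a hypothetical `make_mapping` helper (not RLlib's API) to contrast independent learners, full parameter sharing, and per-team sharing. Centralized training, for example a critic that sees every agent's observation, changes the model rather than the mapping, so it is not shown.

```python
from typing import Callable, Dict, List


def make_mapping(mode: str) -> Callable[[str], str]:
    """Hypothetical helper: build a function that maps an agent id to a policy id."""
    if mode == "independent":
        # Independent learners: every agent gets its own policy.
        return lambda agent_id: f"policy_{agent_id}"
    if mode == "shared":
        # Parameter sharing: all agents act with (and update) one policy.
        return lambda agent_id: "shared_policy"
    if mode == "per_team":
        # Partial sharing: agents named "<team>_<index>" share one policy per team.
        return lambda agent_id: f"policy_{agent_id.split('_')[0]}"
    raise ValueError(f"unknown mode: {mode}")


def group_by_policy(agent_ids: List[str], mode: str) -> Dict[str, List[str]]:
    """Group agents by the policy whose training batch their samples feed."""
    mapping = make_mapping(mode)
    groups: Dict[str, List[str]] = {}
    for agent_id in agent_ids:
        groups.setdefault(mapping(agent_id), []).append(agent_id)
    return groups


agents = ["red_0", "red_1", "blue_0"]
for mode in ("independent", "shared", "per_team"):
    print(mode, group_by_policy(agents, mode))
```

The mapping decides how many sets of weights are trained: three under independent learning, one under full sharing, and two under per-team sharing.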
Architecture
```mermaid
graph TD
    RC[Ray Cluster] --> RW1[Rollout Worker 1]
    RC --> RW2[Rollout Worker 2]
    RC --> RWN[Rollout Worker N]
    RC --> T[Trainer / Learner]
    RW1 -->|samples| T
    RW2 -->|samples| T
    RWN -->|samples| T
    T -->|updated policy| RW1
    T -->|updated policy| RW2
    T -->|updated policy| RWN
```
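The sample, update, broadcast cycle in the diagram can be sketched in a single process. This is not RLlib code. `RolloutWorker` and `Learner` here are hypothetical classes, the environment is a two-armed bandit, the update is a plain REINFORCE step, and the workers run sequentially where Ray would run them as remote actors on many processes or machines.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[int, float]  # (action, reward)


def prob_arm1(theta: float) -> float:
    """Logistic policy over two arms: probability of pulling arm 1."""
    return 1.0 / (1.0 + math.exp(-theta))


class RolloutWorker:
    """Hypothetical worker: holds a local copy of the policy weights."""

    def __init__(self, seed: int) -> None:
        self.theta = 0.0
        self.rng = random.Random(seed)

    def sample(self, n: int) -> List[Sample]:
        """Collect n (action, reward) pairs; arm 1 pays 1.0, arm 0 pays 0.0."""
        batch = []
        for _ in range(n):
            action = 1 if self.rng.random() < prob_arm1(self.theta) else 0
            batch.append((action, float(action)))
        return batch

    def set_weights(self, theta: float) -> None:
        self.theta = theta


class Learner:
    """Hypothetical learner: one policy-gradient (REINFORCE) step per combined batch."""

    def __init__(self, lr: float = 0.5) -> None:
        self.theta = 0.0
        self.lr = lr

    def update(self, batch: List[Sample]) -> float:
        p = prob_arm1(self.theta)
        # For this logistic policy, d/dtheta log pi(a) = a - p.
        grad = sum((a - p) * r for a, r in batch) / len(batch)
        self.theta += self.lr * grad
        return self.theta


def train(num_workers: int = 4, iters: int = 50, n: int = 32) -> Tuple[Learner, List[RolloutWorker]]:
    workers = [RolloutWorker(seed=i) for i in range(num_workers)]
    learner = Learner()
    for _ in range(iters):
        # 1. Every worker samples with its current copy of the weights.
        batch = [s for w in workers for s in w.sample(n)]
        # 2. The learner updates the policy from the combined batch.
        theta = learner.update(batch)
        # 3. The updated weights are broadcast back to all workers.
        for w in workers:
            w.set_weights(theta)
    return learner, workers


learner, workers = train()
print(f"P(arm 1) after training: {prob_arm1(learner.theta):.2f}")
```

Because every worker receives the new weights before the next round, this is the synchronous variant; asynchronous algorithms such as IMPALA let workers keep sampling with slightly stale weights instead.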
When to Use
- Multi-agent RL
- Need to scale to many machines
- Want built-in algorithm implementations
- Production deployment
Acme
Acme (Google DeepMind) provides modular, composable RL components with a clean separation between algorithms and distributed infrastructure.
Design Philosophy
- Actors: Interact with environment, collect data
- Learners: Update model parameters from data
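- Adders and datasets: Move data between actors and learners (adders insert experience into a replay service, typically Reverb; datasets sample it back out for the learner)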
- Networks: Neural network architectures
These components are connected via Launchpad, which handles distribution.
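A single-process sketch of how the four roles fit together follows. All class names are hypothetical and much simpler than Acme's real interfaces. A `deque` stands in for the replay service, tabular Q-learning on a toy chain environment stands in for the algorithm (the Q-table plays the role of the network), and a plain loop with a periodic parameter copy stands in for what Launchpad would distribute across processes.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[int, int, float, int, bool]
N_STATES, GAMMA = 4, 0.9  # toy chain: start in state 0, reward 1.0 on reaching state 3


def env_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical chain environment: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


class Adder:
    """Actor side: writes transitions into a bounded replay table."""

    def __init__(self, table: Deque[Transition]) -> None:
        self.table = table

    def add(self, t: Transition) -> None:
        self.table.append(t)


class Dataset:
    """Learner side: samples uniform minibatches from the same table."""

    def __init__(self, table: Deque[Transition], batch_size: int, seed: int = 0) -> None:
        self.table, self.batch_size, self.rng = table, batch_size, random.Random(seed)

    def next(self) -> List[Transition]:
        k = min(self.batch_size, len(self.table))
        return self.rng.sample(list(self.table), k)


class Actor:
    """Acts epsilon-greedily using a copy of the learner's parameters."""

    def __init__(self, adder: Adder, epsilon: float = 0.3, seed: int = 1) -> None:
        self.adder, self.epsilon, self.rng = adder, epsilon, random.Random(seed)
        self.q = [[0.0, 0.0] for _ in range(N_STATES)]

    def select_action(self, obs: int) -> int:
        left, right = self.q[obs]
        if self.rng.random() < self.epsilon or left == right:
            return self.rng.randrange(2)  # explore, or break ties randomly
        return 0 if left > right else 1

    def observe(self, t: Transition) -> None:
        self.adder.add(t)

    def update(self, q: List[List[float]]) -> None:
        self.q = [row[:] for row in q]  # copy, as if fetched from the learner


class Learner:
    """Owns the parameters (a Q-table standing in for a network); never touches the env."""

    def __init__(self, dataset: Dataset, lr: float = 0.5) -> None:
        self.dataset, self.lr = dataset, lr
        self.q = [[0.0, 0.0] for _ in range(N_STATES)]

    def step(self) -> None:
        for s, a, r, s2, done in self.dataset.next():
            target = r if done else r + GAMMA * max(self.q[s2])
            self.q[s][a] += self.lr * (target - self.q[s][a])


def run(num_steps: int = 2000, sync_every: int = 10) -> Learner:
    table: Deque[Transition] = deque(maxlen=500)
    actor = Actor(Adder(table))
    learner = Learner(Dataset(table, batch_size=16))
    obs = 0
    for t in range(num_steps):
        action = actor.select_action(obs)
        nxt, reward, done = env_step(obs, action)
        actor.observe((obs, action, reward, nxt, done))
        learner.step()
        if t % sync_every == 0:
            actor.update(learner.q)  # periodic parameter sync
        obs = 0 if done else nxt
    return learner


learner = run()
print("greedy actions:", [0 if left > right else 1 for left, right in learner.q[:-1]])
```

The actor never computes gradients and the learner never steps the environment; the replay table is their only shared data path. That separation is what lets the same components run in one process or be spread across machines.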
Key Features
- Clean, modular code (great for understanding algorithms)
- Easy to go from single-process to distributed
- Strong JAX support (alongside TensorFlow)
- Reference implementations of many algorithms
When to Use
- Research requiring clean, modifiable algorithm implementations
- JAX-based projects
- Studying algorithm internals
CleanRL
CleanRL takes a radically different approach: single-file implementations of RL algorithms with no abstraction layers.
Philosophy
- Each algorithm in one self-contained Python file
- No inheritance, minimal abstraction
- Extensive logging (Weights & Biases integration)
- Reproducible benchmarks
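To make the layout concrete, here is a toy script in the single-file spirit: hyperparameters declared once at the top, one seed for the whole run, a flat loop with no base classes, and periodic metric logging. It is not CleanRL code. CleanRL's files implement deep RL algorithms such as PPO and DQN, whereas this sketch swaps in an epsilon-greedy multi-armed bandit, and `print` stands in for an experiment tracker such as Weights & Biases. The `Args` class and the arm probabilities are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Args:
    """All hyperparameters live at the top of the file."""
    seed: int = 1
    total_steps: int = 2000
    epsilon: float = 0.1
    arm_means: tuple = (0.2, 0.5, 0.8)  # hypothetical Bernoulli bandit
    log_every: int = 500


def main(args: Args) -> Dict[str, List[float]]:
    rng = random.Random(args.seed)  # one seed controls the whole run
    counts = [0] * len(args.arm_means)
    values = [0.0] * len(args.arm_means)
    logs: Dict[str, List[float]] = {"step": [], "avg_reward": []}
    total = 0.0
    for step in range(1, args.total_steps + 1):
        # Flat, explicit loop: no base classes or callbacks to trace through.
        if rng.random() < args.epsilon:
            arm = rng.randrange(len(values))
        else:
            arm = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < args.arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
        if step % args.log_every == 0:
            # Stand-in for a tracker call (e.g. TensorBoard or W&B).
            logs["step"].append(float(step))
            logs["avg_reward"].append(total / step)
            print(f"step={step} avg_reward={total / step:.3f}")
    return logs


if __name__ == "__main__":
    main(Args())
```

Because everything the run depends on is in one file and one seed, rerunning the script with the same `Args` reproduces the same log exactly.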
When to Use
- Learning RL: See exactly how algorithms work
- Prototyping: Quick modifications without framework overhead
- Benchmarking: Standardized, reproducible implementations
- Starting point: Fork a single file and modify
Stable-Baselines3
A PyTorch RL library that prioritizes ease of use and reliability; it is usually the most approachable option in this list.
Key Features
- Clean, well-tested implementations of PPO, SAC, TD3, DQN, A2C
- Consistent API across algorithms
- Built-in vectorized environments
- Extensive documentation and tutorials
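Vectorized environments let a single training loop collect from several environment copies per step. The sketch below shows the core contract in plain Python: step all copies in lockstep, automatically reset any copy whose episode ended, and keep that copy's final observation in its info dict. `CountdownEnv` and `MiniVecEnv` are hypothetical. SB3's own `DummyVecEnv` behaves similarly (and also uses a `terminal_observation` info key), but this is not its implementation.

```python
from typing import Any, Dict, List, Tuple

StepResult = Tuple[List[int], List[float], List[bool], List[Dict[str, Any]]]


class CountdownEnv:
    """Hypothetical toy env: the observation counts down and the episode ends at 0."""

    def __init__(self, start: int) -> None:
        self.start = start
        self.t = start

    def reset(self) -> int:
        self.t = self.start
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t -= 1
        return self.t, float(action), self.t == 0, {}


class MiniVecEnv:
    """Steps N envs in lockstep and auto-resets finished ones (DummyVecEnv-like behavior)."""

    def __init__(self, envs: List[CountdownEnv]) -> None:
        self.envs = envs

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]) -> StepResult:
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                info["terminal_observation"] = o  # keep the final obs (e.g. for bootstrapping)
                o = env.reset()                   # the caller always receives a valid next obs
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos


venv = MiniVecEnv([CountdownEnv(2), CountdownEnv(3)])
print(venv.reset())             # [2, 3]
print(venv.step([1, 1])[:3])    # ([1, 2], [1.0, 1.0], [False, False])
obs, rewards, dones, infos = venv.step([1, 1])
print(obs, dones, infos[0])     # [2, 1] [True, False] {'terminal_observation': 0}
```

Auto-resetting inside `step` keeps the training loop free of per-environment bookkeeping; the trade-off is that the true last observation must be read from `infos`, not from `obs`.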
When to Use
- First RL project
- Quick prototyping
- Standard benchmarking
- Need reliable implementations without customization
Sample Factory
Sample Factory maximizes single-machine throughput through careful system design.
Key Innovations
- Asynchronous actor and learner on the same machine
- Ring buffer for zero-copy data passing
- Batched environment stepping
- Achieves 100K+ FPS on Atari on a single GPU machine
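The buffer idea can be sketched as a fixed pool of preallocated rollout slots that the actor and learner pass back and forth by index. In this hypothetical, simplified version, Python threads and `queue.Queue` stand in for Sample Factory's separate processes and shared-memory tensors. Only small integers cross the queues while the trajectory data is written and read in place, which is what makes the hand-off copy-free when the buffer lives in shared memory.

```python
import queue
import threading
from typing import List

NUM_SLOTS, ROLLOUT_LEN, NUM_ROLLOUTS = 4, 8, 20

# Preallocated trajectory storage; the actor writes into it in place.
buffer: List[List[float]] = [[0.0] * ROLLOUT_LEN for _ in range(NUM_SLOTS)]
free_slots: "queue.Queue[int]" = queue.Queue()
full_slots: "queue.Queue[int]" = queue.Queue()
for i in range(NUM_SLOTS):
    free_slots.put(i)


def actor() -> None:
    for r in range(NUM_ROLLOUTS):
        slot = free_slots.get()         # blocks until the learner recycles a slot
        for t in range(ROLLOUT_LEN):
            buffer[slot][t] = float(r)  # stand-in for writing one environment transition
        full_slots.put(slot)            # hand over the slot index, not the data
    full_slots.put(-1)                  # sentinel: no more rollouts


def learner(consumed: List[float]) -> None:
    while True:
        slot = full_slots.get()
        if slot == -1:
            break
        consumed.append(sum(buffer[slot]))  # stand-in for a training step on this rollout
        free_slots.put(slot)                # slot becomes writable again


consumed: List[float] = []
threads = [threading.Thread(target=actor), threading.Thread(target=learner, args=(consumed,))]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(consumed), "rollouts consumed through", NUM_SLOTS, "reused slots")
```

The actor only blocks when all slots are in use, so sampling and learning overlap; a slot is never rewritten until the learner has finished reading it.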
When to Use
- Maximum throughput on a single machine
- Complex 3D environments (VizDoom, DMLab)
- Don't want to manage a cluster
EnvPool
EnvPool provides high-throughput environment simulation by implementing environments in C++ and stepping them in batches on a C++ thread pool, optionally asynchronously.
Key Features
- C++ environment implementations (Atari, MuJoCo, classic control)
- Asynchronous environment stepping
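- Reported speedups of up to roughly 10-20x over Python subprocess-based vectorized Gym environments on large multi-core machines, with smaller gains on machines with fewer cores
- Gym-, Gymnasium-, and dm_env-compatible APIs, so it can be used with most RL frameworks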
Architecture
```mermaid
graph LR
    PY[Python RL Code] -->|batch action| EP[EnvPool<br/>C++ Thread Pool]
    EP -->|batch obs, reward| PY
    subgraph EnvPool
        E1[Env 1]
        E2[Env 2]
        EN[Env N]
    end
```
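In asynchronous mode, EnvPool lets the batch size be smaller than the number of environments, so the Python side always works on whichever environments finished first instead of waiting for the slowest one. The sketch below imitates that send/recv pattern with a Python thread pool. `ToyEnv` and `AsyncPool` are hypothetical; they are not EnvPool's API or its C++ implementation.

```python
import queue
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Result = Tuple[int, int, float]  # (env_id, step_count, reward)


class ToyEnv:
    """Hypothetical env whose step cost varies, as real simulators' do."""

    def __init__(self, env_id: int) -> None:
        self.env_id = env_id
        self.t = 0
        self.rng = random.Random(env_id)

    def step(self, action: int) -> Result:
        time.sleep(self.rng.uniform(0.0, 0.01))  # simulated physics/emulator work
        self.t += 1
        return self.env_id, self.t, float(action)


class AsyncPool:
    """send() dispatches actions to a thread pool; recv() returns the first `batch_size` results."""

    def __init__(self, num_envs: int, batch_size: int) -> None:
        self.envs = [ToyEnv(i) for i in range(num_envs)]
        self.batch_size = batch_size
        self.ready: "queue.Queue[Result]" = queue.Queue()
        self.executor = ThreadPoolExecutor(max_workers=num_envs)

    def _run(self, env_id: int, action: int) -> None:
        self.ready.put(self.envs[env_id].step(action))

    def send(self, actions: List[int], env_ids: List[int]) -> None:
        for action, env_id in zip(actions, env_ids):
            self.executor.submit(self._run, env_id, action)

    def recv(self) -> List[Result]:
        # Block until batch_size envs have finished, whichever they are.
        return [self.ready.get() for _ in range(self.batch_size)]

    def close(self) -> None:
        self.executor.shutdown(wait=True)


pool = AsyncPool(num_envs=8, batch_size=4)
pool.send([0] * 8, list(range(8)))  # start every env
for _ in range(10):
    batch = pool.recv()                       # the 4 fastest envs this round
    ids = [env_id for env_id, _, _ in batch]
    pool.send([1] * len(ids), ids)            # only those envs get new actions
pool.close()
```

Each environment has at most one step in flight, so a returned batch never contains the same environment twice; slow environments simply show up in a later batch.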
When to Use
- Environment stepping is your bottleneck
- Using standard environments (Atari, MuJoCo)
- Want a drop-in replacement for Gym environments with minimal code changes
TorchRL
TorchRL (Meta) provides composable RL building blocks in the PyTorch ecosystem.
Key Features
- TensorDict: a unified data carrier for RL
- Composable transforms, loss modules, and data collectors
- Integration with torchvision, torchtext
- GPU-accelerated replay buffers
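A TensorDict-style container is useful because one object carries every field of a batch, and indexing it applies the same index to all fields, so collectors, transforms, and losses can exchange data by key. The list-based `MiniTensorDict` below is a hypothetical illustration of that idea only; the real `TensorDict` holds PyTorch tensors and supports much more (devices, nested keys, stacking).

```python
from typing import Any, Dict, List, Union


class MiniTensorDict:
    """Hypothetical list-based stand-in for TensorDict: named fields sharing one batch dimension."""

    def __init__(self, data: Dict[str, List[Any]], batch_size: int) -> None:
        self.data: Dict[str, List[Any]] = {}
        self.batch_size = batch_size
        for key, values in data.items():
            self.set(key, values)

    def set(self, key: str, values: List[Any]) -> "MiniTensorDict":
        # Every field must agree on the leading (batch) dimension.
        if len(values) != self.batch_size:
            raise ValueError(f"{key!r} has length {len(values)}, expected {self.batch_size}")
        self.data[key] = list(values)
        return self

    def __getitem__(self, idx: Union[str, int, slice]) -> Any:
        if isinstance(idx, str):
            return self.data[idx]  # one field, all samples
        if isinstance(idx, int):
            return {k: v[idx] for k, v in self.data.items()}  # one sample, all fields
        # A slice is applied identically to every field.
        sliced = {k: v[idx] for k, v in self.data.items()}
        return MiniTensorDict(sliced, len(range(self.batch_size)[idx]))


# A collector fills the shared fields; later stages add their own keys.
td = MiniTensorDict(
    {"obs": [0, 1, 2, 3], "action": [1, 0, 1, 1], "reward": [0.0, 0.0, 1.0, 0.0]},
    batch_size=4,
)
td.set("done", [False, False, True, False])
minibatch = td[1:3]
print(minibatch.batch_size, minibatch["reward"], td[2])
```

Validating the batch dimension on every `set` is the design choice that makes minibatching safe: a slice can never misalign observations with their rewards.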
When to Use
- PyTorch ecosystem (integrating with other PyTorch libraries)
- Custom algorithm development
- Need fine-grained control over RL components
Isaac Lab (successor to Isaac Gym and Orbit)
Isaac Lab is NVIDIA's framework for robot learning, combining GPU-accelerated physics with RL.
Key Features
- Thousands of parallel environments on a single GPU
- Built-in robot models (quadrupeds, humanoids, arms)
- Terrain generation, domain randomization
- Integrations with RL libraries such as rl_games and RSL-RL (commonly used for fast PPO training), as well as skrl and Stable-Baselines3
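The pattern behind running thousands of environments on one GPU can be sketched without a physics engine. Every environment's state is kept in batched arrays, one `step` call advances all of them, only the environments that finished are reset, and physical parameters are re-sampled on reset as a simple form of domain randomization. `BatchedPointMass` is a hypothetical 1-D point-mass example that uses Python lists where Isaac Lab would use GPU tensors and a real simulator; it is not Isaac Lab's API.

```python
import random
from typing import List, Tuple


class BatchedPointMass:
    """Hypothetical batched env: state of all N envs lives in flat arrays (struct-of-arrays)."""

    def __init__(self, num_envs: int, mass_range: Tuple[float, float] = (0.5, 2.0),
                 dt: float = 0.05, limit: float = 1.0, seed: int = 0) -> None:
        self.n, self.dt, self.limit, self.mass_range = num_envs, dt, limit, mass_range
        self.rng = random.Random(seed)
        self.pos = [0.0] * num_envs
        self.vel = [0.0] * num_envs
        self.mass = [1.0] * num_envs
        self.reset_idx(list(range(num_envs)))

    def reset_idx(self, ids: List[int]) -> None:
        """Reset only the listed envs and re-sample their mass (domain randomization)."""
        for i in ids:
            self.pos[i] = 0.0
            self.vel[i] = 0.0
            self.mass[i] = self.rng.uniform(*self.mass_range)

    def step(self, forces: List[float]) -> List[int]:
        """Advance every env by one step; return the ids that went out of bounds (and were reset)."""
        for i in range(self.n):  # a GPU simulator would do this as one batched operation
            self.vel[i] += forces[i] / self.mass[i] * self.dt
            self.pos[i] += self.vel[i] * self.dt
        done = [i for i in range(self.n) if abs(self.pos[i]) > self.limit]
        self.reset_idx(done)
        return done


env = BatchedPointMass(num_envs=1024)
total_resets = 0
for _ in range(100):
    total_resets += len(env.step([1.0] * env.n))  # same constant push for every env
print(total_resets, "episode resets across", env.n, "environments")
```

Per-environment resets matter at this scale: with thousands of copies, some episode is almost always ending, and resetting the whole batch would waste most of the simulation.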
When to Use
- Robot locomotion / manipulation research
- Need massive parallel simulation
- Sim-to-real pipeline
Choosing a Framework
```mermaid
graph TD
    Q1{Need distributed<br/>multi-machine?} -->|Yes| Q2{Multi-agent?}
    Q1 -->|No| Q3{Robotics/GPU sim?}
    Q2 -->|Yes| RLlib
    Q2 -->|No| Q4{Large model?}
    Q4 -->|Yes| Acme
    Q4 -->|No| RLlib
    Q3 -->|Yes| Isaac[Isaac Lab]
    Q3 -->|No| Q5{Learning/prototyping?}
    Q5 -->|Yes| Q6{Want simplicity?}
    Q5 -->|No| Q7{Max throughput?}
    Q6 -->|Single file| CleanRL
    Q6 -->|User friendly| SB3[Stable-Baselines3]
    Q7 -->|Yes| SF[Sample Factory]
    Q7 -->|No| TRL[TorchRL]
```
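The same flowchart can be written as a plain function, which is handy for recording a team's choice in code or for checking that a set of requirements leads where you expect. The flag names are hypothetical shorthand for the diamonds in the chart. EnvPool has no leaf because it accelerates the environment layer and can be paired with most training frameworks.

```python
def choose_framework(
    multi_machine: bool,
    multi_agent: bool = False,
    large_model: bool = False,
    robotics_gpu_sim: bool = False,
    learning_or_prototyping: bool = False,
    prefer_single_file: bool = False,
    max_throughput: bool = False,
) -> str:
    """Walk the decision tree; each flag answers one diamond."""
    if multi_machine:
        if multi_agent:
            return "RLlib"
        return "Acme" if large_model else "RLlib"
    if robotics_gpu_sim:
        return "Isaac Lab"
    if learning_or_prototyping:
        # "Single file" branch vs. "User friendly" branch
        return "CleanRL" if prefer_single_file else "Stable-Baselines3"
    return "Sample Factory" if max_throughput else "TorchRL"


print(choose_framework(multi_machine=False, learning_or_prototyping=True))
```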
Key References
- Liang, E., et al. (2018). "RLlib: Abstractions for Distributed Reinforcement Learning." ICML.
- Hoffman, M., et al. (2020). "Acme: A Research Framework for Distributed Reinforcement Learning." arXiv:2006.00979.
- Huang, S., et al. (2022). "CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms." JMLR.
- Raffin, A., et al. (2021). "Stable-Baselines3: Reliable Reinforcement Learning Implementations." JMLR.
- Petrenko, A., et al. (2020). "Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS." ICML.
- Weng, J., et al. (2022). "EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine." NeurIPS.