Frameworks and Systems for Distributed RL

This page surveys the major open-source frameworks for distributed RL, helping you choose the right tool for your research or application.

Framework Landscape

| Framework | Maintainer | Architecture | Best For |
|---|---|---|---|
| RLlib | Anyscale (Ray) | Ray-based distributed | General purpose, multi-agent |
| Acme | Google DeepMind | Actor-learner (Launchpad) | Research, modular components |
| CleanRL | Individual | Single-file implementations | Learning, prototyping |
| Stable-Baselines3 | Community | Vectorized envs | Simple use, benchmarking |
| Sample Factory | Individual | Async vectorized | High throughput, single machine |
| EnvPool | Sea AI Lab | C++ vectorized envs | Ultra-fast environment stepping |
| TorchRL | Meta (PyTorch) | Composable primitives | PyTorch ecosystem |
| Isaac Lab | NVIDIA | GPU simulation + RL | Robotics, locomotion |

RLlib

RLlib is among the most comprehensive distributed RL frameworks covered here. It is built on Ray, a general-purpose distributed computing framework.

Key Features

  • Supports most major RL algorithms (PPO, SAC, DQN, IMPALA, etc.)
  • Transparent distribution via Ray (scale from laptop to cluster)
  • Multi-agent RL support (independent, centralized, parameter sharing)
  • Custom models, environments, and algorithms
  • Integration with Ray Tune for hyperparameter search

Architecture

graph TD
    RC[Ray Cluster] --> RW1[Rollout Worker 1]
    RC --> RW2[Rollout Worker 2]
    RC --> RWN[Rollout Worker N]
    RC --> T[Trainer / Learner]
    RW1 -->|samples| T
    RW2 -->|samples| T
    RWN -->|samples| T
    T -->|updated policy| RW1
    T -->|updated policy| RW2
    T -->|updated policy| RWN
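
The sample-gather, update, and broadcast cycle in the diagram can be sketched in plain Python. This is a toy illustration and not RLlib code: `rollout_worker`, `learner_update`, and `train` are hypothetical names, threads stand in for Ray workers, the "policy" is a single number, and the learner uses a cross-entropy-style elite update chosen only for brevity.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = 3.0  # toy optimum: the reward peaks when action == TARGET


def rollout_worker(worker_id, iteration, policy_weight, num_steps=50):
    """Collect (action, reward) samples using the weight it was sent."""
    rng = random.Random(worker_id * 10_000 + iteration)  # per-call seed, reproducible
    samples = []
    for _ in range(num_steps):
        action = policy_weight + rng.gauss(0.0, 0.5)  # explore around the policy
        reward = -(action - TARGET) ** 2
        samples.append((action, reward))
    return samples


def learner_update(policy_weight, samples, lr=0.5, elite_frac=0.2):
    """Move the weight toward the mean action of the best-rewarded samples."""
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    elite = ranked[: max(1, int(len(ranked) * elite_frac))]
    elite_mean = sum(action for action, _ in elite) / len(elite)
    return policy_weight + lr * (elite_mean - policy_weight)


def train(num_workers=4, iterations=30):
    policy_weight = 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for it in range(iterations):
            # Broadcast: every worker receives the current weight.
            futures = [
                pool.submit(rollout_worker, w, it, policy_weight)
                for w in range(num_workers)
            ]
            # Gather: the learner waits for all workers, then merges samples.
            samples = [s for f in futures for s in f.result()]
            policy_weight = learner_update(policy_weight, samples)
    return policy_weight


if __name__ == "__main__":
    print(train())
```

The loop is synchronous: the learner blocks until every worker has returned, which keeps all samples on-policy at the cost of waiting for the slowest worker.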

When to Use

  • Multi-agent RL
  • Need to scale to many machines
  • Want built-in algorithm implementations
  • Production deployment

Acme

Acme (Google DeepMind) provides modular, composable RL components with a clean separation between algorithms and distributed infrastructure.

Design Philosophy

  • Actors: Interact with environment, collect data
  • Learners: Update model parameters from data
  • Datasets/Adders: Move data between actors and learners
  • Networks: Neural network architectures

These components are connected via Launchpad, which handles distribution.
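
To make the separation of roles concrete, here is a minimal single-process sketch of the four components on a two-armed bandit. It does not use Acme's API: the class names (`ReplayAdder`, `Network`, `Actor`, `Learner`) and the `bandit_env` function are hypothetical stand-ins, and the "network" is just a table of action-value estimates.

```python
import random
from collections import deque


class ReplayAdder:
    """Dataset/adder stand-in: the only channel from actor to learner."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, action, reward):
        self.buffer.append((action, reward))


class Network:
    """Network stand-in: a table of action-value estimates."""

    def __init__(self, num_actions):
        self.q = [0.0] * num_actions


class Actor:
    """Acts epsilon-greedily and hands every transition to the adder."""

    def __init__(self, network, adder, rng, epsilon=0.1):
        self.network, self.adder, self.rng, self.epsilon = network, adder, rng, epsilon

    def select_action(self):
        q = self.network.q
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(q))
        return max(range(len(q)), key=q.__getitem__)

    def observe(self, action, reward):
        self.adder.add(action, reward)


class Learner:
    """Updates the shared network from sampled data; never touches the env."""

    def __init__(self, network, adder, rng, lr=0.1, batch_size=8):
        self.network, self.adder, self.rng = network, adder, rng
        self.lr, self.batch_size = lr, batch_size

    def step(self):
        if len(self.adder.buffer) < self.batch_size:
            return
        batch = self.rng.sample(list(self.adder.buffer), self.batch_size)
        for action, reward in batch:
            self.network.q[action] += self.lr * (reward - self.network.q[action])


def bandit_env(action, rng):
    """Toy environment: arm 1 pays more on average than arm 0."""
    return [0.2, 0.8][action] + rng.gauss(0.0, 0.1)


def run(steps=500, seed=0):
    rng = random.Random(seed)
    network, adder = Network(2), ReplayAdder()
    actor, learner = Actor(network, adder, rng), Learner(network, adder, rng)
    for _ in range(steps):
        action = actor.select_action()
        actor.observe(action, bandit_env(action, rng))
        learner.step()
    return network.q
```

Because the actor and learner only share the network and the adder, moving them into separate processes changes how those two objects are transported, not the algorithm code itself.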

Key Features

  • Clean, modular code (great for understanding algorithms)
  • Easy to go from single-process to distributed
  • Strong JAX support (alongside TensorFlow)
  • Reference implementations of many algorithms

When to Use

  • Research requiring clean, modifiable algorithm implementations
  • JAX-based projects
  • Studying algorithm internals

CleanRL

CleanRL takes a radically different approach: single-file implementations of RL algorithms with no abstraction layers.

Philosophy

  • Each algorithm in one self-contained Python file
  • No inheritance, minimal abstraction
  • Extensive logging (Weights & Biases integration)
  • Reproducible benchmarks

When to Use

  • Learning RL: See exactly how algorithms work
  • Prototyping: Quick modifications without framework overhead
  • Benchmarking: Standardized, reproducible implementations
  • Starting point: Fork a single file and modify

Stable-Baselines3

Stable-Baselines3 (SB3) is a user-friendly RL library that focuses on ease of use and reliability.

Key Features

  • Clean, well-tested implementations of PPO, SAC, TD3, DQN, A2C
  • Consistent API across algorithms
  • Built-in vectorized environments
  • Extensive documentation and tutorials
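
The idea behind a vectorized environment can be sketched without any library: several environment copies are stepped in lockstep with one batched call, and finished episodes are reset automatically so the batch always has a fixed size. The classes below (`CountdownEnv`, `SimpleVecEnv`) are hypothetical toys, not SB3's classes or signatures.

```python
class CountdownEnv:
    """Toy environment: the observation is the step count, and the
    episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= self.horizon
        return self.t, reward, done


class SimpleVecEnv:
    """Steps a list of environments sequentially behind a batched interface."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                # Auto-reset: return the first observation of the next episode
                # so the caller never has to handle a finished environment.
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


vec = SimpleVecEnv([CountdownEnv(horizon=2), CountdownEnv(horizon=3)])
first_obs = vec.reset()
step1 = vec.step([1, 1])
step2 = vec.step([1, 0])
```

With this design the policy can run one batched forward pass per step, regardless of which individual episodes have just ended.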

When to Use

  • First RL project
  • Quick prototyping
  • Standard benchmarking
  • Need reliable implementations and do not plan heavy customization

Sample Factory

Sample Factory maximizes single-machine throughput through careful system design.

Key Innovations

  • Asynchronous actor and learner on the same machine
  • Ring buffer for zero-copy data passing
  • Batched environment stepping
  • Achieves 100K+ environment frames per second (FPS) on Atari on a single GPU machine
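
The buffer-passing idea can be illustrated with a small sketch: storage is allocated once, the actor writes into a free slot in place, and only the slot index travels between actor and learner. This is an assumption-laden toy, not Sample Factory's implementation: `SlotRing`, `actor`, `learner`, and `run_pipeline` are hypothetical names, threads replace processes, and a recycled slot pool with index queues approximates the ring buffer.

```python
import queue
import threading


class SlotRing:
    """Fixed pool of preallocated slots, recycled through index queues."""

    def __init__(self, num_slots, slot_size):
        self.slots = [bytearray(slot_size) for _ in range(num_slots)]  # allocated once
        self.free = queue.Queue()   # indices the actor may write into
        self.ready = queue.Queue()  # indices the learner may read from
        for i in range(num_slots):
            self.free.put(i)


def actor(ring, num_items):
    for n in range(num_items):
        idx = ring.free.get()      # blocks when the learner falls behind
        slot = ring.slots[idx]
        for j in range(len(slot)):
            slot[j] = n % 256      # write the payload in place
        ring.ready.put(idx)        # only the index crosses the queue
    ring.ready.put(None)           # sentinel: no more data


def learner(ring):
    total = 0
    while True:
        idx = ring.ready.get()
        if idx is None:
            break
        total += ring.slots[idx][0]  # read directly from the shared slot
        ring.free.put(idx)           # hand the slot back for reuse
    return total


def run_pipeline(num_slots=2, slot_size=4, num_items=10):
    ring = SlotRing(num_slots, slot_size)
    producer = threading.Thread(target=actor, args=(ring, num_items))
    producer.start()
    total = learner(ring)
    producer.join()
    return total, ring.free.qsize()
```

The number of slots bounds how far the actor can run ahead of the learner, which gives backpressure without any extra bookkeeping.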

When to Use

  • Maximum throughput on a single machine
  • Complex 3D environments (VizDoom, DMLab)
  • Don't want to manage a cluster

EnvPool

EnvPool speeds up environment simulation by implementing environments in C++ and stepping them in asynchronous batches on a thread pool.

Key Features

  • C++ environment implementations (Atari, MuJoCo, classic control)
  • Asynchronous environment stepping
  • Reported speedups on the order of 10-20x over Python-based Gym vectorized environments, depending on hardware and environment
  • Compatible with any RL framework

Architecture

graph LR
    PY[Python RL Code] -->|batch action| EP[EnvPool<br/>C++ Thread Pool]
    EP -->|batch obs, reward| PY
    subgraph EnvPool
        E1[Env 1]
        E2[Env 2]
        EN[Env N]
    end
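
The asynchronous batching pattern in the diagram can be sketched in pure Python: the pool holds more environments than the batch size, `recv` returns whichever environments finish first, and `send` dispatches actions only for those environment ids. This is a toy model under stated assumptions, not EnvPool's API: `AsyncEnvPool` and its method names are illustrative, Python threads replace the C++ thread pool, and the environment dynamics are a running sum of actions.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class AsyncEnvPool:
    def __init__(self, num_envs, batch_size, num_threads=4):
        self.states = [0] * num_envs
        self.batch_size = batch_size
        self.done_queue = queue.Queue()  # results from finished steps
        self.executor = ThreadPoolExecutor(max_workers=num_threads)

    def _step_one(self, env_id, action):
        # Toy dynamics: the observation is the running sum of actions.
        self.states[env_id] += action
        self.done_queue.put((env_id, self.states[env_id], float(action)))

    def async_reset(self):
        # Make every environment available for the first recv().
        for env_id in range(len(self.states)):
            self.states[env_id] = 0
            self.done_queue.put((env_id, 0, 0.0))

    def send(self, actions, env_ids):
        # Dispatch steps without waiting for them to finish.
        for env_id, action in zip(env_ids, actions):
            self.executor.submit(self._step_one, env_id, action)

    def recv(self):
        # Block until batch_size results are ready, from whichever envs finish first.
        results = [self.done_queue.get() for _ in range(self.batch_size)]
        env_ids, obs, rewards = zip(*results)
        return list(env_ids), list(obs), list(rewards)

    def close(self):
        self.executor.shutdown(wait=True)


pool = AsyncEnvPool(num_envs=4, batch_size=2)
pool.async_reset()
seen_ids = []
for _ in range(6):
    env_ids, obs, rewards = pool.recv()
    seen_ids.extend(env_ids)
    pool.send([1] * len(env_ids), env_ids)  # a real agent would call its policy here
pool.close()
```

Because each environment has at most one step in flight, slow environments never stall the batch: the policy keeps working on whichever results arrive first.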

When to Use

  • Environment stepping is your bottleneck
  • Using standard environments (Atari, MuJoCo)
  • Want to drop-in replace Gym with minimal code changes

TorchRL

TorchRL (Meta) provides composable RL building blocks in the PyTorch ecosystem.

Key Features

  • TensorDict: a unified data carrier for RL
  • Composable transforms, loss modules, and data collectors
  • Integration with torchvision, torchtext
  • GPU-accelerated replay buffers

When to Use

  • PyTorch ecosystem (integrating with other PyTorch libraries)
  • Custom algorithm development
  • Need fine-grained control over RL components

Isaac Lab (formerly Isaac Gym + Orbit)

Isaac Lab is NVIDIA's framework for robot learning, combining GPU-accelerated physics with RL.

Key Features

  • Thousands of parallel environments on a single GPU
  • Built-in robot models (quadrupeds, humanoids, arms)
  • Terrain generation, domain randomization
  • Integration with rl_games and rsl_rl for fast PPO training

When to Use

  • Robot locomotion / manipulation research
  • Need massive parallel simulation
  • Sim-to-real pipeline

Choosing a Framework

graph TD
    Q1{Need distributed<br/>multi-machine?} -->|Yes| Q2{Multi-agent?}
    Q1 -->|No| Q3{Robotics/GPU sim?}
    Q2 -->|Yes| RLlib
    Q2 -->|No| Q4{Large model?}
    Q4 -->|Yes| Acme
    Q4 -->|No| RLlib
    Q3 -->|Yes| Isaac[Isaac Lab]
    Q3 -->|No| Q5{Learning/prototyping?}
    Q5 -->|Yes| Q6{Single file or<br/>user-friendly API?}
    Q5 -->|No| Q7{Max throughput?}
    Q6 -->|Single file| CleanRL
    Q6 -->|User friendly| SB3[Stable-Baselines3]
    Q7 -->|Yes| SF[Sample Factory]
    Q7 -->|No| TRL[TorchRL]
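
The same decision flow can be written as a small function, which makes the branch order explicit. The function name and parameter names are hypothetical. The logic simply mirrors the flowchart and is a starting heuristic rather than a definitive recommendation.

```python
def choose_framework(
    multi_machine,
    multi_agent=False,
    large_model=False,
    robotics_gpu_sim=False,
    learning_or_prototyping=False,
    prefer_single_file=False,
    max_throughput=False,
):
    """Return the framework the flowchart would suggest for these answers."""
    if multi_machine:
        if multi_agent:
            return "RLlib"
        return "Acme" if large_model else "RLlib"
    if robotics_gpu_sim:
        return "Isaac Lab"
    if learning_or_prototyping:
        return "CleanRL" if prefer_single_file else "Stable-Baselines3"
    return "Sample Factory" if max_throughput else "TorchRL"


if __name__ == "__main__":
    print(choose_framework(multi_machine=False, learning_or_prototyping=True))
```

Questions earlier in the chain take priority: for example, any multi-machine, multi-agent workload resolves to RLlib before the robotics or throughput questions are considered.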

Key References

  • Liang, E., et al. (2018). "RLlib: Abstractions for Distributed Reinforcement Learning." ICML.
  • Hoffman, M., et al. (2020). "Acme: A Research Framework for Distributed Reinforcement Learning." arXiv:2006.00979.
  • Huang, S., et al. (2022). "CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms." JMLR.
  • Raffin, A., et al. (2021). "Stable-Baselines3: Reliable Reinforcement Learning Implementations." JMLR.
  • Petrenko, A., et al. (2020). "Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning." ICML.
  • Weng, J., et al. (2022). "EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine." NeurIPS.