Part I: Reinforcement Learning

Reinforcement learning (RL) is a computational framework for learning through interaction. An agent takes actions in an environment, receives feedback in the form of rewards, and learns a policy that maximizes cumulative reward over time. RL provides the algorithmic foundation for much of embodied AI research.
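
To make the agent-environment loop concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of this guide's codebase: the four-state corridor, the `env_step` and `run_episode` helpers, the two hand-written policies, and the discount factor `gamma = 0.9` (the text above says only "cumulative reward"; discounting is one common way to define it).

```python
from typing import Callable, Tuple

GOAL = 3  # hypothetical corridor with states 0..3; reaching state 3 ends the episode


def env_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy environment: action is -1 (left) or +1 (right); reward 1.0 on reaching GOAL."""
    next_state = min(max(state + action, 0), GOAL)  # clamp to the corridor
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def run_episode(policy: Callable[[int], int], gamma: float = 0.9, max_steps: int = 20) -> float:
    """Roll out one episode from state 0 and return the discounted cumulative reward."""
    state, ret, discount = 0, 0.0, 1.0
    for _ in range(max_steps):
        action = policy(state)                         # agent acts
        state, reward, done = env_step(state, action)  # environment responds
        ret += discount * reward                       # accumulate discounted reward
        discount *= gamma
        if done:
            break
    return ret


def always_right(state: int) -> int:
    """Hand-written policy that always moves toward the goal."""
    return +1


def always_left(state: int) -> int:
    """Hand-written policy that never reaches the goal."""
    return -1
```

Under these assumptions, `run_episode(always_right)` reaches the goal on the third step and returns about 0.81 (0.9 squared), while `run_episode(always_left)` returns 0.0. An RL algorithm's job is to discover the higher-return policy from interaction alone instead of having it written by hand.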

What You'll Learn

This section covers:

  1. Key Concepts — MDPs, policies, value functions, Bellman equations, and the exploration-exploitation tradeoff
  2. Algorithm Taxonomy — A map of the RL algorithm landscape: model-free vs. model-based, on-policy vs. off-policy, value vs. policy methods
  3. Intro to Policy Optimization — The policy gradient theorem and why it matters
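  4. Algorithm Deep-Dives: Policy Gradient, Value-Based, Trust Region (TRPO/PPO), Actor-Critic (SAC/TD3), Model-Based RL, and Offline RL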
  5. Key Papers — Curated reading list of foundational and recent RL papers

If you are new to RL, we suggest reading the pages in the following order:

graph TD
    A[Key Concepts] --> B[Algorithm Taxonomy]
    B --> C[Intro to Policy Optimization]
    C --> D[Policy Gradient]
    C --> E[Value-Based]
    D --> F[Trust Region - TRPO/PPO]
    E --> G[Actor-Critic - SAC/TD3]
    F --> G
    G --> H[Model-Based RL]
    H --> I[Offline RL]
    I --> J[Key Papers]

If you already know the basics, feel free to jump directly to any algorithm page.

Connection to Other Sections

  • World Models (Part II) extends the model-based RL concepts from this section
  • Embodied AI (Part III) applies RL algorithms to physical robot systems (sim-to-real, locomotion policies)
  • Distributed RL (Part IV) covers how to scale the training of these algorithms