Part I: Reinforcement Learning
Reinforcement Learning (RL) is a computational framework for learning through interaction. An agent takes actions in an environment, receives feedback in the form of rewards, and learns a policy that maximizes cumulative reward over time. RL provides the algorithmic foundation for much of embodied AI research.
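To make the agent-environment loop concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of this guide's codebase: `CorridorEnv` is a hypothetical five-state toy environment, the `reset`/`step` interface only loosely mirrors common RL library conventions, and tabular Q-learning with epsilon-greedy exploration stands in for "an algorithm that learns a policy". The hyperparameters are arbitrary.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with states 0..length-1.

    The agent starts at state 0 and receives reward 1.0 on reaching the last state.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=100, seed=0):
    """Tabular Q-learning: estimate action values from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon (or on ties),
            # otherwise exploit the current value estimates.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # One-step target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state."""
    return [max(range(2), key=lambda a: q_s[a]) for q_s in q]
```

Calling `greedy_policy(train(CorridorEnv()))` returns one action per state. Under these assumptions the learned policy should choose "right" (action 1) in every non-terminal state, since that is the shortest route to the only reward.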
What You'll Learn
This section covers:
- Key Concepts — MDPs, policies, value functions, Bellman equations, and the exploration-exploitation tradeoff
- Algorithm Taxonomy — A map of the RL algorithm landscape: model-free vs. model-based, on-policy vs. off-policy, value vs. policy methods
- Intro to Policy Optimization — The policy gradient theorem and why it matters
- Algorithm Deep-Dives:
    - Policy Gradient Methods — REINFORCE, Vanilla Policy Gradient
    - Trust Region Methods — TRPO, PPO
    - Value-Based Methods — DQN, Double DQN, Dueling DQN, Rainbow
    - Actor-Critic Methods — A2C/A3C, DDPG, TD3, SAC
    - Model-Based RL — Dyna, MBPO, Dreamer
    - Offline RL — BCQ, CQL, IQL, Decision Transformer
- Key Papers — Curated reading list of foundational and recent RL papers
Recommended Reading Order
If you are new to RL, we suggest reading the pages in this order:
    graph TD
        A[Key Concepts] --> B[Algorithm Taxonomy]
        B --> C[Intro to Policy Optimization]
        C --> D[Policy Gradient]
        C --> E[Value-Based]
        D --> F[Trust Region - TRPO/PPO]
        E --> G[Actor-Critic - SAC/TD3]
        F --> G
        G --> H[Model-Based RL]
        H --> I[Offline RL]
        I --> J[Key Papers]
If you already know the basics, feel free to jump directly to any algorithm page.
Connection to Other Sections
- World Models (Part II) extends the model-based RL concepts from this section
- Embodied AI (Part III) applies RL algorithms to physical robot systems (sim-to-real, locomotion policies)
- Distributed RL (Part IV) covers how to scale the training of these algorithms