Part II: World Models

World models are learned representations of how an environment works: they capture its dynamics, structure, and regularities. With such a model, an agent can predict outcomes, plan, and imagine trajectories without interacting with the environment directly, which makes world models a critical component of intelligent embodied systems.
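To make "predict, plan, and imagine" concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than a method from this section: the 1-D toy environment `true_step`, the assumed model form `x_next = x + k * a`, and the helper names `collect`, `fit_gain`, `imagine`, and `plan` are all hypothetical. The agent fits the single parameter `k` from a handful of real transitions, then evaluates candidate actions entirely inside the learned model.

```python
import random

def true_step(x, a):
    """Toy ground-truth environment (hidden from the agent): a 1-D point moved by 0.5 * a."""
    return x + 0.5 * a

def collect(n, rng):
    """Gather n random transitions (x, a, x_next) through real interaction."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((x, a, true_step(x, a)))
    return data

def fit_gain(data):
    """Least-squares fit of k in the assumed model x_next = x + k * a."""
    num = sum(a * (x_next - x) for x, a, x_next in data)
    den = sum(a * a for _, a, _ in data)
    return num / den

def imagine(x, actions, k):
    """Roll the learned model forward without touching the environment."""
    traj = [x]
    for a in actions:
        x = x + k * a
        traj.append(x)
    return traj

def plan(x, goal, k, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0), horizon=3):
    """Pick the constant action whose imagined final state lands closest to the goal."""
    return min(candidates, key=lambda a: abs(imagine(x, [a] * horizon, k)[-1] - goal))

rng = random.Random(0)
k = fit_gain(collect(50, rng))      # learn the model from 50 real transitions
best = plan(x=0.0, goal=1.5, k=k)   # plan using imagined rollouts only
```

Real world models replace the single scalar `k` with a learned latent dynamics network and the exhaustive search over constant actions with a stronger planner, but the division of labor is the same: a small amount of real interaction to fit the model, then many cheap imagined rollouts to choose actions.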

What You'll Learn

This section covers:

What Are World Models? — Definitions, motivations, and the cognitive science perspective
Representation Learning — How to learn useful latent spaces for world modeling
Video Prediction — Predicting future visual observations
Planning with World Models — Using learned models for decision-making
Foundation World Models — Large-scale, general-purpose world models
Key Papers — Essential reading in world models

Why World Models?

The motivation for world models comes from multiple directions:

From RL: Model-based RL is often reported to be far more sample-efficient than model-free approaches, with gains on the order of 10-100x on some benchmarks
From cognitive science: Humans constantly simulate the world mentally to plan and predict
From robotics: Real-world interaction is expensive and slow; rollouts in a learned model are comparatively cheap
From scaling: Foundation world models are emerging as a path to general-purpose physical reasoning

Connection to Other Parts

graph LR
    MB[Model-Based RL<br/>Part I] --> WM[World Models<br/>Part II]
    WM --> EA[Embodied AI<br/>Part III]
    WM --> DR[Distributed RL<br/>Part IV]
    EA -->|data for<br/>training| WM

Part I (RL): World models generalize the dynamics models from model-based RL
Part III (Embodied AI): World models enable sim-to-real transfer and robot learning from imagination
Part IV (Distributed RL): Training large world models requires distributed systems