
World Models & Model-Based RL

Sample-efficient RL: learn from imagined experience, not just real data

Concept 76 of 100 (Theory)



Phase 10: Mathematical foundations & information geometry



Key Equation


A world model learns a transition function f_\theta that predicts the next state from the current state and action:

\hat{s}_{t+1} = f_\theta(s_t, a_t)
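To make "learning f_\theta" concrete, here is a minimal sketch that fits a world model by least squares on logged transitions (s_t, a_t, s_{t+1}). It assumes a hypothetical 1-D environment with linear dynamics s' = 0.9 s + 0.5 a + noise and a linear model class \hat{s}_{t+1} = A s_t + B a_t, so \theta = (A, B); the helper names (`collect_transitions`, `fit_linear_world_model`, `predict`) are illustrative, not from any library. Real world models replace the linear map with a neural network trained by gradient descent, but the objective (minimize next-state prediction error) is the same idea.

```python
import random

def collect_transitions(n, a_true=0.9, b_true=0.5, noise=0.01, seed=0):
    """Simulate a hypothetical 1-D environment: s' = a_true*s + b_true*a + noise."""
    rng = random.Random(seed)
    data, s = [], 0.0
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)  # random exploratory action
        s_next = a_true * s + b_true * a + rng.gauss(0.0, noise)
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_linear_world_model(data):
    """Least-squares fit of theta = (A, B) in s_hat' = A*s + B*a."""
    # Normal equations: [[Sss, Ssa], [Ssa, Saa]] @ [A, B] = [Ssy, Say]
    sss = sum(s * s for s, _, _ in data)
    ssa = sum(s * a for s, a, _ in data)
    saa = sum(a * a for _, a, _ in data)
    ssy = sum(s * y for s, _, y in data)
    say = sum(a * y for _, a, y in data)
    det = sss * saa - ssa * ssa
    A = (ssy * saa - ssa * say) / det  # Cramer's rule
    B = (sss * say - ssa * ssy) / det
    return A, B

def predict(theta, s, a):
    """The learned model f_theta(s, a): one-step next-state prediction."""
    A, B = theta
    return A * s + B * a

data = collect_transitions(2000)
theta = fit_linear_world_model(data)
```

With enough varied transitions, the fitted (A, B) should land close to the hidden (0.9, 0.5), and `predict` can then generate imagined next states without touching the environment.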

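Latent world model (DreamerV3): instead of predicting raw observations, the model compresses them into a compact latent state and learns the transition function in that latent space.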
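For reference, Hafner et al. (2023) specify DreamerV3's world model as a recurrent state-space model (RSSM). Each observation x_t is encoded into a stochastic latent z_t alongside a deterministic recurrent state h_t; during imagination the dynamics predictor stands in for the encoder, so latent trajectories can be rolled out without observations. The paper writes the world-model parameters as \phi; they play the role of \theta above.

```latex
\begin{aligned}
&\text{Sequence model:}     && h_t = f_\phi(h_{t-1}, z_{t-1}, a_{t-1}) \\
&\text{Encoder:}            && z_t \sim q_\phi(z_t \mid h_t, x_t) \\
&\text{Dynamics predictor:} && \hat{z}_t \sim p_\phi(\hat{z}_t \mid h_t) \\
&\text{Reward predictor:}   && \hat{r}_t \sim p_\phi(\hat{r}_t \mid h_t, z_t) \\
&\text{Continue predictor:} && \hat{c}_t \sim p_\phi(\hat{c}_t \mid h_t, z_t) \\
&\text{Decoder:}            && \hat{x}_t \sim p_\phi(\hat{x}_t \mid h_t, z_t)
\end{aligned}
```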

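Planning in imagination: starting from the current state, \hat{s}_0 = s_0, roll the model forward over a horizon H and pick the action sequence with the highest predicted discounted return:

a_{0:H}^* = \arg\max_{a_{0:H}} \mathbb{E}_{\hat{s} \sim f_\theta}\left[\sum_{t=0}^{H} \gamma^t r(\hat{s}_t, a_t)\right]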
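The exact argmax over action sequences is usually intractable, so one simple approximation is random shooting: sample many candidate sequences a_{0:H}, score each by its imagined return, and keep the best. The sketch below assumes a hypothetical linear model standing in for f_\theta (s' = 0.9 s + 0.5 a), a hypothetical reward r(s, a) = -(s - 1)^2 - 0.01 a^2, and actions bounded in [-1, 1]. It wraps the planner in model-predictive control (execute only the first planned action, then replan), which is one common design choice rather than the only way to use the plan.

```python
import random

def model(s, a):
    """Stand-in for a learned f_theta (hypothetical linear dynamics)."""
    return 0.9 * s + 0.5 * a

def reward(s, a, goal=1.0):
    """Hypothetical reward: stay near the goal, small action penalty."""
    return -(s - goal) ** 2 - 0.01 * a ** 2

def imagined_return(s0, actions, gamma=0.99):
    """sum_{t=0}^{H} gamma^t r(s_hat_t, a_t), rolled out entirely in the model."""
    s_hat, total = s0, 0.0
    for t, a in enumerate(actions):
        total += gamma ** t * reward(s_hat, a)
        s_hat = model(s_hat, a)
    return total

def plan_random_shooting(s0, horizon=10, n_candidates=500, rng=None):
    """Approximate the argmax over a_{0:H} by sampling candidate sequences."""
    rng = rng or random.Random(0)
    best_seq, best_ret = None, float("-inf")
    for _ in range(n_candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon + 1)]  # a_0..a_H
        ret = imagined_return(s0, seq)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq, best_ret

# Model-predictive control: execute only the first planned action, then replan.
rng = random.Random(0)
s = 0.0
for _ in range(15):
    plan, _ = plan_random_shooting(s, rng=rng)
    s = model(s, plan[0])  # toy setup: the "real" env is the same model
```

Stronger planners such as the cross-entropy method refine the sampling distribution over several rounds, but they score candidates with the same imagined return.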

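Alternatively, train the policy entirely inside the learned model ("dreaming"): its training rollouts are generated by f_\theta instead of the real environment, which is then needed only to collect data for fitting the model.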
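A minimal sketch of "dreaming", assuming a hypothetical learned model (s' = 0.9 s + 0.5 a), a hypothetical reward r(s, a) = -(s - 1)^2 - 0.01 a^2, and a one-parameter linear policy a = k (1 - s). The policy is improved by random-search hill climbing on its imagined return; this is a deliberately simple stand-in, not Dreamer's actor-critic, but it shows the key property that every policy evaluation is a rollout in the model and the real environment is never queried.

```python
import random

def model(s, a):
    """Stand-in for a learned f_theta (hypothetical linear dynamics)."""
    return 0.9 * s + 0.5 * a

def reward(s, a, goal=1.0):
    """Hypothetical reward: track the goal, small action penalty."""
    return -(s - goal) ** 2 - 0.01 * a ** 2

def policy(k, s, goal=1.0):
    """Hypothetical linear policy with a single gain parameter k."""
    return k * (goal - s)

def dream_return(k, s0=0.0, horizon=20, gamma=0.99):
    """Discounted return of policy k on an imagined rollout in the model."""
    s_hat, total = s0, 0.0
    for t in range(horizon + 1):
        a = policy(k, s_hat)
        total += gamma ** t * reward(s_hat, a)
        s_hat = model(s_hat, a)
    return total

def train_in_dream(k=0.0, iters=200, step=0.3, seed=0):
    """Random-search hill climbing on k, scored only by imagined returns."""
    rng = random.Random(seed)
    best = dream_return(k)
    for _ in range(iters):
        cand = k + rng.gauss(0.0, step)
        ret = dream_return(cand)
        if ret > best:  # keep the perturbation only if the dream improves
            k, best = cand, ret
    return k, best

k0_return = dream_return(0.0)
k_trained, trained_return = train_in_dream()
```

Because candidates are accepted only when the imagined return improves, the trained gain can never score worse in the model than the starting gain. Whether it also performs well in the real environment depends on how accurate f_\theta is, which is the central risk of training in a learned model.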

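Ha & Schmidhuber (2018). Recurrent World Models Facilitate Policy Evolution. NeurIPS.

Hafner et al. (2023). Mastering Diverse Domains through World Models. arXiv.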
