
World Models & Model-Based RL

Sample-efficient RL: learn from imagined experience, not just real data

Concept 76 of 100 · Theory · Phase 10: Mathematical foundations & information geometry

Why It Matters for Modern Models

  • Sample-efficient RL: learn from imagined experience, not just real data
  • DreamerV3 reaches strong Atari performance with much less real environment interaction than typical model-free methods need
  • Foundation for planning-based AI: simulate futures before acting

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • World models let agents "imagine" consequences without taking real actions
  • Predicting in a compact latent space is easier than predicting raw pixels: compress first, then predict
  • Model error compounds over long horizons, so long imagined rollouts need careful uncertainty handling
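To make the compounding-error point concrete, here is a minimal sketch with made-up numbers. A hypothetical "true" system and a hypothetical learned model differ by only 0.01 in one coefficient. Because the model is rolled out on its own predictions, the gap grows much faster than the one-step error times the horizon. The functions `true_step`, `model_step`, and `rollout_errors` are illustrative names, not from any library.

```python
# Toy illustration (assumed numbers): a model with a small one-step error
# drifts far from the true system when rolled out open-loop.

def true_step(s: float, a: float) -> float:
    """Hypothetical real dynamics: slight growth plus the action."""
    return 1.05 * s + a

def model_step(s: float, a: float) -> float:
    """Hypothetical learned model: growth coefficient is off by 0.01."""
    return 1.04 * s + a

def rollout_errors(s0: float, actions: list[float]) -> list[float]:
    """Absolute gap between true and imagined states after each step.

    The model is fed its own predictions (open-loop imagination),
    so earlier errors feed into every later prediction.
    """
    s_true, s_model = s0, s0
    errors = []
    for a in actions:
        s_true = true_step(s_true, a)
        s_model = model_step(s_model, a)
        errors.append(abs(s_true - s_model))
    return errors

errs = rollout_errors(1.0, [0.0] * 30)
for h in (1, 5, 10, 30):
    print(f"horizon {h:2d}: |error| = {errs[h - 1]:.4f}")
```

The one-step error is 0.01, yet after 30 imagined steps the gap is far larger than 30 × 0.01. This is why imagined rollouts are usually kept short or paired with uncertainty estimates.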


Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation, then come back here for the full walkthrough.

Key Equation
$$\hat{s}_{t+1} = f_\theta(s_t, a_t)$$

A world model learns a transition function $f_\theta$ that predicts the next state from the current state and action:

$$\hat{s}_{t+1} = f_\theta(s_t, a_t)$$
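As a minimal sketch of what "learning $f_\theta$" means, the code below fits a linear model $\hat{s}_{t+1} = w_s s_t + w_a a_t$ by least squares to transitions collected from a hypothetical 1-D environment. The linear form, the true coefficients (0.9 and 0.5), the noise-free data, and all function names are assumptions for illustration. Real world models such as DreamerV3 use neural networks trained by gradient descent.

```python
import random

def collect_transitions(n: int, seed: int = 0) -> list[tuple[float, float, float]]:
    """Gather (s_t, a_t, s_{t+1}) tuples from a hypothetical 1-D environment.

    The true dynamics s' = 0.9 * s + 0.5 * a are an assumption for this demo
    and are noise-free; the fitting code never reads these coefficients.
    """
    rng = random.Random(seed)
    data = []
    s = 1.0
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        s_next = 0.9 * s + 0.5 * a
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_linear_world_model(data: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Least-squares fit of f_theta(s, a) = w_s * s + w_a * a.

    Solves the 2x2 normal equations (X^T X) w = X^T y in closed form,
    where each row of X is (s_t, a_t) and y is s_{t+1}.
    """
    ss = sum(s * s for s, _, _ in data)
    aa = sum(a * a for _, a, _ in data)
    sa = sum(s * a for s, a, _ in data)
    sy = sum(s * y for s, _, y in data)
    ay = sum(a * y for _, a, y in data)
    det = ss * aa - sa * sa
    w_s = (aa * sy - sa * ay) / det
    w_a = (ss * ay - sa * sy) / det
    return w_s, w_a

def make_f_theta(w_s: float, w_a: float):
    """Wrap the fitted weights as the one-step predictor f_theta(s_t, a_t)."""
    return lambda s, a: w_s * s + w_a * a

data = collect_transitions(200)
w_s, w_a = fit_linear_world_model(data)
f_theta = make_f_theta(w_s, w_a)
print(f"learned: s' = {w_s:.3f} * s + {w_a:.3f} * a")
print("predicted next state from s=1.0, a=0.2:", f_theta(1.0, 0.2))
```

With noise-free data the fit recovers the assumed coefficients. With real, noisy data the same procedure yields only an approximation, and it is that approximation error that accumulates when the model is rolled forward.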

Latent world model (DreamerV3):

  • Encoder: $z_t = \text{enc}(o_t)$
  • Dynamics: $\hat{z}_{t+1} = \text{dyn}(z_t, a_t)$
  • Decoder: $\hat{o}_{t+1} = \text{dec}(\hat{z}_{t+1})$
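The encode, predict, decode pipeline can be sketched with hand-written maps. This is a toy under stated assumptions, not DreamerV3's architecture. A hypothetical environment renders a 1-D position as four redundant "pixels". The encoder projects back to a single latent number, and the dynamics step runs entirely in that latent space. Nothing here is learned: `PATTERN`, `render`, and the 0.1 step size are made up for illustration.

```python
PATTERN = [1.0, 2.0, 1.0, 0.5]  # hypothetical fixed "pixel" pattern

def render(x: float) -> list[float]:
    """Hypothetical environment: a 1-D position x shown as 4 redundant pixels."""
    return [x * p for p in PATTERN]

def enc(o: list[float]) -> float:
    """Encoder: project the observation onto PATTERN to get a 1-D latent z."""
    num = sum(oi * p for oi, p in zip(o, PATTERN))
    den = sum(p * p for p in PATTERN)
    return num / den

def dyn(z: float, a: float) -> float:
    """Latent dynamics: the position moves by 0.1 * a (assumed, not learned)."""
    return z + 0.1 * a

def dec(z: float) -> list[float]:
    """Decoder: map the latent back to observation space."""
    return [z * p for p in PATTERN]

def predict_next_obs(o_t: list[float], a_t: float) -> list[float]:
    """Compress, predict one step in latent space, then decode."""
    z_t = enc(o_t)
    z_next = dyn(z_t, a_t)
    return dec(z_next)

o_t = render(2.0)
o_pred = predict_next_obs(o_t, a_t=1.0)
o_true = render(2.1)
print("predicted:", o_pred)
print("actual:   ", o_true)
```

The dynamics function touches one number instead of four pixels. In real systems the gap is far larger (thousands of pixels versus a small latent vector), which is the practical reason to predict in latent space.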

Planning in imagination:

$$a^*_{0:H} = \arg\max_{a_{0:H}} \mathbb{E}_{\hat{s} \sim f_\theta}\left[\sum_{t=0}^{H} \gamma^t r(\hat{s}_t, a_t)\right]$$
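For a small discrete action set, this argmax can be computed by brute force. The sketch below assumes a hypothetical deterministic model $s' = s + a$ (so the expectation reduces to a single rollout) and a hypothetical reward $r(s, a) = -(s + a)^2$. It scores every sequence $a_0, \dots, a_H$ purely in imagination. Practical planners typically replace enumeration with sampling-based search, execute only the first action, and replan at the next step.

```python
import itertools

def model(s: float, a: float) -> float:
    """Hypothetical learned dynamics f_theta: the action shifts the state."""
    return s + a

def reward(s: float, a: float) -> float:
    """Hypothetical reward r(s, a): penalize the resulting state's distance from 0."""
    return -((s + a) ** 2)

def imagined_return(s0: float, actions, gamma: float) -> float:
    """Discounted return of an action sequence, rolled out in the model only."""
    s, total = s0, 0.0
    for t, a in enumerate(actions):
        total += gamma ** t * reward(s, a)
        s = model(s, a)
    return total

def plan(s0: float, action_set=(-1.0, 0.0, 1.0), horizon: int = 2, gamma: float = 0.9):
    """Exhaustive argmax over all sequences a_0..a_H (H + 1 actions)."""
    candidates = itertools.product(action_set, repeat=horizon + 1)
    return max(candidates, key=lambda seq: imagined_return(s0, seq, gamma))

best = plan(2.0)
print("best imagined action sequence:", best)  # execute best[0], then replan
```

Enumeration costs $|A|^{H+1}$ model rollouts, which is why this exact search only works for toy problems.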

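Alternatively, train the policy entirely inside the learned model ("dreaming"): Dreamer learns an actor-critic from imagined rollouts instead of searching over actions at every step.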
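Unlike planning, here the optimization happens once, offline, inside the model, and produces a policy that acts without further search. In the sketch below, a hypothetical linear policy $a = -k s$ is tuned using only imagined rollouts of a toy model, so the real environment is never queried. A grid search over $k$ stands in for the gradient-based actor-critic that Dreamer actually uses. The model, reward, start states, and gain grid are all assumptions.

```python
def model(s: float, a: float) -> float:
    """Hypothetical learned dynamics f_theta: the action shifts the state."""
    return s + a

def reward(s: float, a: float) -> float:
    """Hypothetical reward: penalize the resulting state's distance from 0."""
    return -((s + a) ** 2)

def policy(k: float, s: float) -> float:
    """Linear feedback policy a = -k * s with a single parameter k."""
    return -k * s

def dream_return(k: float, start_states: list[float], horizon: int = 10, gamma: float = 0.9) -> float:
    """Average discounted return of the policy, computed only in imagination."""
    total = 0.0
    for s0 in start_states:
        s = s0
        for t in range(horizon):
            a = policy(k, s)
            total += gamma ** t * reward(s, a)
            s = model(s, a)
    return total / len(start_states)

# Policy "training": keep the gain with the best imagined return.
# Grid search is a stand-in for a gradient-based actor-critic.
start_states = [2.0, -1.0, 0.5]
gains = [i / 10 for i in range(21)]
best_k = max(gains, key=lambda k: dream_return(k, start_states))
print("gain chosen purely from imagined rollouts:", best_k)
```

Once trained, the policy is a cheap function call at decision time; the cost is that its quality is capped by how faithful the learned model is.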

Canonical Papers

World Models

Ha & Schmidhuber, 2018, NeurIPS

Mastering Diverse Domains through World Models

Hafner et al., 2023, arXiv
