Legacy Concept Lab

Video World Models

Merges generative modeling with dynamics modeling

Concept 97 of 100 · Generative Models · Phase 13: Cutting-edge 2024-2025 research
Key equation: $p_\theta(x_{1:T} | c) = \prod_t p_\theta(x_t | x_{<t}, c)$

Why It Matters for Modern Models

  • Merges generative modeling with dynamics modeling
  • Precursor to general planning/agents
  • OpenAI's Sora report describes emergent 3D consistency and object permanence

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Not just "video generation" but a learned physics engine
  • Emergent properties: camera control, object tracking, causality
  • Can imagine "what happens if" scenarios for planning
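
The "what happens if" idea in the last bullet can be sketched as planning by imagined rollouts. This is a minimal illustration, not a method taken from the Sora report. `world_model`, `imagine`, and `plan` are hypothetical names, and the one-dimensional additive dynamics stand in for a learned video model purely so the example runs.

```python
import itertools

def world_model(state, action):
    """Hypothetical stand-in for a learned world model: predict the next state.

    Assumption: a 1-D state that simply shifts by the action. A real video
    world model would predict future frames (or latents) instead.
    """
    return state + action

def imagine(state, actions):
    """Roll the model forward through a candidate action sequence."""
    for a in actions:
        state = world_model(state, a)
    return state

def plan(state, goal, horizon=3, action_set=(-1, 0, 1)):
    """Exhaustive search: imagine every action sequence, keep the one
    whose predicted final state lands closest to the goal."""
    candidates = itertools.product(action_set, repeat=horizon)
    return min(candidates, key=lambda seq: abs(imagine(state, seq) - goal))

best = plan(state=0, goal=2)
print(best, imagine(0, best))
```

Exhaustive search only works for tiny action sets. With a real model one would presumably sample candidate sequences instead, but the structure stays the same: propose, imagine, score.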

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
$$p_\theta(x_{1:T} | c) = \prod_t p_\theta(x_t | x_{<t}, c)$$

Video as learned dynamics. Autoregressive:

$$p_\theta(x_{1:T} | c) = \prod_{t=1}^T p_\theta(x_t | x_{<t}, c)$$
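
As a rough sketch of the autoregressive factorization above, the log-likelihood of a sequence is the sum of per-frame conditional log-probabilities, and sampling is a frame-by-frame rollout. Everything concrete here is an assumption made for illustration: frames are scalars, the conditional is a Gaussian, and `cond_mean` is a hypothetical toy predictor that looks only at the previous frame (a special case of conditioning on all of $x_{<t}$).

```python
import math
import random

def cond_mean(history, c):
    """Hypothetical toy predictor for the mean of p(x_t | x_{<t}, c).

    Assumption: depends only on the last frame and the condition c.
    """
    prev = history[-1] if history else 0.0
    return 0.9 * prev + c

def log_gauss(x, mean, sigma):
    """Log-density of a scalar Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)

def sequence_log_prob(frames, c, sigma=0.1):
    """log p(x_{1:T} | c) = sum over t of log p(x_t | x_{<t}, c)."""
    return sum(
        log_gauss(x_t, cond_mean(frames[:t], c), sigma)
        for t, x_t in enumerate(frames)
    )

def rollout(T, c, sigma=0.1, seed=0):
    """Sample x_1, ..., x_T one frame at a time, feeding each sample back in."""
    rng = random.Random(seed)
    frames = []
    for _ in range(T):
        frames.append(rng.gauss(cond_mean(frames, c), sigma))
    return frames

frames = rollout(T=5, c=0.5)
print(sequence_log_prob(frames, c=0.5))
```

The product in the equation becomes a sum in log space, which is how such likelihoods are usually computed to avoid numerical underflow.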

Diffusion over latent $z$:

$$\min_\theta \mathbb{E}_{t,\epsilon}[\|\epsilon - \epsilon_\theta(z_t, t, c)\|^2]$$
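
A single Monte Carlo sample of the denoising objective above can be sketched as follows. The text does not define how $z_t$ is produced, so this sketch assumes the standard DDPM-style forward process $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ with a simplified cosine schedule. `alpha_bar`, `add_noise`, and `denoising_loss` are hypothetical names, and `eps_model` is any callable standing in for $\epsilon_\theta$.

```python
import math
import random

def alpha_bar(t, T=1000):
    """Assumed noise schedule (simplified cosine): 1 at t=0, near 0 at t=T."""
    return math.cos(0.5 * math.pi * t / T) ** 2

def add_noise(z0, t, eps, T=1000):
    """Assumed forward process: z_t = sqrt(ab) * z_0 + sqrt(1 - ab) * eps."""
    ab = alpha_bar(t, T)
    return [math.sqrt(ab) * z + math.sqrt(1 - ab) * e for z, e in zip(z0, eps)]

def denoising_loss(eps_model, z0, c, rng, T=1000):
    """One sample of ||eps - eps_theta(z_t, t, c)||^2 for random t and eps."""
    t = rng.randint(1, T)                    # random timestep
    eps = [rng.gauss(0.0, 1.0) for _ in z0]  # random Gaussian noise
    z_t = add_noise(z0, t, eps, T)
    pred = eps_model(z_t, t, c)
    return sum((e - p) ** 2 for e, p in zip(eps, pred))

# Usage with a trivial "model" that always predicts zero noise.
z0 = [0.3, -1.2, 0.7]
zero_model = lambda z_t, t, c: [0.0] * len(z_t)
print(denoising_loss(zero_model, z0, c=None, rng=random.Random(0)))
```

Training would average this quantity over many latents, timesteps, and noise draws and minimize it with respect to the model parameters. Here the latent is a short list of floats only to keep the example dependency-free.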

Video generators can be viewed as learned simulators of the physical world.

Canonical Papers

Video generation models as world simulators

OpenAI, 2024
