Generative Models

Diffusion, Score-Based Models & Flow Matching

Canonical Papers

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Sohl-Dickstein et al., 2015, ICML

Denoising Diffusion Probabilistic Models

Ho et al., 2020, NeurIPS

Score-Based Generative Modeling through Stochastic Differential Equations

Song et al., 2021, ICLR

Flow Matching for Generative Modeling

Lipman et al., 2023, ICLR

Core Mathematics

Forward diffusion adds noise:

q(x_t \mid x_0) = \mathcal N\big(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\big)
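Because q(x_t | x_0) is Gaussian, x_t can be sampled in one step from x_0 and the cumulative schedule. A minimal numpy sketch, using the linear beta schedule from Ho et al. (2020) (the endpoints 1e-4 and 0.02 and T = 1000 are that paper's defaults):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Linear beta schedule; alpha_bar_t is the cumulative product of (1 - beta).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))          # toy "data" batch
xt, eps = forward_diffuse(x0, T - 1, alpha_bar, rng)
# At t = T-1, alpha_bar is near zero, so x_t is close to pure Gaussian noise.
```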

The model learns to predict the noise \epsilon via MSE:

\mathcal L = \mathbb E_{x_0,t,\epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t,t)\big\|^2
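One Monte Carlo estimate of this loss looks like the following sketch, where `eps_theta` is a stand-in for the neural network (a real model would be a U-Net or transformer conditioned on t):

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_theta(xt, t):
    # Placeholder predictor; returns zeros instead of a learned estimate of eps.
    return np.zeros_like(xt)

# Noise schedule as in DDPM.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# One training step's loss estimate: sample t, eps, form x_t, compare.
x0 = rng.standard_normal((16, 8))
t = rng.integers(0, T)
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
loss = np.mean((eps - eps_theta(xt, t)) ** 2)
# For the zero predictor the expected loss is E||eps||^2 = 1 per coordinate.
```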

Score-based SDE view: the forward SDE is dx_t = f(x_t,t)\,dt + g(t)\,dW_t; the reverse-time SDE runs it backward using the score \nabla_x \log p_t(x).
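The reverse-time SDE can be simulated with Euler-Maruyama once the score is known. A toy sketch under assumed parameters: a VP SDE with a linear beta(t), and 1-D Gaussian data x_0 ~ N(mu, s0^2), for which the score of p_t is available in closed form (so no network is needed):

```python
import numpy as np

# VP SDE: dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dW, t in [0, 1].
mu, s0 = 2.0, 0.5
beta = lambda t: 0.1 + (20.0 - 0.1) * t              # assumed linear schedule
B = lambda t: 0.1 * t + 0.5 * (20.0 - 0.1) * t**2    # integral of beta up to t

def score(x, t):
    a = np.exp(-0.5 * B(t))                          # mean scaling of x_0
    var = s0**2 * a**2 + (1.0 - a**2)                # marginal variance of p_t
    return -(x - a * mu) / var                       # grad_x log N(a*mu, var)

rng = np.random.default_rng(0)
N, steps = 20000, 500
dt = 1.0 / steps
x = rng.standard_normal(N)                           # start from the prior N(0, 1)
for i in range(steps, 0, -1):
    t = i * dt
    b = beta(t)
    drift = -0.5 * b * x - b * score(x, t)           # reverse-time drift f - g^2 * score
    x = x + drift * (-dt) + np.sqrt(b * dt) * rng.standard_normal(N)
# Samples should now approximate the data distribution N(mu, s0^2).
```

In a trained model, `score(x, t)` is replaced by the learned score network (or by a rescaled noise prediction, since the two are equivalent up to a factor).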

Flow matching: train a vector field v_\theta(x,t) to match the "true" conditional velocity field (often optimal-transport / straight-line paths).
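With straight-line (optimal-transport) conditional paths x_t = (1-t) x_0 + t x_1, the target velocity is simply x_1 - x_0. A minimal sketch of how training pairs are constructed (the zero predictor below is a stand-in for v_theta):

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_pair(x1, rng):
    """Sample (x_t, t, target velocity) for conditional flow matching
    with straight-line paths: x_t = (1 - t) * x0 + t * x1, u = x1 - x0."""
    x0 = rng.standard_normal(x1.shape)        # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))    # one time per sample
    xt = (1.0 - t) * x0 + t * x1
    return xt, t, x1 - x0

x1 = rng.standard_normal((8, 4)) + 3.0        # toy "data" batch
xt, t, u = cfm_pair(x1, rng)

# Training minimizes E || v_theta(x_t, t) - u ||^2 over such pairs;
# here we evaluate that loss for a zero predictor.
loss = np.mean((np.zeros_like(u) - u) ** 2)
```

Because the target paths are straight, a well-trained v_theta can be integrated with very few ODE steps, which is what makes few-step generation possible.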

Key Equation
\mathcal L = \mathbb E_{x_0,t,\epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t,t)\big\|^2


Why It Matters for Modern Models

  • Stable Diffusion: latent diffusion — DDPM in a VAE latent space
  • Sora: diffusion transformer over 3D spacetime patches
  • Flow-matching and rectified flows enable one-step or few-step generation

Missing Intuition

What is still poorly explained in textbooks and papers:

  • Intuitive explanation that denoising is learning ∇ₓ log pₜ(x) (scores), and how reverse-time SDE sampling corresponds to "walking uphill in log-density space"
  • Visual/interactive demonstrations of different probability paths (diffusion vs optimal transport)

Connections