Generative Models

Diffusion, Score-Based Models & Flow Matching

Canonical Papers

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Sohl-Dickstein et al., 2015, ICML

Denoising Diffusion Probabilistic Models

Ho et al., 2020, NeurIPS

Score-Based Generative Modeling through Stochastic Differential Equations

Song et al., 2021, ICLR

Flow Matching for Generative Modeling

Lipman et al., 2023, ICLR

Core Mathematics

Forward diffusion adds noise:

q(x_t \mid x_0) = \mathcal N\big(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\big)
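Because q(x_t | x_0) is Gaussian, x_t can be sampled in one step from x_0 and the cumulative schedule. A minimal numpy sketch, using the linear beta schedule from Ho et al. (2020) (the endpoints 1e-4 and 0.02 and T = 1000 are that paper's defaults):

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Linear beta schedule; alpha_bar_t is the cumulative product of (1 - beta).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))          # toy "data" batch
xt, eps = forward_diffuse(x0, T - 1, alpha_bar, rng)
# At t = T-1, alpha_bar is near zero, so x_t is close to pure Gaussian noise.
```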

The model learns to predict the noise \epsilon via MSE:

\mathcal L = \mathbb E_{x_0,t,\epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t,t)\big\|^2
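One Monte Carlo estimate of this loss looks like the following sketch, where `eps_theta` is a stand-in for the neural network (a real model would be a U-Net or transformer conditioned on t):

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_theta(xt, t):
    # Placeholder predictor; returns zeros instead of a learned estimate of eps.
    return np.zeros_like(xt)

# Noise schedule as in DDPM.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# One training step's loss estimate: sample t, eps, form x_t, compare.
x0 = rng.standard_normal((16, 8))
t = rng.integers(0, T)
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
loss = np.mean((eps - eps_theta(xt, t)) ** 2)
# For the zero predictor the expected loss is E||eps||^2 = 1 per coordinate.
```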

Score-based SDE view: the forward SDE is dx_t = f(x_t,t)\,dt + g(t)\,dW_t; the reverse-time SDE runs it backward using the score \nabla_x \log p_t(x).
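The reverse-time SDE can be simulated with Euler-Maruyama once the score is known. A toy sketch under assumed parameters: a VP SDE with a linear beta(t), and 1-D Gaussian data x_0 ~ N(mu, s0^2), for which the score of p_t is available in closed form (so no network is needed):

```python
import numpy as np

# VP SDE: dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dW, t in [0, 1].
mu, s0 = 2.0, 0.5
beta = lambda t: 0.1 + (20.0 - 0.1) * t              # assumed linear schedule
B = lambda t: 0.1 * t + 0.5 * (20.0 - 0.1) * t**2    # integral of beta up to t

def score(x, t):
    a = np.exp(-0.5 * B(t))                          # mean scaling of x_0
    var = s0**2 * a**2 + (1.0 - a**2)                # marginal variance of p_t
    return -(x - a * mu) / var                       # grad_x log N(a*mu, var)

rng = np.random.default_rng(0)
N, steps = 20000, 500
dt = 1.0 / steps
x = rng.standard_normal(N)                           # start from the prior N(0, 1)
for i in range(steps, 0, -1):
    t = i * dt
    b = beta(t)
    drift = -0.5 * b * x - b * score(x, t)           # reverse-time drift f - g^2 * score
    x = x + drift * (-dt) + np.sqrt(b * dt) * rng.standard_normal(N)
# Samples should now approximate the data distribution N(mu, s0^2).
```

In a trained model, `score(x, t)` is replaced by the learned score network (or by a rescaled noise prediction, since the two are equivalent up to a factor).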

Flow matching: train a vector field v_\theta(x,t) to match the "true" conditional velocity field (often optimal-transport / straight-line paths).
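With straight-line (optimal-transport) conditional paths x_t = (1-t) x_0 + t x_1, the target velocity is simply x_1 - x_0. A minimal sketch of how training pairs are constructed (the zero predictor below is a stand-in for v_theta):

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_pair(x1, rng):
    """Sample (x_t, t, target velocity) for conditional flow matching
    with straight-line paths: x_t = (1 - t) * x0 + t * x1, u = x1 - x0."""
    x0 = rng.standard_normal(x1.shape)        # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))    # one time per sample
    xt = (1.0 - t) * x0 + t * x1
    return xt, t, x1 - x0

x1 = rng.standard_normal((8, 4)) + 3.0        # toy "data" batch
xt, t, u = cfm_pair(x1, rng)

# Training minimizes E || v_theta(x_t, t) - u ||^2 over such pairs;
# here we evaluate that loss for a zero predictor.
loss = np.mean((np.zeros_like(u) - u) ** 2)
```

Because the target paths are straight, a well-trained v_theta can be integrated with very few ODE steps, which is what makes few-step generation possible.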

Key Equation
\mathcal L = \mathbb E_{x_0,t,\epsilon}\,\big\|\epsilon - \epsilon_\theta(x_t,t)\big\|^2


Why It Matters for Modern Models

  • Stable Diffusion: latent diffusion — DDPM in a VAE latent space
  • Sora: diffusion transformer over 3D spacetime patches
  • Flow-matching and rectified flows enable one-step or few-step generation

Missing Intuition

What is still poorly explained in textbooks and papers:

  • Intuitive explanation that denoising is learning ∇ₓ log pₜ(x) (scores), and how reverse-time SDE sampling corresponds to "walking uphill in log-density space"
  • Visual/interactive demonstrations of different probability paths (diffusion vs optimal transport)

Connections