Legacy Concept Lab

Score Matching & Score-Based Generative Models


Concept 36 of 100 | Generative Models | Phase 4


Why It Matters for Modern Models

Score functions are the mathematical foundation of diffusion models—the denoiser learns the score at each noise level
Explains why diffusion training is "just regression": predict noise ε, which equals -σ × score
Unifies VAEs, diffusion, and energy-based models through the lens of learning ∇log p(x)
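
The "just regression" point can be checked numerically. The sketch below is a toy illustration, not a diffusion training loop. It assumes 1D Gaussian data, a single noise level `sigma`, and a linear noise predictor; all names (`data_std`, `eps_hat`, `true_score`) are hypothetical. It fits the noise by least squares and converts the prediction to a score with score = -ε/σ.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5       # noise level (assumed for illustration)
data_std = 1.0    # toy data: x ~ N(0, data_std^2)
n = 200_000

x = rng.normal(0.0, data_std, size=n)
eps = rng.standard_normal(n)
x_tilde = x + sigma * eps

# Least-squares fit of eps from x_tilde with a linear model (no intercept).
# For Gaussian data the optimal predictor E[eps | x_tilde] is exactly linear.
slope = np.sum(x_tilde * eps) / np.sum(x_tilde ** 2)

def eps_hat(x_t):
    """Fitted noise predictor."""
    return slope * x_t

def true_score(x_t):
    """Score of the noised marginal N(0, data_std^2 + sigma^2)."""
    return -x_t / (data_std ** 2 + sigma ** 2)

grid = np.linspace(-2.0, 2.0, 5)
score_from_eps = -eps_hat(grid) / sigma   # score = -eps / sigma
max_err = np.max(np.abs(score_from_eps - true_score(grid)))
print("max |score_from_eps - true_score| =", max_err)
```

With enough samples the two curves should agree up to Monte Carlo error, which is the sense in which predicting ε and predicting the score are the same task up to a factor of -σ.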

What is still poorly explained in textbooks and papers:

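The score is a vector field pointing "uphill" toward higher density; sampling starts from noise and follows this field toward the data, reversing the noising process
Why denoising works: the optimal denoiser predicts E[x|x̃], and the rescaled residual (E[x|x̃] − x̃)/σ² equals the score of the noised density (Tweedie's formula)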
Score matching avoids computing intractable partition functions—you only need gradients, not absolute probabilities
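
As a rough illustration of sampling by following the score, here is a minimal sketch of unadjusted Langevin dynamics. It makes several simplifying assumptions: the target is a 1D Gaussian whose score is known in closed form rather than learned, a single noise level is used rather than an annealed schedule, and the step size and iteration count are picked by hand. The function name `score` and all constants are hypothetical.

```python
import numpy as np

def score(x, mu=3.0, std=1.0):
    """Score of N(mu, std^2): the gradient of the log-density, pointing toward mu."""
    return -(x - mu) / std ** 2

rng = np.random.default_rng(0)
x = rng.normal(0.0, 10.0, size=5000)   # start from broad noise
step = 0.01                            # step size (assumed)

for _ in range(2000):
    noise = rng.standard_normal(x.shape)
    # Langevin update: drift uphill along the score, plus injected noise
    # so the chain samples the density instead of collapsing onto the mode.
    x = x + 0.5 * step * score(x) + np.sqrt(step) * noise

print("sample mean:", x.mean(), "sample std:", x.std())
```

Only the score enters the update; the density's normalizing constant is never evaluated. In a score-based generative model the closed-form `score` would be replaced by a learned network.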


Key Equation


The score function is the gradient of log-density:

s(x) = \nabla_x \log p(x)
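
A small numerical sketch can make the definition concrete. It assumes a toy unnormalized double-well log-density and approximates the gradient by central finite differences; `log_p_unnormalized`, `numerical_score`, and the constant `log_c` are hypothetical names for illustration. Adding a constant to log p (equivalently, rescaling p by an unknown normalizer) leaves the score unchanged.

```python
import numpy as np

def log_p_unnormalized(x, log_c=0.0):
    """Toy double-well log-density; log_c stands in for an unknown -log Z."""
    return -0.25 * x ** 4 + 0.5 * x ** 2 + log_c

def numerical_score(log_p, x, h=1e-5):
    """Central finite-difference approximation of d/dx log p(x)."""
    return (log_p(x + h) - log_p(x - h)) / (2.0 * h)

xs = np.linspace(-2.0, 2.0, 9)
score_a = numerical_score(lambda x: log_p_unnormalized(x, 0.0), xs)
score_b = numerical_score(lambda x: log_p_unnormalized(x, 7.3), xs)
analytic = -xs ** 3 + xs   # exact derivative of the log-density above

print(np.max(np.abs(score_a - score_b)))    # difference between the two normalizations
print(np.max(np.abs(score_a - analytic)))   # finite-difference error
```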

Score matching learns $s_\theta(x) \approx \nabla_x \log p_{\text{data}}(x)$ without knowing the normalizing constant:

\mathcal{L}_{SM} = \mathbb{E}_{p_{\text{data}}}\left[ \frac{1}{2}\|s_\theta(x)\|^2 + \text{tr}(\nabla_x s_\theta(x)) \right]
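
To see this objective recover a density from samples alone, the sketch below evaluates it for a 1D Gaussian score model $s_\theta(x) = -(x - m)/v$, for which the trace term reduces to $-1/v$. The grid search, the toy data, and the names `sm_loss`, `ms`, and `vs` are assumptions made for illustration; a real model would use gradient-based training and an automatic-differentiation trace or a stochastic estimate of it.

```python
import numpy as np

def sm_loss(x, m, v):
    """Score matching loss for the model score s(x) = -(x - m) / v."""
    s = -(x - m) / v
    trace_term = -1.0 / v   # ds/dx in 1D, constant in x
    return np.mean(0.5 * s ** 2 + trace_term)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=20_000)   # toy data: mean 2, variance 2.25

ms = np.linspace(0.0, 4.0, 41)   # candidate means
vs = np.linspace(0.5, 4.0, 71)   # candidate variances
losses = np.array([[sm_loss(x, m, v) for v in vs] for m in ms])
i, j = np.unravel_index(np.argmin(losses), losses.shape)
best_m, best_v = ms[i], vs[j]

print("best mean:", best_m, "best variance:", best_v)
```

The loss never touches the data density or its normalizer, yet its minimizer should land near the sample mean and variance, up to grid resolution and sampling error.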

Denoising score matching (practical form):

\mathcal{L}_{DSM} = \mathbb{E}_{x, \tilde{x}}\left[ \|s_\theta(\tilde{x}) - \nabla_{\tilde{x}} \log q(\tilde{x}|x)\|^2 \right]

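For Gaussian noise $\tilde{x} = x + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, the regression target is $\nabla_{\tilde{x}} \log q(\tilde{x}|x) = -(\tilde{x} - x)/\sigma^2 = -\epsilon/\sigma$, and the minimizer of $\mathcal{L}_{DSM}$ is the score of the noised data density.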
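
The denoising objective is easy to evaluate directly. The following is a minimal sketch under toy assumptions: 1D Gaussian data so the score of the noised density is known exactly, one noise level, and no learned network. The names `dsm_loss`, `true_score`, and `data_std` are hypothetical. It compares the loss of the exact noised-data score with two deliberately wrong candidates.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss."""
    eps = rng.standard_normal(x.shape)
    x_tilde = x + sigma * eps
    target = -eps / sigma   # gradient of log q(x_tilde | x)
    return np.mean((score_fn(x_tilde) - target) ** 2)

sigma, data_std = 0.5, 1.0
marginal_var = data_std ** 2 + sigma ** 2

def true_score(x_t):
    """Score of the noised marginal N(0, data_std^2 + sigma^2)."""
    return -x_t / marginal_var

x = np.random.default_rng(0).normal(0.0, data_std, size=100_000)

# Reuse the same noise seed so the three candidates see identical noise draws.
loss_true = dsm_loss(true_score, x, sigma, np.random.default_rng(1))
loss_half = dsm_loss(lambda z: 0.5 * true_score(z), x, sigma, np.random.default_rng(1))
loss_zero = dsm_loss(lambda z: np.zeros_like(z), x, sigma, np.random.default_rng(1))

print(loss_true, loss_half, loss_zero)
```

The exact score should give the smallest loss, but not zero: the per-sample target $-\epsilon/\sigma$ is noisy, so in this setup the loss floor is $1/\sigma^2 - 1/(\texttt{data\_std}^2 + \sigma^2) = 3.2$ in expectation. Only the conditional mean of the target given $\tilde{x}$ is learnable.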

Hyvärinen, 2005, JMLR

Song & Ermon, 2019, NeurIPS
