Legacy Concept Lab
Energy-Based Models & Score Functions
#53 · EBMs · Generative Models
key equation
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta}
Phase 10: Mathematical foundations & information geometry · Concept 53 of 100
Why It Matters for Modern Models
- Unifies discriminative and generative modeling: a classifier's logits act as negative energies, so logit differences ARE energy differences
- GAN discriminators can be viewed as learning energy functions
- Score-based diffusion models can be read as EBMs that parameterize the score (the negative energy gradient) directly, trained via denoising score matching
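The first bullet can be checked with a few lines of NumPy. This is a minimal sketch under one assumption: the joint-energy reading E(x, y) = -f(x)[y], where f(x) is the logit vector, with marginal energy E(x) = -logsumexp_y f(x)[y]. The logit values and helper names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max improves numerical stability without changing the result.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def joint_energy(logits):
    # Assumed reading: E(x, y) = -f(x)[y], one energy per class.
    return -logits

def marginal_energy(logits):
    # E(x) = -logsumexp_y f(x)[y], a class-independent energy for the input itself.
    m = logits.max()
    return -(m + np.log(np.exp(logits - m).sum()))

logits = np.array([2.0, 0.5, -1.0])  # hypothetical classifier output f(x) for one input x
E_xy = joint_energy(logits)

# p(y|x) = exp(-E(x, y)) / sum_y' exp(-E(x, y')) is exactly the softmax.
p_softmax = softmax(logits)
p_energy = np.exp(-E_xy) / np.exp(-E_xy).sum()
print(np.allclose(p_softmax, p_energy))        # True

# The log-odds between two classes equal their energy gap.
log_odds_01 = np.log(p_softmax[0] / p_softmax[1])
energy_gap_01 = E_xy[1] - E_xy[0]
print(np.isclose(log_odds_01, energy_gap_01))  # True (both are 1.5)

# Shifting every logit by c leaves p(y|x) unchanged but shifts E(x) by -c.
shifted = logits + 3.0
print(np.allclose(softmax(shifted), p_softmax))                             # True
print(np.isclose(marginal_energy(shifted), marginal_energy(logits) - 3.0))  # True
```

The last check shows the design point: softmax ignores a shared shift of the logits, while the marginal energy E(x) does not, which is what lets the same network be read as a density model over inputs.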
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- The partition function Z is intractable: it integrates exp(-E) over the entire input space, so practical EBM training methods are designed to avoid computing it
- Energy measures "how wrong this input looks": low energy means high probability, since p(x) ∝ exp(-E(x))
- MCMC sampling from EBMs is slow; diffusion sidesteps this by learning the denoising path directly
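The MCMC cost is visible even on the simplest possible target. A minimal sketch, assuming unadjusted Langevin dynamics as the sampler and a toy quadratic energy E(x) = x^2/2 (a standard normal); only the energy gradient is used, so Z never appears. Chain count, step size, and step counts are illustrative choices.

```python
import numpy as np

def grad_energy(x):
    # Toy energy E(x) = x^2 / 2, so p(x) ∝ exp(-E(x)) is a standard normal.
    # Only the gradient is needed; the partition function never appears.
    return x

def langevin(x0, n_steps, step_size, rng):
    # Unadjusted Langevin dynamics: x <- x - (eps / 2) * dE/dx + sqrt(eps) * noise
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

rng = np.random.default_rng(0)
x0 = np.full(5000, 5.0)  # 5000 parallel chains, all started far from the mode

x_short = langevin(x0, n_steps=50, step_size=0.01, rng=rng)
x_long = langevin(x0, n_steps=2000, step_size=0.01, rng=rng)

print(x_short.mean())               # still far from 0: the chains have not mixed
print(x_long.mean(), x_long.var())  # close to the target mean 0 and variance 1
```

With step size ε, each update shrinks the distance to the mode by only a factor of about 1 - ε/2, so 50 steps barely move the chains while 2000 steps reach the target. On multimodal, high-dimensional energies, mixing between modes is slower still.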
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation; come back here for the full walkthrough.
Key Equation
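Energy-based models define probability via an unnormalized energy E_\theta(x), divided by the partition function Z_\theta:
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta}, \qquad Z_\theta = \int \exp(-E_\theta(x))\,dx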
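In one dimension Z can still be approximated by brute force, which makes the definition concrete. A minimal sketch, assuming a hypothetical double-well energy E(x) = 4(x^2 - 1)^2 and a Riemann sum on a grid. With N grid points per axis, a d-dimensional input would need N^d energy evaluations, which is why this approach does not scale.

```python
import numpy as np

def energy(x, a=4.0):
    # Hypothetical double-well energy with minima at x = -1 and x = +1.
    return a * (x**2 - 1.0) ** 2

# Approximate Z = ∫ exp(-E(x)) dx by a Riemann sum on a dense 1-D grid.
xs = np.linspace(-3.0, 3.0, 6001)
dx = xs[1] - xs[0]
unnormalized = np.exp(-energy(xs))
Z = unnormalized.sum() * dx
p = unnormalized / Z

print(Z)
print(p.sum() * dx)  # 1.0 up to rounding, by construction

# Probability ratios never need Z: p(a) / p(b) = exp(E(b) - E(a)).
x_a, x_b = 1.0, 0.0
ratio_with_Z = (np.exp(-energy(x_a)) / Z) / (np.exp(-energy(x_b)) / Z)
ratio_without_Z = np.exp(energy(x_b) - energy(x_a))
print(ratio_without_Z)  # exp(4) ≈ 54.6: a well bottom is far likelier than the barrier top
print(np.isclose(ratio_with_Z, ratio_without_Z))  # True
```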
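The score function is the gradient of log-probability with respect to the input; Z_\theta drops out because it does not depend on x:
s_\theta(x) = \nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)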
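The claim that Z drops out of the score can be verified numerically. A minimal sketch, reusing a hypothetical double-well energy E(x) = 4(x^2 - 1)^2: a finite-difference derivative of log p matches the analytic -∇_x E(x) for any value of log Z plugged in.

```python
import numpy as np

def energy(x, a=4.0):
    # Hypothetical double-well energy: E(x) = a * (x^2 - 1)^2
    return a * (x**2 - 1.0) ** 2

def score(x, a=4.0):
    # Analytic score: s(x) = -dE/dx = -4a * x * (x^2 - 1)
    return -4.0 * a * x * (x**2 - 1.0)

def log_density(x, log_Z):
    # log p(x) = -E(x) - log Z for some constant log Z that we pretend not to know
    return -energy(x) - log_Z

x = np.array([-1.5, -0.3, 0.0, 0.7, 1.2])
h = 1e-5
for log_Z in (0.0, 123.4):  # two arbitrary values of log Z
    # Central finite difference of log p with respect to x
    fd = (log_density(x + h, log_Z) - log_density(x - h, log_Z)) / (2 * h)
    print(np.allclose(fd, score(x), atol=1e-4))  # True for both values

# Away from the barrier at x = 0, the score points downhill in energy,
# toward the nearest well at x = -1 or x = +1.
print(score(x))
```

Because the score only needs ∇_x E, it can be estimated and used for sampling without ever normalizing the model, which is the property score-based diffusion models exploit.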
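Contrastive divergence training: the gradient of the negative log-likelihood \mathcal{L}(\theta) = -\mathbb{E}_{x \sim p_{\text{data}}}[\log p_\theta(x)] splits into a data term and a model term,
\nabla_\theta \mathcal{L}(\theta) = \mathbb{E}_{x \sim p_{\text{data}}}[\nabla_\theta E_\theta(x)] - \mathbb{E}_{x \sim p_\theta}[\nabla_\theta E_\theta(x)]
Contrastive divergence (CD-k) approximates the second expectation with samples from k MCMC steps started at the data, so Z_\theta is never computed.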
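A minimal CD-k sketch, assuming a one-parameter Gaussian energy E_μ(x) = (x - μ)^2/2 with the variance fixed at 1, Langevin dynamics as the MCMC kernel, and full-batch updates on synthetic data. All of these are illustrative choices, not a prescribed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)  # synthetic training data

def grad_x_energy(x, mu):
    # E_mu(x) = (x - mu)^2 / 2  ->  dE/dx = x - mu
    return x - mu

def grad_mu_energy(x, mu):
    # dE/dmu = -(x - mu)
    return -(x - mu)

def langevin_k_steps(x, mu, k, step_size, rng):
    # CD-k: start the chains AT the data and run only k short Langevin steps.
    for _ in range(k):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_x_energy(x, mu) + np.sqrt(step_size) * noise
    return x

mu, lr = 0.0, 0.5
for _ in range(200):
    negatives = langevin_k_steps(data, mu, k=10, step_size=0.1, rng=rng)
    # CD estimate of the NLL gradient: E_data[dE/dmu] - E_model[dE/dmu],
    # with the model expectation replaced by the k-step negative samples.
    grad = grad_mu_energy(data, mu).mean() - grad_mu_energy(negatives, mu).mean()
    mu -= lr * grad

print(mu, data.mean())  # mu ends up close to the sample mean of the data
```

Because the negative chains start at the data and run only k steps, the CD gradient is biased. In this toy model it still vanishes exactly when μ equals the data mean, so training converges to the maximum-likelihood answer; richer energies offer no such guarantee.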