Legacy Concept Lab
Classifier-Free Guidance in Diffusion
#42 · CFG · Generative Models
Key equation
\tilde{\epsilon}_\theta = \epsilon_\theta(\emptyset) + w \cdot (\epsilon_\theta(c) - \epsilon_\theta(\emptyset))
Phase 9: Advanced architectures & generation · Concept 42 of 100
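A minimal NumPy sketch of the key equation, assuming the unconditional and conditional noise predictions have already been computed as arrays of the same shape. `cfg_combine` is a hypothetical helper name, not a library API, and the prediction values are made up:

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: start at the unconditional prediction and
    move w times the (conditional - unconditional) difference."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Made-up noise predictions for a 2-pixel "image"
eps_u = np.array([0.2, -0.1])   # eps_theta(empty): prompt dropped
eps_c = np.array([0.5, 0.3])    # eps_theta(c): prompt given

print(cfg_combine(eps_u, eps_c, 0.0))  # equals the unconditional prediction
print(cfg_combine(eps_u, eps_c, 1.0))  # equals the conditional prediction
print(cfg_combine(eps_u, eps_c, 7.5))  # overshoots past the conditional prediction
```

w = 0 recovers the unconditional prediction and w = 1 the conditional one; w > 1 overshoots. For the first component, 0.2 + 7.5 × (0.5 − 0.2) = 2.45, far outside the range spanned by the two predictions.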
Why It Matters for Modern Models
- CFG is a key reason text-to-image systems such as Stable Diffusion, DALL-E, and Midjourney produce high-quality, on-prompt images
- The guidance scale is the main user-facing knob for trading prompt adherence and image quality against diversity
- A single model handles both conditional and unconditional generation: during training, the conditioning is randomly dropped (replaced by a null condition) for a fraction of examples
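The conditioning-dropout idea can be sketched as follows. This assumes conditioning arrives as a batch of embedding vectors and that the null condition is a single embedding vector (a zero vector stands in here for whatever learned or fixed null embedding a real model uses). `drop_conditioning` and `p_uncond = 0.1` are illustrative choices, not taken from a specific codebase:

```python
import numpy as np

def drop_conditioning(cond: np.ndarray, null_emb: np.ndarray,
                      p_uncond: float, rng: np.random.Generator) -> np.ndarray:
    """Replace each row of `cond` (batch, dim) with `null_emb` (dim,)
    independently with probability p_uncond."""
    mask = rng.random(cond.shape[0]) < p_uncond          # True -> drop this example's condition
    return np.where(mask[:, None], null_emb[None, :], cond)

rng = np.random.default_rng(0)
cond = rng.normal(size=(8, 4))   # e.g. 8 prompt embeddings of size 4
null_emb = np.zeros(4)           # hypothetical stand-in for the null embedding

batch_cond = drop_conditioning(cond, null_emb, p_uncond=0.1, rng=rng)
# The usual denoising loss is then computed with batch_cond, so the same
# network learns eps_theta(c) and eps_theta(empty); no second model is trained.
```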
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- CFG extrapolates beyond the data distribution: for w > 1 the guided prediction overshoots the conditional one, so high guidance can produce unrealistic but more "prompt-adherent" images
- There is a sweet spot for the guidance scale, and it depends on the model and prompt: too low and the output largely ignores the prompt; too high and images show artifacts and oversaturated colors
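- CFG is related to temperature in LLMs: both reshape the sampling distribution at inference time without retraining, and raising the guidance scale sharpens the model's implicit classifier much as lowering the temperature sharpens next-token probabilities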
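To make the temperature analogy concrete: by Bayes' rule, ∇log p(x | c) − ∇log p(x) = ∇log p(c | x), so the guided score ∇log p(x) + w(∇log p(x | c) − ∇log p(x)) is the score of p(x) · p(c | x)^w. CFG therefore raises the implicit classifier p(c | x) to the power w. For a discrete classifier with logits z, renormalizing p^w is exactly a softmax at temperature T = 1/w. A small check with made-up logits (`softmax` and `sharpen` are illustrative helpers):

```python
import numpy as np

def softmax(z: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax of logits z at the given temperature."""
    s = z / temperature
    s = s - s.max()          # subtract max to avoid overflow in exp
    e = np.exp(s)
    return e / e.sum()

def sharpen(p: np.ndarray, w: float) -> np.ndarray:
    """Raise a probability vector to the power w and renormalize,
    as CFG does to the implicit classifier p(c | x)."""
    q = p ** w
    return q / q.sum()

logits = np.array([2.0, 1.0, 0.0])   # made-up classifier logits over 3 classes
w = 3.0

p = softmax(logits)
print(sharpen(p, w))                       # classifier sharpened by guidance scale w
print(softmax(logits, temperature=1 / w))  # the same distribution at temperature 1/w
```

A guidance scale w > 1 thus plays the role of a temperature below 1. The analogy is partial: CFG sharpens only the classifier factor p(c | x), not the whole distribution over images.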
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation above; come back here for the full walkthrough.
Key Equation
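CFG combines the conditional and unconditional noise predictions of the same network. For w > 1 it extrapolates past the conditional prediction rather than interpolating between the two:
\tilde{\epsilon}_\theta = \epsilon_\theta(\emptyset) + w \cdot (\epsilon_\theta(c) - \epsilon_\theta(\emptyset))
where \epsilon_\theta(c) and \epsilon_\theta(\emptyset) are the network's noise predictions for the noisy input x_t with and without the conditioning c, and w is the guidance scale (typically 3-15 for text-to-image).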
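In practice both predictions are often obtained from one forward pass on a doubled batch. The sketch below assumes a hypothetical `toy_model(x_t, t, cond)` standing in for a real noise-prediction network, with the null condition represented by a zero embedding; it computes one guided noise prediction, not a full sampler:

```python
import numpy as np

def toy_model(x_t: np.ndarray, t: int, cond: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for eps_theta(x_t, t, c); any deterministic
    function of its inputs is enough to illustrate the batching."""
    return 0.1 * x_t + cond.mean(axis=1, keepdims=True)

def guided_eps(model, x_t, t, cond, null_emb, w):
    """Return eps_uncond + w * (eps_cond - eps_uncond) using one batched call."""
    batch = x_t.shape[0]
    x_in = np.concatenate([x_t, x_t], axis=0)              # same noisy inputs twice
    c_in = np.concatenate([np.broadcast_to(null_emb, cond.shape), cond], axis=0)
    eps = model(x_in, t, c_in)
    eps_uncond, eps_cond = eps[:batch], eps[batch:]        # split the two halves
    return eps_uncond + w * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
x_t = rng.normal(size=(2, 4))    # 2 noisy samples with 4 "pixels" each
cond = rng.normal(size=(2, 3))   # 2 condition embeddings of size 3
null_emb = np.zeros(3)           # hypothetical null embedding

eps_hat = guided_eps(toy_model, x_t, 10, cond, null_emb, w=7.5)
```

The doubled batch means each guided denoising step costs roughly two network evaluations, which is the main inference overhead of CFG.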
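Because the noise prediction is proportional to the negative score, \epsilon_\theta(x_t, \cdot) \approx -\sigma_t \nabla_{x_t} \log p_t(x_t \mid \cdot), the same combination holds in score space, and Bayes' rule turns the difference of scores into the gradient of an implicit classifier:
\nabla_{x_t} \log \tilde{p}_t(x_t \mid c) = \nabla_{x_t} \log p_t(x_t) + w \cdot (\nabla_{x_t} \log p_t(x_t \mid c) - \nabla_{x_t} \log p_t(x_t)) = \nabla_{x_t} \log p_t(x_t) + w \cdot \nabla_{x_t} \log p_t(c \mid x_t)
Higher w amplifies the conditioning signal, the implicit-classifier gradient \nabla_{x_t} \log p_t(c \mid x_t), trading diversity for fidelity to the prompt.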
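A worked 1-D example makes the fidelity/diversity trade-off concrete. Assume, purely for illustration and at a single fixed noise level, an unconditional density p(x) = N(0, 1) and a conditional density p(x | c) = N(μ, s²) with s² < 1. The guided score is then linear in x, so it is the score of a Gaussian with precision 1 + w(1/s² − 1) and mean (wμ/s²) / precision. `guided_gaussian` is an illustrative helper:

```python
def guided_gaussian(mu: float, s2: float, w: float) -> tuple[float, float]:
    """Mean and variance of the distribution whose score is the CFG combination
    of the scores of N(0, 1) (unconditional) and N(mu, s2) (conditional).

    score_uncond(x) = -x
    score_cond(x)   = -(x - mu) / s2
    guided(x)       = score_uncond + w * (score_cond - score_uncond)
                    = -precision * x + w * mu / s2
    """
    precision = 1.0 + w * (1.0 / s2 - 1.0)
    if precision <= 0:
        raise ValueError("guided density is not normalizable for these parameters")
    mean = (w * mu / s2) / precision
    return mean, 1.0 / precision

for w in (0.0, 1.0, 3.0, 7.5):
    mean, var = guided_gaussian(mu=1.0, s2=0.5, w=w)
    print(f"w={w}: mean={mean:.3f}, var={var:.3f}")
```

With μ = 1 and s² = 0.5, w = 1 recovers the conditional N(1, 0.5), while w = 3 gives mean 1.5 and variance 0.25: samples are pushed past the conditional mean (extrapolation) and concentrated more tightly (less diversity). Real samplers apply the combination at every noise level, so final samples need not match this single-level picture exactly.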