7. Generative Models

Variational Autoencoders & Variational Inference

Canonical Papers

Auto-Encoding Variational Bayes

Kingma & Welling, 2013 (published at ICLR 2014)

Core Mathematics

Latent variable model $p_\theta(x,z) = p(z)\,p_\theta(x\mid z)$ with an intractable posterior $p_\theta(z\mid x)$. Introduce a variational encoder $q_\phi(z\mid x)$ and maximize the evidence lower bound (ELBO):

\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] - \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)
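
Where the bound comes from, a step most summaries skip: write $\log p_\theta(x)$ as an expectation under $q_\phi$ and split off the KL divergence to the true posterior. Since that KL is nonnegative, the right-hand side is a lower bound, tight exactly when $q_\phi(z\mid x) = p_\theta(z\mid x)$:

\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\right]}_{\text{ELBO}} + \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\big)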

Reparameterization trick for a Gaussian encoder:

z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
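
A minimal sketch of how the bound and the trick combine into a training loss, assuming a PyTorch setup with hypothetical `encoder` and `decoder` networks (Gaussian encoder, Bernoulli decoder over binarized pixels). The closed-form diagonal-Gaussian KL is standard; the interfaces are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, encoder, decoder):
    """One-sample Monte Carlo estimate of the negative ELBO.

    Assumed (hypothetical) interfaces:
      encoder(x) -> (mu, log_var)   # parameters of q_phi(z|x)
      decoder(z) -> logits          # parameters of Bernoulli p_theta(x|z)
    """
    mu, log_var = encoder(x)

    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so the sample is a differentiable function of (mu, log_var).
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # Reconstruction term E_q[log p(x|z)]: for a Bernoulli decoder this is
    # the negative binary cross-entropy, summed over pixels and batch.
    recon_log_prob = -F.binary_cross_entropy_with_logits(
        decoder(z), x, reduction="sum"
    )

    # KL(q_phi(z|x) || N(0, I)) in closed form for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    return -(recon_log_prob - kl)  # minimize this to maximize the ELBO
```

A single sample of $\epsilon$ per datapoint usually suffices in practice; the estimator remains unbiased either way.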

Why It Matters for Modern Models

  • Stable Diffusion is a latent diffusion model: a VAE-style autoencoder maps images to a compressed latent space (and back), and the diffusion process runs in that space
  • VAE-style encoders also produce the audio and video latents used as building blocks in many multimodal models

Missing Intuition

What is still poorly explained in textbooks and papers:

  • An intuitive grasp of why a single objective, the ELBO, acts as both a reconstruction term and a regularizer
  • Visualizations of how the prior p(z) and the choice of posterior family affect sample quality and diversity (one way to probe this is sketched below)
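
One way to probe the second point is to decode samples drawn from the prior and compare them with reconstructions of real data; the gap between the two exposes prior/posterior mismatch. A sketch reusing the hypothetical `decoder` from above (`latent_dim` is an assumed hyperparameter):

```python
import torch

@torch.no_grad()
def decode_prior_samples(decoder, n=16, latent_dim=32):
    # Ancestral sampling: draw z ~ p(z) = N(0, I), then decode.
    z = torch.randn(n, latent_dim)
    return torch.sigmoid(decoder(z))  # Bernoulli means, viewable as images
```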
