7. Generative Models

Variational Autoencoders & Variational Inference

Canonical Papers

Auto-Encoding Variational Bayes

Kingma & Welling, 2013 (published at ICLR 2014)

Core Mathematics

Latent variable model $p_\theta(x,z) = p(z)\,p_\theta(x\mid z)$ with an intractable posterior $p_\theta(z\mid x)$. Introduce a variational encoder $q_\phi(z\mid x)$ and maximize the evidence lower bound (ELBO):

\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] - \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)
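
Where the bound comes from, a step most summaries skip: write $\log p_\theta(x)$ as an expectation under $q_\phi$ and split off the KL divergence to the true posterior. Since that KL is nonnegative, the right-hand side is a lower bound, tight exactly when $q_\phi(z\mid x) = p_\theta(z\mid x)$:

\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z\mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\right]}_{\text{ELBO}} + \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\big)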

Reparameterization trick for a Gaussian encoder:

z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
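
A minimal sketch of how the bound and the trick combine into a training loss, assuming a PyTorch setup with hypothetical `encoder` and `decoder` networks (Gaussian encoder, Bernoulli decoder over binarized pixels). The closed-form diagonal-Gaussian KL is standard; the interfaces are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, encoder, decoder):
    """One-sample Monte Carlo estimate of the negative ELBO.

    Assumed (hypothetical) interfaces:
      encoder(x) -> (mu, log_var)   # parameters of q_phi(z|x)
      decoder(z) -> logits          # parameters of Bernoulli p_theta(x|z)
    """
    mu, log_var = encoder(x)

    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so the sample is a differentiable function of (mu, log_var).
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # Reconstruction term E_q[log p(x|z)]: for a Bernoulli decoder this is
    # the negative binary cross-entropy, summed over pixels and batch.
    recon_log_prob = -F.binary_cross_entropy_with_logits(
        decoder(z), x, reduction="sum"
    )

    # KL(q_phi(z|x) || N(0, I)) in closed form for a diagonal Gaussian.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    return -(recon_log_prob - kl)  # minimize this to maximize the ELBO
```

A single sample of $\epsilon$ per datapoint usually suffices in practice; the estimator remains unbiased either way.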

Why It Matters for Modern Models

  • Stable Diffusion is a latent diffusion model: a VAE-style autoencoder maps images to a compressed latent space (and back), and the diffusion process runs in that space
  • VAE-style encoders also produce the audio and video latents used as building blocks in many multimodal models

Missing Intuition

What is still poorly explained in textbooks and papers:

  • An intuitive grasp of why a single objective, the ELBO, acts as both a reconstruction term and a regularizer
  • Visualizations of how the prior p(z) and the choice of posterior family affect sample quality and diversity (one way to probe this is sketched below)
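
One way to probe the second point is to decode samples drawn from the prior and compare them with reconstructions of real data; the gap between the two exposes prior/posterior mismatch. A sketch reusing the hypothetical `decoder` from above (`latent_dim` is an assumed hyperparameter):

```python
import torch

@torch.no_grad()
def decode_prior_samples(decoder, n=16, latent_dim=32):
    # Ancestral sampling: draw z ~ p(z) = N(0, I), then decode.
    z = torch.randn(n, latent_dim)
    return torch.sigmoid(decoder(z))  # Bernoulli means, viewable as images
```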
