Generative Models

GANs & Adversarial Divergence Minimization

Canonical Papers

Generative Adversarial Nets

Goodfellow et al., 2014, NeurIPS

Wasserstein GAN

Arjovsky et al., 2017, ICML

Core Mathematics

Original GAN objective:

\min_G \max_D \; \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p(z)}[\log(1 - D(G(z)))]

At the optimum, with the optimal discriminator D^* plugged in, minimizing over G minimizes the Jensen–Shannon divergence between the model distribution and the data distribution.
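
A quick sketch of the standard argument from Goodfellow et al. (2014): for a fixed generator with induced density p_g, the inner maximization can be solved pointwise, and substituting the maximizer back in leaves a Jensen–Shannon term plus a constant:

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}, \qquad \mathbb{E}_{p_{\text{data}}}[\log D^*(x)] + \mathbb{E}_{p_g}[\log(1 - D^*(x))] = 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g) - \log 4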

WGAN replaces the JS divergence with the Earth-Mover (Wasserstein-1) distance, estimated by a critic D constrained to be 1-Lipschitz.
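
Below is a minimal sketch of what the two objectives look like as training losses, assuming PyTorch; the names G, D, latent_dim, and data_dim are illustrative placeholders rather than anything from either paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, data_dim = 64, 2  # illustrative sizes
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))  # logit / critic score

def gan_losses(x_real):
    """Original GAN objective; BCE on logits recovers the log D / log(1 - D) terms."""
    z = torch.randn(x_real.size(0), latent_dim)
    x_fake = G(z)
    real_logits, fake_logits = D(x_real), D(x_fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    # Non-saturating generator loss: maximize log D(G(z)) rather than minimize log(1 - D(G(z))).
    gen_logits = D(x_fake)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss

def wgan_losses(x_real, clip=0.01):
    """WGAN: the critic's score difference is a surrogate for W1; weight clipping is the
    original paper's crude Lipschitz enforcement and would normally run after each critic update."""
    z = torch.randn(x_real.size(0), latent_dim)
    x_fake = G(z)
    critic_loss = -(D(x_real).mean() - D(x_fake.detach()).mean())
    g_loss = -D(x_fake).mean()
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-clip, clip)
    return critic_loss, g_loss
```

In practice the critic/discriminator is usually updated several times per generator step, and later WGAN variants replace weight clipping with a gradient penalty on the critic.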


Why It Matters for Modern Models

  • Adversarial min-max ideas appear in adversarial training and some alignment techniques
  • GAN-like training still influential in high-fidelity image/video generation

Missing Intuition

What is still poorly explained in textbooks and papers:

  • Why JS divergence leads to vanishing gradients when supports don't overlap, and how Wasserstein distances fix this (see the worked example after this list)
  • Geometric visualizations of discriminator decision surfaces over latent manifolds
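
A concrete version of the first point above (a collapsed form of the parallel-lines example discussed in the WGAN paper): let P = \delta_0 and Q_\theta = \delta_\theta be point masses with \theta \neq 0, so their supports are disjoint. Then

\mathrm{JSD}(P \,\|\, Q_\theta) = \log 2 \ \text{ for all } \theta \neq 0, \qquad W_1(P, Q_\theta) = |\theta|

The JS divergence is flat in \theta, so a generator parameterized by \theta receives no gradient, while the Wasserstein-1 distance shrinks smoothly as the two distributions move together.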
