Legacy Concept Lab

Optimal Transport & Wasserstein Distance

Wasserstein distance explains why WGANs are more stable than vanilla GANs—it provides gradients even when distributions do not overlap

Concept 38 of 100TheoryPhase 8

#38OT/WassersteinTheory

key equationW_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int \|x - y\|^p \, d\gamma(x, y) \right)^{1/p}

Phase 8: Scaling, theory & multimodalConcept 38 of 100

Why It Matters for Modern Models

Wasserstein distance explains why WGANs are more stable than vanilla GANs—it provides gradients even when distributions do not overlap
Flow matching and rectified flows are built on OT: they learn the optimal transport map directly
OT provides a principled way to measure distance between distributions that respects geometry

What is still poorly explained in textbooks and papers:

KL divergence is infinite when supports do not overlap; Wasserstein is finite and measures "how far to move mass"
The "earth mover" intuition: imagine distributions as piles of dirt, OT finds the cheapest way to reshape one into the other
Entropic regularization (Sinkhorn) makes OT tractable: adds −εH(γ) to get differentiable, GPU-friendly algorithms

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int \|x - y\|^p \, d\gamma(x, y) \right)^{1/p}

Optimal transport finds the minimum-cost way to move mass from distribution $\mu$ to $\nu$ :

W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int \|x - y\|^p \, d\gamma(x, y) \right)^{1/p}

where $\Gamma(\mu, \nu)$ is the set of joint distributions with marginals $\mu, \nu$ .

Kantorovich duality (key for computation):

W_1(\mu, \nu) = \sup_{\|f\|_L \leq 1} \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)]

Brenier's theorem: For absolutely continuous $\mu$ , the optimal map is $T = \nabla \phi$ for convex $\phi$ .

Peyré & Cuturi2019Foundations and Trends in ML

Arjovsky, Chintala, Bottou2017ICML

Explore this concept from different angles — like a mathematician would.