Legacy Concept Lab

Bregman Divergence & Mirror Descent

Explains why different geometries suit different problems—simplex needs KL, not Euclidean

Concept 70 of 100TheoryPhase 10
#70BregmanTheory
key equationD_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q), p - q \rangle
Phase 10: Mathematical foundations & information geometryConcept 70 of 100

Why It Matters for Modern Models

  • Explains why different geometries suit different problems—simplex needs KL, not Euclidean
  • Exponentiated gradient (softmax updates) is mirror descent with entropy potential
  • Natural gradient is Bregman geometry with Fisher information as the potential

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Euclidean gradient descent is ONE choice; mirror descent is the general framework
  • The "mirror map" transforms to dual coordinates where steps are linear
  • KL divergence is the Bregman divergence for probability distributions

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
DΦ(p,q)=Φ(p)Φ(q)Φ(q),pqD_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q), p - q \rangle

Bregman divergence generated by strictly convex Φ\Phi:

DΦ(p,q)=Φ(p)Φ(q)Φ(q),pqD_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q), p - q \rangle

Mirror descent:

xt+1=argminxX{gt,x+1ηDΦ(x,xt)}x_{t+1} = \arg\min_{x \in \mathcal{X}} \left\{ \langle g_t, x \rangle + \frac{1}{\eta} D_\Phi(x, x_t) \right\}

Special cases:

  • Φ(x)=12x2\Phi(x) = \frac{1}{2}\|x\|^2 → Euclidean GD, DΦD_\Phi = squared distance
  • Φ(x)=ixilogxi\Phi(x) = \sum_i x_i \log x_i → KL divergence on simplex (exponentiated gradient)

Canonical Papers

Mirror Descent and Nonlinear Projected Subgradient Methods

Beck & Teboulle2003Operations Research Letters
Read paper →

Prediction, Learning, and Games

Cesa-Bianchi & Lugosi2006Cambridge University Press
Read paper →

Connections

Next Moves

Explore this concept from different angles — like a mathematician would.