Legacy Concept Lab
Bregman Divergence & Mirror Descent
Explains why different geometries suit different problems: the simplex needs KL, not Euclidean distance
Key equation
D_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q), p - q \rangle
Phase 10: Mathematical foundations & information geometry
Concept 70 of 100
Why It Matters for Modern Models
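- Exponentiated gradient (softmax-style multiplicative updates) is mirror descent with the negative-entropy potential
- Natural gradient uses the Fisher information as its local metric; for exponential families the Fisher information is the Hessian of the log-partition function, which ties natural gradient closely to mirror descent with that function as the potential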
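A minimal sketch of the exponentiated-gradient claim in plain Python. The function names `eg_step` and `multiplicative_step` and the example vectors are illustrative assumptions; the point is only that a softmax of `log(x) - eta * grad` and the renormalized multiplicative update give the same point on the simplex.

```python
import math

def softmax(z):
    # Shift by the max before exponentiating to avoid overflow.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def eg_step(x, grad, eta):
    # Exponentiated gradient written as a softmax of log-weights:
    # step additively in log space, then renormalize onto the simplex.
    return softmax([math.log(xi) - eta * gi for xi, gi in zip(x, grad)])

def multiplicative_step(x, grad, eta):
    # The same update in multiplicative form: x_i * exp(-eta * g_i), renormalized.
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]

x = [0.2, 0.3, 0.5]    # current point on the simplex (illustrative)
g = [1.0, -0.5, 0.25]  # gradient of some loss at x (illustrative)
a = eg_step(x, g, eta=0.1)
b = multiplicative_step(x, g, eta=0.1)
print(a)
print(b)  # agrees with a up to floating-point rounding
```

Coordinates with a negative gradient gain mass, and the iterate never leaves the interior of the simplex, because every weight is multiplied by a positive factor.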
What Tutorials Skip
What is still poorly explained in textbooks and papers:
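- Euclidean gradient descent is ONE choice of geometry (the squared-norm potential); mirror descent is the general framework
- The "mirror map" \nabla\Phi sends points to dual coordinates, where each gradient step is a plain additive update; its inverse \nabla\Phi^* maps back
- KL divergence is the Bregman divergence generated by negative entropy, \Phi(p) = \sum_i p_i \log p_i, on the probability simplex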
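The mirror-map point can be checked numerically for the negative-entropy potential. This sketch assumes \Phi(x) = \sum_i x_i \log x_i on the positive orthant, so \nabla\Phi(x) = 1 + \log x and its inverse is \nabla\Phi^*(\theta) = \exp(\theta - 1); the helper names are hypothetical.

```python
import math

def mirror_map(x):
    # grad Phi(x) = 1 + log(x): primal point -> dual coordinates.
    return [1.0 + math.log(xi) for xi in x]

def inverse_mirror_map(theta):
    # grad Phi*(theta) = exp(theta - 1): dual coordinates -> primal point.
    return [math.exp(t - 1.0) for t in theta]

def dual_step(x, grad, eta):
    theta = mirror_map(x)                                 # map to dual space
    theta = [t - eta * gi for t, gi in zip(theta, grad)]  # plain additive step
    return inverse_mirror_map(theta)                      # map back to primal space

x = [0.2, 0.3, 0.5]
g = [1.0, -0.5, 0.25]
y = dual_step(x, g, eta=0.1)

# Seen from primal coordinates, the same step is multiplicative.
z = [xi * math.exp(-0.1 * gi) for xi, gi in zip(x, g)]
print(y)
print(z)
```

No normalization happens here; on the simplex, a KL projection (dividing by the sum) would follow the dual step.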
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation at the top of the page. Come back here for the full walkthrough.
Key Equation
Bregman divergence generated by a strictly convex \Phi:
D_\Phi(p, q) = \Phi(p) - \Phi(q) - \langle \nabla\Phi(q), p - q \rangle
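The definition translates directly into code. This is a small plain-Python sketch (function names are illustrative, not from any library) that evaluates D_\Phi for two potentials and compares the results with half the squared Euclidean distance and with KL divergence.

```python
import math

def bregman(phi, grad_phi, p, q):
    # D_Phi(p, q) = Phi(p) - Phi(q) - <grad Phi(q), p - q>
    inner = sum(gq * (pi - qi) for gq, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

def half_sq_norm(x):
    # Phi(x) = 1/2 ||x||^2, whose gradient is x itself.
    return 0.5 * sum(xi * xi for xi in x)

def identity(x):
    return list(x)

def neg_entropy(x):
    # Phi(x) = sum_i x_i log x_i (requires strictly positive entries).
    return sum(xi * math.log(xi) for xi in x)

def grad_neg_entropy(x):
    return [1.0 + math.log(xi) for xi in x]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.1, 0.6, 0.3]  # both points lie on the probability simplex
q = [0.3, 0.3, 0.4]

d_euc = bregman(half_sq_norm, identity, p, q)
d_ent = bregman(neg_entropy, grad_neg_entropy, p, q)
print(d_euc, 0.5 * sum((a - b) ** 2 for a, b in zip(p, q)))
print(d_ent, kl(p, q))
print(bregman(neg_entropy, grad_neg_entropy, q, p))  # differs from d_ent: not symmetric
```

The last line is a reminder that a Bregman divergence is generally not a distance: swapping the arguments changes the value.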
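Mirror descent (step size \eta, objective f):
x_{t+1} = \arg\min_x \{ \eta \langle \nabla f(x_t), x \rangle + D_\Phi(x, x_t) \}
Without constraints this is the dual-space step \nabla\Phi(x_{t+1}) = \nabla\Phi(x_t) - \eta \nabla f(x_t); over a constraint set the minimum is taken over that set, which adds a Bregman projection.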
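The dual-space form of the update gives a generic loop. This sketch is unconstrained (no projection step) and uses a made-up objective f(x) = \tfrac12\|x - c\|^2 with a positive target c; passing the mirror maps in as plain functions is an illustrative design, not a library API.

```python
import math

def mirror_descent(grad_f, mirror_map, inverse_mirror_map, x0, eta, steps):
    # Repeats grad Phi(x_{t+1}) = grad Phi(x_t) - eta * grad f(x_t).
    x = list(x0)
    for _ in range(steps):
        theta = [t - eta * g for t, g in zip(mirror_map(x), grad_f(x))]
        x = inverse_mirror_map(theta)
    return x

c = [1.0, 2.0, 0.5]

def grad_f(x):
    # Gradient of f(x) = 1/2 ||x - c||^2.
    return [xi - ci for xi, ci in zip(x, c)]

# Euclidean potential: both maps are the identity, so this is plain gradient descent.
x_euc = mirror_descent(grad_f, lambda x: list(x), lambda t: list(t),
                       [1.0, 1.0, 1.0], eta=0.1, steps=500)

# Negative-entropy potential: updates are multiplicative and stay positive.
x_ent = mirror_descent(grad_f,
                       lambda x: [1.0 + math.log(xi) for xi in x],
                       lambda t: [math.exp(ti - 1.0) for ti in t],
                       [1.0, 1.0, 1.0], eta=0.1, steps=500)
print(x_euc)
print(x_ent)
```

Both runs approach c. The entropy run multiplies each coordinate by exp(-eta * (x_i - c_i)) every step, so no coordinate can cross zero.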
Special cases:
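- \Phi(x) = \tfrac12\|x\|^2 → Euclidean GD, D_\Phi(p, q) = \tfrac12\|p - q\|^2 (half the squared distance)
- \Phi(p) = \sum_i p_i \log p_i (negative entropy) → D_\Phi(p, q) = \mathrm{KL}(p \| q) on the simplex (exponentiated gradient)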
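To see how the two special cases behave on the simplex, this sketch minimizes a made-up linear loss f(x) = \langle c, x \rangle both ways. The Euclidean run needs an explicit projection onto the simplex, implemented here with the usual sort-and-threshold method; the exponentiated-gradient run only renormalizes. The loss vector, step size, and step count are arbitrary choices.

```python
import math

def project_to_simplex(v):
    # Euclidean projection onto {x : x_i >= 0, sum_i x_i = 1} via sort and threshold.
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index passing this test fixes the threshold
    return [max(vi - theta, 0.0) for vi in v]

def projected_gd(c, x0, eta, steps):
    x = list(x0)
    for _ in range(steps):
        x = project_to_simplex([xi - eta * ci for xi, ci in zip(x, c)])
    return x

def exponentiated_gradient(c, x0, eta, steps):
    x = list(x0)
    for _ in range(steps):
        w = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x

c = [0.5, 0.1, 0.9]  # linear loss; coordinate 1 is cheapest
x0 = [1 / 3, 1 / 3, 1 / 3]
x_pgd = projected_gd(c, x0, eta=0.1, steps=100)
x_eg = exponentiated_gradient(c, x0, eta=0.1, steps=100)
print(x_pgd)  # lands on a vertex, with exact zeros
print(x_eg)   # concentrates on coordinate 1 but stays strictly positive
```

Projected GD reaches the boundary and sets coordinates exactly to zero, while the entropy geometry treats the boundary as infinitely far away, so every coordinate stays positive. Which behavior is preferable depends on the problem; this toy run only illustrates the difference.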