Legacy Concept Lab
Natural Gradient & Riemannian Optimization
Natural gradient is coordinate-invariant: to first order in the step size, it gives the same update regardless of parameterization
#56 · Natural Grad · Optimization
key equation
\tilde{\nabla} L = F(\theta)^{-1} \nabla_\theta L
Phase 10: Mathematical foundations & information geometry
Concept 56 of 100
Why It Matters for Modern Models
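- Natural gradient is coordinate-invariant: in the limit of small steps, the change it induces in the model's distribution is the same regardless of parameterization
- TRPO and PPO approximate natural gradient updates for policy optimization: TRPO through a KL trust region solved with the Fisher matrix, PPO through a cheaper clipped surrogate objective
- Adam can be viewed as a diagonal approximation to natural gradient: its running average of squared gradients acts as an adaptive diagonal preconditioner, although Adam takes a square root that the Fisher inverse does not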
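To make the Adam comparison concrete, here is a minimal sketch that puts a diagonal natural-gradient-style step next to the first Adam step. It assumes the diagonal is estimated with the empirical Fisher (the mean of squared per-example gradients), which is only one of several possible estimators. The function names and the toy gradients are hypothetical and chosen purely for illustration.

```python
import math

def diag_empirical_fisher_step(per_example_grads, lr=0.1, damping=1e-8):
    """Step preconditioned by a diagonal empirical-Fisher estimate (an assumption)."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    # Mean gradient over the batch.
    mean_grad = [sum(g[i] for g in per_example_grads) / n for i in range(dim)]
    # Diagonal empirical Fisher: mean of squared per-example gradients.
    fisher_diag = [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]
    # Divide by the diagonal itself (no square root), plus damping for stability.
    return [-lr * mean_grad[i] / (fisher_diag[i] + damping) for i in range(dim)]

def adam_first_step(grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update from zero-initialized moments, with bias correction."""
    m_hat = [(1 - beta1) * g / (1 - beta1) for g in grad]
    v_hat = [(1 - beta2) * g * g / (1 - beta2) for g in grad]
    # Adam divides by the square root of the second-moment estimate.
    return [-lr * m / (math.sqrt(v) + eps) for m, v in zip(m_hat, v_hat)]

# Toy batch of two per-example gradients for a two-parameter model.
per_example_grads = [[1.0, 2.0], [3.0, 2.0]]
mean_grad = [2.0, 2.0]

fisher_update = diag_empirical_fisher_step(per_example_grads)
adam_update = adam_first_step(mean_grad)
```

Both steps rescale each coordinate by a second-moment statistic of the gradient, which is the sense in which Adam resembles a diagonal preconditioner. The square root and the moving average in Adam are where the analogy stops being exact.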
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- The gradient is a covector, not a vector; the inverse of the metric turns it into a direction of steepest descent
- The Euclidean gradient direction depends on how you parameterize; the natural gradient direction depends only on the distributions
- Natural gradient can escape plateaus faster because it accounts for local curvature in distribution space rather than in parameter space
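The parameterization point can be checked on a one-parameter example. The sketch below assumes a Bernoulli model described either by its mean p or by its logit z (with p = sigmoid(z)), and a cross-entropy loss against a target frequency q. It computes the first-order change in p produced by one step taken in each coordinate system. The model, the loss, and all names are illustrative assumptions.

```python
def loss_grad_p(p, q):
    """dL/dp for L = -(q log p + (1 - q) log(1 - p))."""
    return (p - q) / (p * (1.0 - p))

def induced_dp(p, q, lr, natural):
    """First-order change in p from one step taken in p-space and in z-space."""
    var = p * (1.0 - p)        # dp/dz for p = sigmoid(z)
    g_p = loss_grad_p(p, q)    # gradient with respect to p
    g_z = g_p * var            # chain rule: gradient with respect to z
    fisher_p = 1.0 / var       # Fisher information of Bernoulli in p
    fisher_z = var             # Fisher information of Bernoulli in z
    if natural:
        step_p = -lr * g_p / fisher_p
        step_z = -lr * g_z / fisher_z
    else:
        step_p = -lr * g_p
        step_z = -lr * g_z
    # Map the z-step back to a change in p via dp = (dp/dz) dz.
    return step_p, step_z * var

p, q, lr = 0.9, 0.5, 0.01
euclid_via_p, euclid_via_z = induced_dp(p, q, lr, natural=False)
natural_via_p, natural_via_z = induced_dp(p, q, lr, natural=True)
```

The two Euclidean steps move p by different amounts (they differ by a factor of (p(1 - p))^2), while the two natural gradient steps agree. The agreement is exact only to first order in the step size; for finite steps the two parameterizations drift apart slightly.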
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
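Natural gradient uses the Fisher metric instead of the Euclidean one:
\tilde{\nabla} L = F(\theta)^{-1} \nabla_\theta L, \quad F(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ \nabla_\theta \log p_\theta(x) \, \nabla_\theta \log p_\theta(x)^\top \right]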
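Update rule:
\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \nabla_\theta L(\theta_t)
Variational characterization (why it's "natural"):
\theta_{t+1} = \arg\min_{\theta} \left[ \nabla_\theta L(\theta_t)^\top (\theta - \theta_t) + \frac{1}{2\eta} (\theta - \theta_t)^\top F(\theta_t) (\theta - \theta_t) \right], \quad \text{where } \mathrm{KL}(p_{\theta_t} \,\|\, p_\theta) \approx \frac{1}{2} (\theta - \theta_t)^\top F(\theta_t) (\theta - \theta_t)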
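As a small worked instance of the natural gradient update (subtract the learning rate times the inverse Fisher matrix applied to the gradient), the sketch below fits a univariate Gaussian by minimizing the average negative log-likelihood. It assumes the (mu, sigma) parameterization, for which the Fisher matrix is diagonal with entries 1/sigma^2 and 2/sigma^2, so applying the inverse reduces to elementwise division. The function names, learning rate, and synthetic data are all hypothetical.

```python
import random

def nll_grad(mu, sigma, xs):
    """Gradient of the average Gaussian negative log-likelihood in (mu, sigma)."""
    n = len(xs)
    mean_dev = sum(x - mu for x in xs) / n
    mean_sq = sum((x - mu) ** 2 for x in xs) / n
    return (-mean_dev / sigma**2, 1.0 / sigma - mean_sq / sigma**3)

def fisher_diag(sigma):
    """Diagonal of the Fisher matrix of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return (1.0 / sigma**2, 2.0 / sigma**2)

def natural_gradient_fit(xs, mu=0.0, sigma=1.0, lr=0.5, steps=100):
    """Repeat theta <- theta - lr * F^{-1} grad; F is diagonal here."""
    for _ in range(steps):
        g_mu, g_sigma = nll_grad(mu, sigma, xs)
        f_mu, f_sigma = fisher_diag(sigma)
        mu -= lr * g_mu / f_mu
        sigma -= lr * g_sigma / f_sigma
    return mu, sigma

random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(1000)]
mu_hat, sigma_hat = natural_gradient_fit(data)
```

In this parameterization the preconditioned step for mu is simply lr times the gap between the sample mean and mu, independent of sigma, which illustrates how the Fisher metric removes the scale dependence of the raw gradient. For larger models the Fisher matrix is neither diagonal nor cheap to invert, so practical methods rely on approximations.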