Legacy Concept Lab
Fisher Information & Information Geometry
Fisher gives the natural metric on probability distributions—not Euclidean distance in parameters
#55 | Fisher Info | Theory
key equation
F_{ij}(\theta) = \mathbb{E}\left[\partial_i \log p_\theta \cdot \partial_j \log p_\theta\right]
Phase 10: Mathematical foundations & information geometry
Concept 55 of 100
Why It Matters for Modern Models
- Fisher gives the natural metric on probability distributions—not Euclidean distance in parameters
- Explains why KL penalties in RLHF/PPO are geometric constraints, not arbitrary regularization: for small updates, a KL budget bounds how far the policy moves as measured by the Fisher metric
- Connects curvature to uncertainty: high Fisher = parameters are well-identified
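To make the first point concrete, here is a minimal sketch using a Bernoulli model, chosen only because its KL divergence and Fisher information have simple closed forms. The function names `kl_bernoulli` and `fisher_bernoulli` and the step size are illustrative assumptions, not part of the original material. The same Euclidean step in the parameter p is taken at two locations, and the resulting KL divergence is compared with the second-order Fisher prediction.

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def fisher_bernoulli(p: float) -> float:
    """Fisher information of Bernoulli(p) with respect to p: 1 / (p (1 - p))."""
    return 1.0 / (p * (1.0 - p))

step = 0.01  # identical Euclidean step in parameter space at both locations
for p in (0.5, 0.95):
    exact = kl_bernoulli(p, p + step)
    local = 0.5 * fisher_bernoulli(p) * step ** 2  # second-order Fisher approximation
    print(f"p={p:.2f}  KL={exact:.6f}  0.5*F*step^2={local:.6f}")
```

The step is the same size in parameter space both times, yet the distribution changes several times more (in KL) near p = 0.95 than near p = 0.5. The Fisher information tracks that difference, while Euclidean distance in p cannot.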
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- Distance in parameter space should mean "distinguishability of distributions"—Fisher captures this
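- The Cramér-Rao bound: the variance of any unbiased estimator is at least 1/Fisher (the inverse Fisher matrix when there are several parameters), so more information means tighter estimates
- Fisher is the Hessian of KL(p_θ₀ ‖ p_θ) with respect to θ, evaluated at θ=θ₀, making it a second-order object without needing the loss Hessian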
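The Cramér-Rao point can be checked with a small simulation. The sketch below is an illustration under stated assumptions: a Bernoulli model, the sample mean as the unbiased estimator, and a hypothetical helper `sample_mean_variance` with arbitrary choices of p, n, and trial count. For n independent draws the bound is 1 / (n F), where F is the Fisher information of a single draw.

```python
import random

def sample_mean_variance(p: float, n: int, trials: int, seed: int = 0) -> float:
    """Empirical variance of the sample-mean estimator of p over repeated experiments."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / (trials - 1)

p, n = 0.3, 50
fisher_per_sample = 1.0 / (p * (1.0 - p))          # Fisher information of one Bernoulli draw
cramer_rao_bound = 1.0 / (n * fisher_per_sample)   # lower bound for unbiased estimators, n i.i.d. draws
empirical = sample_mean_variance(p, n, trials=20_000)

print(f"Cramér-Rao bound:                  {cramer_rao_bound:.6f}")
print(f"empirical variance of sample mean: {empirical:.6f}")
```

For this model the sample mean attains the bound, so the two printed values should agree up to simulation noise. Most estimators in other models sit strictly above it.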
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
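The Fisher Information Matrix measures how distinguishable distributions are:
F_{ij}(\theta) = \mathbb{E}\left[\partial_i \log p_\theta \cdot \partial_j \log p_\theta\right]
Here the expectation is taken over samples drawn from p_\theta, and \partial_i denotes the derivative with respect to \theta_i.
Equivalently (under regularity):
F_{ij}(\theta) = -\mathbb{E}\left[\partial_i \partial_j \log p_\theta\right]
KL as local metric: For small parameter changes \delta:
\mathrm{KL}\left(p_\theta \,\|\, p_{\theta+\delta}\right) \approx \tfrac{1}{2}\, \delta^\top F(\theta)\, \delta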
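The local-metric statement can be checked numerically. The following sketch assumes a univariate Gaussian parameterized by θ = (μ, σ), for which the KL divergence and the Fisher matrix diag(1/σ², 2/σ²) are known in closed form. The helper names (`kl_gaussian`, `fisher_gaussian`, `quadratic_form`) and the chosen perturbation direction are illustrative, not from the original page.

```python
import math

def kl_gaussian(mu0: float, sigma0: float, mu1: float, sigma1: float) -> float:
    """KL(N(mu0, sigma0^2) || N(mu1, sigma1^2)) in nats (closed form)."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)

def fisher_gaussian(sigma: float) -> list:
    """Fisher matrix of N(mu, sigma^2) in the coordinates theta = (mu, sigma)."""
    return [[1.0 / sigma ** 2, 0.0],
            [0.0, 2.0 / sigma ** 2]]

def quadratic_form(F: list, delta: tuple) -> float:
    """0.5 * delta^T F delta for a small list-of-lists matrix."""
    k = len(delta)
    return 0.5 * sum(delta[i] * F[i][j] * delta[j] for i in range(k) for j in range(k))

mu, sigma = 0.0, 2.0
for scale in (0.1, 0.01):
    delta = (scale, -0.5 * scale)  # arbitrary perturbation direction in (mu, sigma)
    exact = kl_gaussian(mu, sigma, mu + delta[0], sigma + delta[1])
    approx = quadratic_form(fisher_gaussian(sigma), delta)
    print(f"scale={scale}: KL={exact:.3e}  0.5*d^T F d={approx:.3e}")
```

As the perturbation shrinks, the exact KL and the quadratic form should agree to a smaller relative error, which is what it means for Fisher to act as the local metric.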