Neural Tangent Kernel & Infinite-Width Limits
Canonical Papers
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Core Mathematics
Define a network $f(x; \theta)$ with parameters $\theta \in \mathbb{R}^P$. The NTK is:

$$
\Theta(x, x') \;=\; \nabla_\theta f(x; \theta)^{\top} \nabla_\theta f(x'; \theta)
\;=\; \sum_{p=1}^{P} \frac{\partial f(x; \theta)}{\partial \theta_p}\,\frac{\partial f(x'; \theta)}{\partial \theta_p}
$$
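As a concrete companion to the definition, the finite-width ("empirical") NTK can be computed directly from parameter gradients. A minimal sketch in JAX, assuming a small scalar-output tanh MLP; the architecture, initialization, and inputs are illustrative choices, not taken from the paper:

```python
import jax
import jax.numpy as jnp

def init_params(key, d_in=3, width=512):
    """Toy two-layer MLP; 1/sqrt(fan-in) scaling is folded into the forward pass."""
    k1, k2 = jax.random.split(key)
    return {"W1": jax.random.normal(k1, (width, d_in)),
            "W2": jax.random.normal(k2, (width,))}

def f(params, x):
    """Scalar-output network f(x; theta)."""
    h = jnp.tanh(params["W1"] @ x / jnp.sqrt(x.shape[0]))
    return params["W2"] @ h / jnp.sqrt(h.shape[0])

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = <grad_theta f(x1; theta), grad_theta f(x2; theta)>."""
    g1 = jax.grad(f)(params, x1)   # pytree with the same structure as params
    g2 = jax.grad(f)(params, x2)
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(g1),
                               jax.tree_util.tree_leaves(g2)))

key = jax.random.PRNGKey(0)
params = init_params(key)
x1, x2 = jnp.ones(3), jnp.arange(3.0)
print(empirical_ntk(params, x1, x2))   # a single scalar kernel value
```

At realistic widths this quantity fluctuates with the random initialization and drifts during training; the next paragraph describes the limit in which it does not.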
In the infinite-width limit, this kernel becomes deterministic at initialization and stays constant throughout training. Under gradient flow, training therefore reduces to a linear ODE in function space, just like kernel regression.
Key Equation
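Under gradient flow on squared loss, the training-set outputs evolve as

$$
\frac{\mathrm{d} f_t(\mathcal{X})}{\mathrm{d} t} = -\,\Theta(\mathcal{X}, \mathcal{X})\,\bigl(f_t(\mathcal{X}) - \mathcal{Y}\bigr),
\qquad
f_t(\mathcal{X}) = \mathcal{Y} + e^{-\Theta(\mathcal{X}, \mathcal{X})\, t}\,\bigl(f_0(\mathcal{X}) - \mathcal{Y}\bigr),
$$

where $\mathcal{X}$ are the training inputs and $\mathcal{Y}$ the targets; the closed form on the right holds because $\Theta$ stays constant in the infinite-width limit.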
Why It Matters for Modern Models
- NTK provides a mathematically clean limit in which learning dynamics and generalization can be predicted in closed form (see the sketch after this list)
- Many mechanistic-interpretability arguments assume behavior "somewhere between" kernel-like and feature-learning regimes
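As referenced in the first bullet above, a constant kernel makes the training-set dynamics under gradient flow and squared loss exactly solvable. A minimal sketch, assuming a hand-made symmetric PSD kernel matrix and made-up targets rather than anything computed from a real network:

```python
import jax.numpy as jnp

def ntk_trajectory(theta, f0, y, t):
    """Closed-form training-set outputs under gradient flow with a constant NTK:
    f_t = y + exp(-theta * t) @ (f0 - y)."""
    evals, evecs = jnp.linalg.eigh(theta)          # theta is symmetric PSD
    decay = evecs @ jnp.diag(jnp.exp(-evals * t)) @ evecs.T
    return y + decay @ (f0 - y)

# Illustrative numbers only: a 3-point training set with a hand-made PSD kernel.
theta = jnp.array([[2.0, 0.5, 0.1],
                   [0.5, 1.5, 0.3],
                   [0.1, 0.3, 1.0]])
f0 = jnp.zeros(3)                                  # outputs at initialization
y  = jnp.array([1.0, -1.0, 0.5])                   # regression targets
for t in [0.0, 1.0, 10.0]:
    print(t, ntk_trajectory(theta, f0, y, t))      # converges to y as t grows
```

In this regime the slow-to-fit directions are exactly the small-eigenvalue directions of $\Theta$, which is one way the kernel view yields predictions about learning speed and generalization.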
Missing Intuition
What is still poorly explained in textbooks and papers:
- Most expositions are algebraic; what is missing is a geometric animation showing how trajectories in function space under the NTK differ from genuine feature learning (a crude numerical proxy is sketched below)
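One crude numerical proxy for that intuition is to measure how much the empirical NTK itself moves during training: in the kernel regime it stays nearly fixed, while genuine feature learning shows up as kernel drift. A minimal sketch, assuming a toy tanh MLP, made-up data, and plain gradient descent; the widths, learning rate, and step count are arbitrary illustrations, not results from any paper:

```python
import jax
import jax.numpy as jnp

def init_params(key, d_in, width):
    k1, k2 = jax.random.split(key)
    return {"W1": jax.random.normal(k1, (width, d_in)) / jnp.sqrt(d_in),
            "W2": jax.random.normal(k2, (width,)) / jnp.sqrt(width)}

def f(params, x):
    return params["W2"] @ jnp.tanh(params["W1"] @ x)

def ntk_matrix(params, xs):
    """Empirical NTK Gram matrix over a batch of inputs."""
    def grad_vec(x):
        g = jax.grad(f)(params, x)
        return jnp.concatenate([leaf.ravel() for leaf in jax.tree_util.tree_leaves(g)])
    J = jax.vmap(grad_vec)(xs)          # (n_points, n_params)
    return J @ J.T

def train(params, xs, ys, lr=0.1, steps=500):
    def loss(p):
        preds = jax.vmap(lambda x: f(p, x))(xs)
        return jnp.mean((preds - ys) ** 2)
    for _ in range(steps):
        grads = jax.grad(loss)(params)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params

xs = jax.random.normal(jax.random.PRNGKey(0), (8, 2))
ys = jnp.sin(xs[:, 0])                  # toy regression targets

for width in [16, 4096]:                # narrow vs. wide: more vs. less kernel drift
    p0 = init_params(jax.random.PRNGKey(1), 2, width)
    k0 = ntk_matrix(p0, xs)
    kT = ntk_matrix(train(p0, xs, ys), xs)
    drift = jnp.linalg.norm(kT - k0) / jnp.linalg.norm(k0)
    print(f"width={width:5d}  relative kernel drift={drift:.3f}")
```

The scalar "relative kernel drift" is only a proxy, but it gives a concrete handle on where a finite network sits between the lazy (kernel-like) and feature-learning regimes mentioned above.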