Five Pillars
Five interconnected areas of deep learning mathematics. Each one explores theory through visualization and interaction.
Sequence Modeling
The architecture of memory and attention
y = softmax(QKᵀ/√d)V
From RNNs through Transformers to Mamba. Understand how models learn to process sequential information and the mathematical innovations that enable modern language models.
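As a rough sketch of the formula above, assuming NumPy and a single unbatched head (shapes and names are illustrative, not from this site's demos):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: y = softmax(QKᵀ/√d) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # convex combination of value rows

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
y = attention(Q, K, V)   # shape (5, 8)
```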
Optimization
Navigating the loss landscape
θ ← θ − η∇L(θ)
Gradient descent as physics. Why Adam works, how Muon orthogonalizes updates, and the thermodynamic view of learning as escaping saddle points.
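For comparison with the plain update θ ← θ − η∇L(θ), here is a minimal sketch of one Adam step, assuming NumPy; the hyperparameters are the usual Adam defaults and the quadratic loss is a placeholder:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # θ ← θ − η · m̂ / (√v̂ + ε): the raw gradient rescaled per coordinate
    m = b1 * m + (1 - b1) * grad          # first moment (momentum)
    v = b2 * v + (1 - b2) * grad**2       # second moment (adaptive scale)
    m_hat = m / (1 - b1**t)               # bias correction for early steps
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta, m, v = np.array([3.0, -2.0]), 0.0, 0.0
for t in range(1, 501):
    grad = 2 * theta                      # ∇L for the toy loss L(θ) = ‖θ‖²
    theta, m, v = adam_step(theta, grad, m, v, t)
```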
Generative Physics
Diffusion, flow, and the geometry of data
dx = f(x,t)dt + g(t)dW
Score matching, flow matching, and rectified flows. See how generation is gradient descent in data space, with interactive phase portraits.
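The SDE above can be simulated with Euler-Maruyama steps. A minimal NumPy sketch, with an illustrative drift and diffusion standing in for the interactive phase portraits:

```python
import numpy as np

def euler_maruyama(x0, f, g, t_grid, rng):
    # Integrate dx = f(x,t) dt + g(t) dW one Euler step at a time.
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Brownian increment
        x = x + f(x, t0) * dt + g(t0) * dW
        path.append(x.copy())
    return np.stack(path)

# Example: drift toward the origin plus constant noise (placeholder choices).
rng = np.random.default_rng(0)
path = euler_maruyama(
    x0=[2.0], f=lambda x, t: -x, g=lambda t: 0.5,
    t_grid=np.linspace(0.0, 1.0, 101), rng=rng,
)
```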
Geometric Deep Learning
Symmetry as inductive bias
f(ρ(g)x) = ρ′(g)f(x)
When the structure of data implies the structure of networks. Equivariance, group theory, and why CNNs are just the beginning.
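A quick numeric check of the equivariance condition for the cyclic group, assuming NumPy: with f a circular convolution and ρ(g) a shift, f(ρ(g)x) = ρ(g)f(x) holds exactly:

```python
import numpy as np

def circ_conv(x, w):
    # Circular convolution: shift-equivariant by construction.
    n = len(x)
    return np.array([sum(w[k] * x[(i - k) % n] for k in range(len(w)))
                     for i in range(n)])

rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = np.array([1.0, -2.0, 1.0])
shift = 3
lhs = circ_conv(np.roll(x, shift), w)    # f(ρ(g)x): shift first, then convolve
rhs = np.roll(circ_conv(x, w), shift)    # ρ(g)f(x): convolve first, then shift
assert np.allclose(lhs, rhs)             # equivariance holds exactly
```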
Mechanistic Interpretability
Reverse-engineering neural computation
x ≈ Σᵢ aᵢfᵢ (sparse)
Superposition, sparse autoencoders, and circuit analysis. Interactive probes into what networks actually compute.
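A minimal sparse-autoencoder forward pass illustrating x ≈ Σᵢ aᵢfᵢ, assuming NumPy; the dimensions, initialization, and L1 coefficient are placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 16, 64                            # activation dim, overcomplete dictionary size
W_enc = rng.normal(size=(d, m)) / np.sqrt(d)
W_dec = rng.normal(size=(m, d)) / np.sqrt(m)
b = np.zeros(m)

def sae(x, l1=1e-3):
    a = np.maximum(x @ W_enc + b, 0.0)   # sparse codes aᵢ (ReLU keeps most at zero)
    x_hat = a @ W_dec                    # x ≈ Σᵢ aᵢ fᵢ, rows of W_dec are features fᵢ
    loss = np.mean((x - x_hat) ** 2) + l1 * np.abs(a).mean()
    return x_hat, a, loss

x = rng.normal(size=(32, d))             # a batch standing in for model activations
x_hat, a, loss = sae(x)
```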
These areas share mathematical connections.
See how they link in the knowledge graph.