
Five Pillars

Five interconnected areas of deep learning mathematics, each explored through visualization and interaction.

∿

Sequence Modeling

The architecture of memory and attention

y = softmax(QKᵀ/√d)V

From RNNs through Transformers to Mamba. Understand how models learn to process sequential information and the mathematical innovations that enable modern language models.

Attention · SSMs · Mamba · Memory
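
To ground the formula above, here is a minimal NumPy sketch of scaled dot-product attention. The shapes, random inputs, and helper names are illustrative only, not code from any of the models discussed here.

import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # y = softmax(QKᵀ/√d)V, where d is the key dimension.
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ V                  # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)

The √d rescaling keeps the dot products from growing with dimension, so the softmax stays away from its saturated, vanishing-gradient regime.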
∇

Optimization

Navigating the loss landscape

θ ← θ − η∇L(θ)

Gradient descent as physics. Why Adam works, how Muon orthogonalizes updates, and the thermodynamic view of learning as escaping saddle points.

SGD · Adam · Muon · Sharpness
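
As a concrete companion to the update rule above, here is a toy sketch of Adam on an ill-conditioned quadratic, using the standard default hyperparameters. The loss and step counts are arbitrary choices for illustration, not a benchmark.

import numpy as np

def adam_step(theta, grad, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps exponential moving averages of the gradient (m)
    # and its elementwise square (v), with bias correction.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy quadratic L(θ) = ½ θᵀAθ with gradient Aθ and very uneven curvature.
A = np.diag([10.0, 1.0])
theta = np.array([2.0, -3.0])
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(500):
    theta, state = adam_step(theta, A @ theta, state)
print(theta)  # steadily approaches the minimum at the origin

Because the update divides by √v̂, Adam takes similar-sized steps along both the steep and the shallow axes, which is exactly where plain SGD struggles on this loss.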
∂

Generative Physics

Diffusion, flow, and the geometry of data

dx = f(x,t)dt + g(t)dW

Score matching, flow matching, and rectified flows. See how generation is gradient descent in data space, with interactive phase portraits.

Diffusion · Flow Matching · Score · SDEs
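
To make the SDE above concrete, here is a minimal Euler–Maruyama integrator applied to a variance-preserving toy process, the "noising" half of a diffusion model. The drift, diffusion coefficient, and grid are illustrative assumptions, not a particular paper's schedule.

import numpy as np

def euler_maruyama(x0, f, g, t_grid, rng):
    # Integrate dx = f(x,t)dt + g(t)dW one step at a time.
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        dW = rng.standard_normal(x.shape) * np.sqrt(dt)  # Brownian increment
        x = x + f(x, t0) * dt + g(t0) * dW
        path.append(x.copy())
    return np.stack(path)

# f(x,t) = -x/2 and g(t) = 1 pull any starting point toward a
# standard Gaussian: data is gradually destroyed into noise.
rng = np.random.default_rng(0)
path = euler_maruyama(x0=[3.0], f=lambda x, t: -0.5 * x,
                      g=lambda t: 1.0, t_grid=np.linspace(0, 5, 501), rng=rng)
print(path[0], path[-1])

Generation runs this picture in reverse: a learned score replaces the known drift, and each reverse step nudges noise back toward the data distribution.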
◇

Geometric Deep Learning

Symmetry as inductive bias

f(ρ(g)x) = ρ′(g)f(x)

When the structure of data implies the structure of networks. Equivariance, group theory, and why CNNs are just the beginning.

Equivariance · GNNs · Symmetry · Lie Groups
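
The equivariance equation above can be checked numerically. Here is a toy verification that 1-D circular convolution commutes with cyclic shifts; the signal, filter, and shift are arbitrary illustrative values.

import numpy as np

def circular_conv(x, w):
    # 1-D circular convolution: a translation-equivariant linear map.
    n = len(x)
    return np.array([sum(w[k] * x[(i - k) % n] for k in range(len(w)))
                     for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = rng.standard_normal(3)
g = 3  # the group element: a cyclic shift by 3 positions

# f(ρ(g)x) = ρ′(g)f(x): shifting the input then convolving
# equals convolving then shifting the output.
lhs = circular_conv(np.roll(x, g), w)
rhs = np.roll(circular_conv(x, w), g)
print(np.allclose(lhs, rhs))  # True

This is the sense in which CNNs are "just the beginning": swap the translation group for rotations, permutations, or a Lie group, and the same constraint yields new architectures.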
⊕

Mechanistic Interpretability

Reverse-engineering neural computation

x ≈ Σᵢ aᵢfᵢ (sparse)

Superposition, sparse autoencoders, and circuit analysis. Interactive probes into what networks actually compute.

SAEs · Circuits · Features · Probing
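
Here is a toy forward pass illustrating the sparse decomposition above: an overcomplete dictionary whose columns play the role of the features fᵢ, with activations aᵢ kept sparse by a negative encoder bias. The weights are random for illustration; a real SAE is trained with a reconstruction plus L1 sparsity loss.

import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    # Encoder: sparse nonnegative coefficients a = ReLU(W_enc x + b_enc).
    a = np.maximum(0.0, W_enc @ x + b_enc)
    # Decoder: reconstruct x ≈ Σᵢ aᵢfᵢ, with fᵢ the columns of W_dec.
    x_hat = W_dec @ a + b_dec
    return a, x_hat

rng = np.random.default_rng(0)
d, m = 16, 64                        # activation dim, dictionary size (m > d)
W_enc = rng.standard_normal((m, d)) * 0.1
W_dec = rng.standard_normal((d, m)) * 0.1
a, x_hat = sae_forward(rng.standard_normal(d), W_enc,
                       np.full(m, -0.5), W_dec, np.zeros(d))
print((a > 0).sum(), "of", m, "features active")  # negative bias → sparsity

Because m > d, the dictionary can store more features than the activation space has dimensions, which is precisely the superposition picture the sparse decomposition is meant to untangle.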

These areas share mathematical connections.
See how they link in the knowledge graph.

