
Five Pillars

Five interconnected areas of deep learning mathematics, each explored through visualization and interaction.

∿

Sequence Modeling

The architecture of memory and attention

y = softmax(QKᵀ/√d)V

From RNNs through Transformers to Mamba. Understand how models learn to process sequential information and the mathematical innovations that enable modern language models.

Attention · SSMs · Mamba · Memory
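
To ground the formula above, here is a minimal NumPy sketch of scaled dot-product attention. The shapes, random inputs, and helper names are illustrative only, not code from any of the models discussed here.

import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # y = softmax(QKᵀ/√d)V, where d is the key dimension.
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)  # each query's distribution over keys
    return weights @ V                  # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)

The √d rescaling keeps the dot products from growing with dimension, so the softmax stays away from its saturated, vanishing-gradient regime.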
∇

Optimization

Navigating the loss landscape

θ ← θ − η∇L(θ)

Gradient descent as physics. Why Adam works, how Muon orthogonalizes updates, and the thermodynamic view of learning as escaping saddle points.

SGD · Adam · Muon · Sharpness
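
As a concrete companion to the update rule above, here is a toy sketch of Adam on an ill-conditioned quadratic, using the standard default hyperparameters. The loss and step counts are arbitrary choices for illustration, not a benchmark.

import numpy as np

def adam_step(theta, grad, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps exponential moving averages of the gradient (m)
    # and its elementwise square (v), with bias correction.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy quadratic L(θ) = ½ θᵀAθ with gradient Aθ and very uneven curvature.
A = np.diag([10.0, 1.0])
theta = np.array([2.0, -3.0])
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(500):
    theta, state = adam_step(theta, A @ theta, state)
print(theta)  # steadily approaches the minimum at the origin

Because the update divides by √v̂, Adam takes similar-sized steps along both the steep and the shallow axes, which is exactly where plain SGD struggles on this loss.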
∂

Generative Physics

Diffusion, flow, and the geometry of data

dx = f(x,t)dt + g(t)dW

Score matching, flow matching, and rectified flows. See how generation is gradient descent in data space, with interactive phase portraits.

Diffusion · Flow Matching · Score · SDEs
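
To make the SDE above concrete, here is a minimal Euler–Maruyama integrator applied to a variance-preserving toy process, the "noising" half of a diffusion model. The drift, diffusion coefficient, and grid are illustrative assumptions, not a particular paper's schedule.

import numpy as np

def euler_maruyama(x0, f, g, t_grid, rng):
    # Integrate dx = f(x,t)dt + g(t)dW one step at a time.
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        dW = rng.standard_normal(x.shape) * np.sqrt(dt)  # Brownian increment
        x = x + f(x, t0) * dt + g(t0) * dW
        path.append(x.copy())
    return np.stack(path)

# f(x,t) = -x/2 and g(t) = 1 pull any starting point toward a
# standard Gaussian: data is gradually destroyed into noise.
rng = np.random.default_rng(0)
path = euler_maruyama(x0=[3.0], f=lambda x, t: -0.5 * x,
                      g=lambda t: 1.0, t_grid=np.linspace(0, 5, 501), rng=rng)
print(path[0], path[-1])

Generation runs this picture in reverse: a learned score replaces the known drift, and each reverse step nudges noise back toward the data distribution.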
◇

Geometric Deep Learning

Symmetry as inductive bias

f(ρ(g)x) = ρ′(g)f(x)

When the structure of data implies the structure of networks. Equivariance, group theory, and why CNNs are just the beginning.

Equivariance · GNNs · Symmetry · Lie Groups
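
The equivariance equation above can be checked numerically. Here is a toy verification that 1-D circular convolution commutes with cyclic shifts; the signal, filter, and shift are arbitrary illustrative values.

import numpy as np

def circular_conv(x, w):
    # 1-D circular convolution: a translation-equivariant linear map.
    n = len(x)
    return np.array([sum(w[k] * x[(i - k) % n] for k in range(len(w)))
                     for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = rng.standard_normal(3)
g = 3  # the group element: a cyclic shift by 3 positions

# f(ρ(g)x) = ρ′(g)f(x): shifting the input then convolving
# equals convolving then shifting the output.
lhs = circular_conv(np.roll(x, g), w)
rhs = np.roll(circular_conv(x, w), g)
print(np.allclose(lhs, rhs))  # True

This is the sense in which CNNs are "just the beginning": swap the translation group for rotations, permutations, or a Lie group, and the same constraint yields new architectures.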
⊕

Mechanistic Interpretability

Reverse-engineering neural computation

x ≈ Σᵢ aᵢfᵢ (sparse)

Superposition, sparse autoencoders, and circuit analysis. Interactive probes into what networks actually compute.

SAEs · Circuits · Features · Probing
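
Here is a toy forward pass illustrating the sparse decomposition above: an overcomplete dictionary whose columns play the role of the features fᵢ, with activations aᵢ kept sparse by a negative encoder bias. The weights are random for illustration; a real SAE is trained with a reconstruction plus L1 sparsity loss.

import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    # Encoder: sparse nonnegative coefficients a = ReLU(W_enc x + b_enc).
    a = np.maximum(0.0, W_enc @ x + b_enc)
    # Decoder: reconstruct x ≈ Σᵢ aᵢfᵢ, with fᵢ the columns of W_dec.
    x_hat = W_dec @ a + b_dec
    return a, x_hat

rng = np.random.default_rng(0)
d, m = 16, 64                        # activation dim, dictionary size (m > d)
W_enc = rng.standard_normal((m, d)) * 0.1
W_dec = rng.standard_normal((d, m)) * 0.1
a, x_hat = sae_forward(rng.standard_normal(d), W_enc,
                       np.full(m, -0.5), W_dec, np.zeros(d))
print((a > 0).sum(), "of", m, "features active")  # negative bias → sparsity

Because m > d, the dictionary can store more features than the activation space has dimensions, which is precisely the superposition picture the sparse decomposition is meant to untangle.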

These areas share mathematical connections.
See how they link in the knowledge graph.

