Representation Learning & Embedding Geometry

Canonical Papers

Representation Learning: A Review and New Perspectives

Bengio et al., 2013, IEEE TPAMI

Core Mathematics

Learn a mapping $f_\theta: \mathcal{X} \to \mathbb{R}^d$ such that inner products or distances reflect meaningful relations.
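
As a concrete toy instance of such a mapping, here is a minimal PyTorch sketch (architecture, dimensions, and names are illustrative assumptions, not from the paper): a small MLP encoder whose outputs are L2-normalized, so the inner product of two embeddings is exactly their cosine similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """A toy f_theta: maps inputs in R^in_dim to unit-norm embeddings in R^d."""
    def __init__(self, in_dim: int = 128, d: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, d),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.net(x)
        # Unit-normalize so <f(x1), f(x2)> equals cosine similarity.
        return F.normalize(z, dim=-1)

f = Encoder()
x1, x2 = torch.randn(4, 128), torch.randn(4, 128)
sim = f(x1) @ f(x2).T  # (4, 4) matrix of pairwise cosine similarities
```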

Contrastive objective (InfoNCE-style):

$$\mathcal{L} = -\,\mathbb{E}\left[\log \frac{\exp(\mathrm{sim}(f(x), g(y))/\tau)}{\sum_{y'} \exp(\mathrm{sim}(f(x), g(y'))/\tau)}\right]$$

This pushes "positive" pairs together and "negatives" apart; minimizing the loss maximizes a lower bound on the mutual information between the two views.
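
A minimal sketch of this objective in PyTorch, assuming f and g output unit-norm embeddings (e.g., the toy Encoder above), sim is cosine similarity, and the other N-1 items in the batch act as the negatives y'; the function name info_nce is hypothetical:

```python
import torch
import torch.nn.functional as F

def info_nce(zx: torch.Tensor, zy: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """zx, zy: (N, d) unit-norm embeddings of N positive pairs (x_i, y_i)."""
    logits = zx @ zy.T / tau            # logits[i, j] = sim(f(x_i), g(y_j)) / tau
    targets = torch.arange(zx.size(0))  # the positive pair sits on the diagonal
    # Row-wise softmax cross-entropy = -log(exp(pos) / sum_j exp(logits[i, j]))
    return F.cross_entropy(logits, targets)

zx = F.normalize(torch.randn(8, 64), dim=-1)
zy = F.normalize(torch.randn(8, 64), dim=-1)
loss = info_nce(zx, zy)
```

With a batch of N pairs, the softmax denominator ranges over N candidates, which is why larger batches tighten the mutual-information bound (it scales with log N).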

Why It Matters for Modern Models

  • Word & token embeddings in LMs, vision embeddings in CLIP-like models, multimodal embeddings in Gemini and GPT-4V
  • Latent spaces of Stable Diffusion, designed so that distances roughly correspond to semantic similarity

Missing Intuition

What is still poorly explained in textbooks and papers:

  • Geometric explanation of anisotropy (representations bunch along a few dominant directions) and how normalization/whitening change that geometry; see the sketch after this list
  • Visuals showing how representations evolve across layers (local to global features)
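
To make the anisotropy point concrete, here is a minimal NumPy sketch (all names and numbers are illustrative): it measures mean pairwise cosine similarity, which is near zero for a centered isotropic cloud but large when embeddings share a dominant direction, and then applies PCA whitening, which centers, decorrelates, and rescales the axes.

```python
import numpy as np

def mean_cosine(Z: np.ndarray) -> float:
    """Average off-diagonal pairwise cosine similarity; ~0 for an isotropic cloud."""
    U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = U @ U.T
    n = len(U)
    return (S.sum() - n) / (n * (n - 1))  # drop the diagonal of ones

def whiten(Z: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """PCA whitening: center, rotate to principal axes, rescale to unit variance."""
    Zc = Z - Z.mean(axis=0)
    cov = Zc.T @ Zc / len(Zc)
    vals, vecs = np.linalg.eigh(cov)
    return Zc @ vecs / np.sqrt(vals + eps)

rng = np.random.default_rng(0)
# Anisotropic cloud: axis scales from 5 down to 0.1, plus a shared mean offset
# (it is the offset that inflates the mean cosine, as in LM embedding spaces).
Z = rng.normal(size=(512, 64)) @ np.diag(np.linspace(5.0, 0.1, 64)) + 2.0
print(mean_cosine(Z))          # high: vectors crowd around a common direction
print(mean_cosine(whiten(Z)))  # ~0: whitening restores isotropy
```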
