Legacy Concept Lab

Infinite Context Architectures

Turns entire repos/books into "single prompt" territory

Concept 100 of 100 | Efficiency | Phase 13: Cutting-edge 2024-2025 research
Key equation: M_{t+1} = \text{Update}(M_t, K_t, V_t)

Why It Matters for Modern Models

  • Turns entire repos/books into "single prompt" territory
  • Streaming: process unbounded sequences with fixed memory
  • 1M+ tokens: Gemini 1.5, LongRoPE, Ring Attention

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Compressive: old context summarized into memory state
  • Ring: sequence chunks processed in ring topology across GPUs
  • Hybrid: combine attention with SSM-style recurrence
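
To make the "Hybrid" bullet concrete, here is a minimal NumPy sketch that runs a sliding-window attention path next to a simple linear recurrence. Everything in it is an illustrative assumption rather than a published design: the names `local_attention`, `linear_recurrence`, and `hybrid_block` are hypothetical, the input `x` serves as queries, keys, and values with no learned projections, and the two paths are simply summed.

```python
import numpy as np

def local_attention(x, window):
    """Causal softmax attention restricted to the last `window` tokens."""
    n, d = x.shape
    out = np.zeros_like(x)
    for t in range(n):
        ctx = x[max(0, t - window + 1): t + 1]   # at most `window` rows
        s = ctx @ x[t] / np.sqrt(d)              # scores against the local context
        p = np.exp(s - s.max())                  # stable softmax numerator
        out[t] = (p / p.sum()) @ ctx
    return out

def linear_recurrence(x, decay=0.9):
    """SSM-style scan: h_t = decay * h_{t-1} + (1 - decay) * x_t."""
    h = np.zeros(x.shape[1])                     # fixed-size state
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out

def hybrid_block(x, window=4, decay=0.9):
    # Assumption: plain sum of both paths; published hybrids mix them differently.
    return local_attention(x, window) + linear_recurrence(x, decay)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8))
y = hybrid_block(x)
```

The attention path only ever sees `window` tokens and the recurrent state `h` has a fixed size, so neither path grows with the sequence length.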

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
M_{t+1} = \text{Update}(M_t, K_t, V_t)

Compressive memory keeps cost bounded. For comparison, standard softmax attention scales quadratically with sequence length n:

\mathrm{Attn}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d}}\right)V \quad O(n^2)
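
As a reference point, the attention formula can be written out in a few lines of NumPy. This is a minimal single-head sketch with no masking or batching; the helper names `softmax` and `attention` are chosen here for illustration. Returning the score-matrix shape makes the quadratic term visible.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Full softmax attention; materializes an (n, n) score matrix."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # shape (n, n): the O(n^2) term
    return softmax(scores) @ V, scores.shape

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out, score_shape = attention(Q, K, V)
```

Doubling `n` quadruples the number of entries in `scores`, which is the cost that long-context methods try to avoid.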

Infini-attention: maintain a fixed-size memory M that is updated online, so memory cost stays bounded with respect to sequence length n.
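
One way to picture the update M_{t+1} = Update(M_t, K_t, V_t) is as an additive associative-memory write, loosely following the linear-attention-style rule described in the Infini-attention paper. The sketch below rests on stated assumptions: the feature map is ELU + 1, a normalizer `z` is kept alongside `M`, and the paper's local attention and learned gate are left out. The function names are hypothetical.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map ELU(x) + 1; the clamp avoids overflow in exp.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def update(M, z, K, V):
    """M_{t+1} = Update(M_t, K_t, V_t): write one segment into memory."""
    sK = elu_plus_one(K)                      # (seg, d_k)
    return M + sK.T @ V, z + sK.sum(axis=0)   # shapes never grow

def retrieve(M, z, Q, eps=1e-6):
    """Read from memory using the queries of the current segment."""
    sQ = elu_plus_one(Q)                      # (seg, d_k)
    return (sQ @ M) / (sQ @ z + eps)[:, None]

rng = np.random.default_rng(0)
d_k, d_v, seg = 4, 4, 16
M, z = np.zeros((d_k, d_v)), np.zeros(d_k)
for _ in range(100):                          # stream 100 segments (1600 tokens)
    K = rng.standard_normal((seg, d_k))
    V = rng.standard_normal((seg, d_v))
    M, z = update(M, z, K, V)
out = retrieve(M, z, rng.standard_normal((seg, d_k)))
```

However many segments are streamed, `M` stays `(d_k, d_v)` and `z` stays `(d_k,)`, which is what "bounded with respect to n" means here.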

Ring Attention: Distribute long sequences across devices via blockwise ring communication.
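
The ring pattern can be simulated in a single process to check that it reproduces full attention. In this sketch, Python lists stand in for devices. Each "device" keeps its query block and folds every key/value block into an online-softmax accumulator as the blocks rotate around the ring. Assumptions: no causal mask, no real communication, and a hypothetical `num_devices` parameter that must divide the sequence length.

```python
import numpy as np

def full_attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ V

def ring_attention(Q, K, V, num_devices):
    d = Q.shape[-1]
    # Shard the sequence; n must be divisible by num_devices.
    Qb, Kb, Vb = (np.split(X, num_devices) for X in (Q, K, V))
    # Per-device online-softmax state: running max, denominator, numerator.
    m = [np.full(q.shape[0], -np.inf) for q in Qb]
    l = [np.zeros(q.shape[0]) for q in Qb]
    acc = [np.zeros((q.shape[0], V.shape[-1])) for q in Qb]
    for _ in range(num_devices):
        for i in range(num_devices):
            s = Qb[i] @ Kb[i].T / np.sqrt(d)     # local block scores
            m_new = np.maximum(m[i], s.max(axis=-1))
            scale = np.exp(m[i] - m_new)         # rescale earlier partial sums
            p = np.exp(s - m_new[:, None])
            l[i] = l[i] * scale + p.sum(axis=-1)
            acc[i] = acc[i] * scale[:, None] + p @ Vb[i]
            m[i] = m_new
        # Ring step: every device passes its K/V block to its neighbour.
        Kb, Vb = Kb[-1:] + Kb[:-1], Vb[-1:] + Vb[:-1]
    return np.concatenate([a / li[:, None] for a, li in zip(acc, l)])

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
ring_out = ring_attention(Q, K, V, num_devices=4)
full_out = full_attention(Q, K, V)
```

After `num_devices` rotations every query block has seen every key/value block, so `ring_out` matches `full_out` up to floating-point error, yet no device ever holds more than one block of scores at a time.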

Canonical Papers

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Munkhdalai et al., 2024, Google
