Legacy Concept Lab
Infinite Context Architectures
#100 InfCtx Efficiency
key equation
M_{t+1} = \text{Update}(M_t, K_t, V_t)
Phase 13: Cutting-edge 2024-2025 research
Concept 100 of 100
Why It Matters for Modern Models
- Turns entire repos/books into "single prompt" territory
- Streaming: process unbounded sequences with fixed memory
- 1M+ tokens: Gemini 1.5, LongRoPE, Ring Attention
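To make "fixed memory" concrete, here is a rough back-of-the-envelope sketch comparing a standard KV cache, which grows with every token, against a fixed-size compressive memory. The model shape (32 layers, 32 heads, head dimension 128, fp16) is a hypothetical configuration chosen for illustration, not the shape of any model named above, and the per-head memory layout (a d_k x d_v matrix plus a d_k normalizer) is an assumption borrowed from Infini-attention-style designs.

```python
# Hypothetical model shape, for illustration only.
LAYERS = 32
HEADS = 32
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16


def kv_cache_bytes(num_tokens):
    """Standard KV cache: one key and one value vector per token, head, and layer."""
    return 2 * LAYERS * HEADS * HEAD_DIM * BYTES_PER_VALUE * num_tokens


def compressive_memory_bytes():
    """Assumed compressive layout: a (d_k x d_v) matrix plus a d_k normalizer
    per head and layer. Note that the token count does not appear."""
    per_head = HEAD_DIM * HEAD_DIM + HEAD_DIM
    return LAYERS * HEADS * per_head * BYTES_PER_VALUE


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(n, "tokens:", kv_cache_bytes(n) / 2**30, "GiB of KV cache")
    print("compressive memory:", compressive_memory_bytes() / 2**20, "MiB at any length")
```

Under these assumptions the KV cache costs 0.5 MiB per token, so one million tokens need roughly 488 GiB, while the compressive state stays near 32 MiB regardless of length. The trade-off is that the compressive state is lossy.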
What Tutorials Skip
What is still poorly explained in textbooks and papers:
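- Compressive: old context is summarized into a fixed-size memory state instead of being kept token by token
- Ring: the sequence is split into chunks, one per GPU, and key/value blocks are passed around a ring of devices
- Hybrid: attention layers are combined with SSM-style (state space model) recurrence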
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
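Compressive memory for bounded cost:
M_{t+1} = \text{Update}(M_t, K_t, V_t)
Here M_t is the fixed-size memory state after step t, and K_t, V_t are the keys and values of the segment being folded into it.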
Infini-attention: Maintain a memory that is updated online, so the cost stays bounded with respect to sequence length.
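One way to picture the online update is the linear-attention-style memory described in the Infini-attention paper: each segment adds \sigma(K)^T V to a matrix M and the summed \sigma(K) rows to a normalizer z, and a query reads back \sigma(Q) M / (\sigma(Q) z), with \sigma = ELU + 1. The sketch below is a minimal pure-Python illustration of that memory path only. The class and method names are hypothetical, and it leaves out the local dot-product attention and the learned gate that the full method combines with the memory read.

```python
import math


def phi(x):
    """ELU(x) + 1, a positive feature map that keeps the normalizer positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


class CompressiveMemory:
    """Fixed-size associative memory: a (d_k x d_v) matrix M and a d_k vector z."""

    def __init__(self, d_k, d_v):
        self.M = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def update(self, keys, values):
        """Fold one segment into memory: M += phi(K)^T V, z += sum of phi(K) rows."""
        for k, v in zip(keys, values):
            for i, fi in enumerate(phi(k)):
                self.z[i] += fi
                row = self.M[i]
                for j, vj in enumerate(v):
                    row[j] += fi * vj

    def retrieve(self, query):
        """Read from memory: phi(q) M / (phi(q) . z).
        Must be called after at least one update, otherwise the normalizer is zero."""
        fq = phi(query)
        denom = sum(fi * zi for fi, zi in zip(fq, self.z))
        d_v = len(self.M[0])
        return [sum(fq[i] * self.M[i][j] for i in range(len(fq))) / denom
                for j in range(d_v)]


if __name__ == "__main__":
    mem = CompressiveMemory(d_k=2, d_v=3)
    # Stream two segments; the memory never grows.
    mem.update([[0.1, -0.2], [1.0, 0.5]], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    mem.update([[-1.0, 0.3]], [[0.0, 0.0, 1.0]])
    print(mem.retrieve([0.4, -0.7]))
```

The read is a weighted average of all stored values, with weights phi(q) . phi(k), so the memory size and the per-query cost depend only on d_k and d_v, not on how many tokens have been streamed in.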
Ring Attention: Distribute long sequences across devices via blockwise ring communication.
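The blockwise ring schedule can be simulated on a single machine to check that it reproduces ordinary attention. In the sketch below each simulated "device" keeps its own query block and a running softmax state (maximum score, normalizer, unnormalized output), while key/value blocks rotate around the ring one hop per step. This is an illustrative, non-causal toy with unscaled dot-product scores. The function names are hypothetical, and a real implementation would overlap the block transfer with computation on actual devices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(Q, K, V):
    """Reference: softmax attention computed over the whole sequence at once."""
    out = []
    for q in Q:
        scores = [dot(q, k) for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) / total
                    for j in range(len(V[0]))])
    return out


def ring_attention(Q, K, V, num_devices):
    """Simulated ring schedule: device `dev` sees KV block (dev - step) mod P at each step."""
    n, d_v = len(Q), len(V[0])
    assert n % num_devices == 0, "sequence must split evenly across devices"
    b = n // num_devices
    q_blocks = [Q[i * b:(i + 1) * b] for i in range(num_devices)]
    kv_blocks = [(K[i * b:(i + 1) * b], V[i * b:(i + 1) * b])
                 for i in range(num_devices)]
    # Per-query running state: [max score, normalizer, unnormalized output].
    state = [[[-math.inf, 0.0, [0.0] * d_v] for _ in range(b)]
             for _ in range(num_devices)]
    for step in range(num_devices):
        for dev in range(num_devices):
            Kb, Vb = kv_blocks[(dev - step) % num_devices]
            for qi, q in enumerate(q_blocks[dev]):
                m, l, o = state[dev][qi]
                scores = [dot(q, k) for k in Kb]
                m_new = max(m, max(scores))
                scale = math.exp(m - m_new)  # rescales earlier partial sums
                w = [math.exp(s - m_new) for s in scores]
                l = l * scale + sum(w)
                o = [oj * scale + sum(wi * v[j] for wi, v in zip(w, Vb))
                     for j, oj in enumerate(o)]
                state[dev][qi] = [m_new, l, o]
    # Normalize and concatenate the per-device outputs in sequence order.
    return [[oj / l for oj in o] for dev_state in state for (_, l, o) in dev_state]
```

Because the running maximum and normalizer are rescaled whenever a new block arrives, each device only ever holds one key/value block at a time, yet the final result matches full attention up to floating-point rounding.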