Legacy Concept Lab

Retrieval-Augmented Generation (RAG)


Concept 43 of 100 | Representations | Phase 9: Advanced architectures & generation

Why It Matters for Modern Models

RAG is the dominant paradigm for grounding LLMs in external/updated knowledge
Explains why vector databases and embedding search became critical infrastructure
Separates "what the model knows" from "what the model can access", which enables knowledge updates without retraining

What is still poorly explained in textbooks and papers:

RAG trades model capacity for external memory: a smaller model with good retrieval can match a larger model
Retrieval quality is the bottleneck: irrelevant documents act as injected noise and can hurt more than retrieving nothing
The "lost in the middle" problem: LLMs struggle to use information placed in the middle of a long context, so the ranking and ordering of retrieved documents matters
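The last two points suggest two simple pre-processing steps before retrieved documents reach the model: drop low-scoring documents, then order the rest so the strongest ones sit at the edges of the context. The sketch below is one possible heuristic, not a method taken from the source; the function name `filter_and_reorder`, the `min_score` threshold, and the toy scores are all hypothetical.

```python
def filter_and_reorder(scored_docs, min_score=0.0):
    """Drop weak documents, then place the best ones at the context edges.

    scored_docs: list of (document, relevance_score) pairs (hypothetical input).
    min_score:   assumed cutoff below which a document is treated as noise.
    """
    # 1) Noise control: discard documents the retriever scored too low.
    kept = [(doc, score) for doc, score in scored_docs if score >= min_score]

    # 2) Rank the survivors best-first.
    ranked = sorted(kept, key=lambda pair: pair[1], reverse=True)

    # 3) Alternate between the front and the back of the context, so the
    #    least relevant survivors end up in the middle.
    front, back = [], []
    for i, (doc, _) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]


# Toy example with made-up relevance scores.
scored = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5), ("noise", 0.1)]
ordered = filter_and_reorder(scored, min_score=0.3)
```

Here the best document lands first and the second best lands last. Whether this ordering helps depends on the model and the context length, so it is worth measuring rather than assuming.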

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

$$p(y|x) = \sum_{d} p(d|x) \cdot p(y|x, d)$$

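RAG augments generation with retrieved documents. Summing over every document $d$ in the corpus is intractable, so the sum is approximated over the top-$k$ documents under $p(d|x)$:

$$p(y|x) \approx \sum_{d \in \text{top-}k(p(\cdot|x))} p(d|x) \cdot p(y|x, d)$$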
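As a small worked illustration of the top-k sum, the sketch below mixes per-document answer probabilities using the retrieval probabilities as weights. The numbers are invented for the example, and `marginal_likelihood` is a hypothetical helper name.

```python
def marginal_likelihood(retrieval_probs, gen_probs):
    """Sum over top-k documents of p(d|x) * p(y|x, d).

    retrieval_probs: p(d|x) for each retrieved document.
    gen_probs:       p(y|x, d) for the same documents, in the same order.
    """
    if len(retrieval_probs) != len(gen_probs):
        raise ValueError("need one generation probability per retrieved document")
    return sum(p_d * p_y for p_d, p_y in zip(retrieval_probs, gen_probs))


# Toy values (assumptions): three retrieved documents.
p_d_given_x = [0.6, 0.3, 0.1]    # retriever's belief in each document
p_y_given_xd = [0.9, 0.5, 0.05]  # generator's probability of answer y per document
p_y = marginal_likelihood(p_d_given_x, p_y_given_xd)
```

The document the retriever trusts most dominates the result: the first term contributes 0.54 of the total, so a poor top-ranked document lowers the answer probability far more than a poor third-ranked one.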

Retrieval uses embedding similarity:

$$p(d|x) \propto \exp(\text{sim}(E_q(x), E_d(d)) / \tau)$$

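where $E_q$ and $E_d$ are the query and document encoders, $\text{sim}$ is a similarity score such as the inner product, and $\tau$ is a temperature. The two encoders may share weights (as in Contriever) or be separate BERT-based models (as in DPR, the retriever used by Lewis et al.).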
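The retrieval distribution can be sketched as a temperature-scaled softmax over similarity scores, restricted to the top-k documents. This is a minimal sketch that assumes the inner product as the similarity and uses hand-written toy vectors in place of real encoder outputs; `retrieval_distribution` and its parameters are hypothetical names.

```python
import math


def dot(u, v):
    """Inner product, used here as the similarity function (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def retrieval_distribution(query_vec, doc_vecs, tau=1.0, k=None):
    """Return {doc_index: p(d|x)} over the top-k documents.

    p(d|x) is proportional to exp(sim(E_q(x), E_d(d)) / tau), normalized
    over the documents that were kept.
    """
    scores = [dot(query_vec, d) / tau for d in doc_vecs]
    # Rank documents by score, best first, and keep the top k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if k is not None:
        order = order[:k]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores[i] for i in order)
    exps = {i: math.exp(scores[i] - top) for i in order}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Toy embeddings standing in for E_q(x) and E_d(d).
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = retrieval_distribution(query, docs, tau=0.5, k=2)
```

Lowering `tau` sharpens the distribution toward the best-matching document, while raising it spreads probability more evenly across the retrieved set.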

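When the $k$ retrieved documents are concatenated into a single context rather than marginalized over, generation conditions on all of them at once and factorizes autoregressively:

$$p(y|x, d_1, \ldots, d_k) = \prod_t p(y_t | y_{<t}, x, d_1, \ldots, d_k)$$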
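The factorization above can be illustrated in two steps: assemble the retrieved documents and the query into one context, then score an answer as a product of per-token probabilities (computed as a sum of logs). The prompt layout and both helper names are assumptions made for this sketch, and the per-token probabilities are invented rather than produced by a model.

```python
import math


def build_prompt(question, docs):
    """Concatenate retrieved documents d_1..d_k ahead of the query x.

    The numbered layout and the 'Question/Answer' labels are illustrative only.
    """
    context = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {question}\nAnswer:"


def sequence_log_prob(token_probs):
    """log p(y | x, d_1..d_k) as the sum of log p(y_t | y_<t, x, d_1..d_k)."""
    return sum(math.log(p) for p in token_probs)


prompt = build_prompt("Who proposed RAG?", ["doc one", "doc two"])

# Made-up conditional probabilities for a three-token answer.
log_p = sequence_log_prob([0.5, 0.5, 0.8])
p_answer = math.exp(log_p)
```

Summing logs instead of multiplying raw probabilities avoids underflow on long answers, which is why sequence scores are usually reported in log space.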

Lewis et al., 2020, NeurIPS
