Legacy Concept Lab
Retrieval-Augmented Generation (RAG)
Concept 43 of 100 · Phase 9: Advanced architectures & generation · Representations
Key equation
p(y|x) = \sum_{d} p(d|x) \cdot p(y|x, d)
Why It Matters for Modern Models
- RAG is the dominant paradigm for grounding LLMs in external/updated knowledge
- Explains why vector databases and embedding search became critical infrastructure
- Separates "what the model knows" from "what the model can access": knowledge can be updated by editing the document index, without retraining the model
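To make the "updates without retraining" point concrete, here is a minimal Python sketch. The bag-of-words `embed` function and the `ToyIndex` class are hypothetical stand-ins for a learned encoder and a vector database, and the documents are invented. Between the two searches, only the index contents change; no model weights are involved.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a learned encoder: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory document store; no model weights live here."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vecs: list[Counter] = []

    def add(self, doc: str) -> None:
        # A knowledge update is just an append to the index: no gradient step.
        self.docs.append(doc)
        self.vecs.append(embed(doc))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: cosine(q, self.vecs[i]), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


index = ToyIndex()
index.add("paris is the capital of france")
query = "when did the aurora release ship"
before = index.search(query)  # only one document exists, so it is returned
index.add("the aurora release shipped in march")
after = index.search(query)   # the newly added document now ranks first
print(before)
print(after)
```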
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- RAG trades model capacity for external memory: a smaller model with good retrieval can match a much larger model on knowledge-intensive tasks
- Retrieval quality is the bottleneck: irrelevant documents can hurt more than no documents at all, because they inject noise that the model may treat as evidence
- The "lost in the middle" problem: LLMs struggle to use information placed in the middle of long contexts, so the order in which retrieved documents are presented matters, not just which ones are retrieved
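One common mitigation, sketched here as an assumption rather than something this page prescribes, is to reorder the retrieved documents so the strongest ones sit at the start and end of the prompt and the weakest land in the middle. `reorder_for_long_context` is a hypothetical helper name.

```python
def reorder_for_long_context(ranked_docs: list[str]) -> list[str]:
    """Put the highest-ranked documents at the edges of the prompt, weakest in the middle.

    Assumes ranked_docs is sorted best-first (e.g., by retriever score).
    """
    front: list[str] = []
    back: list[str] = []
    for i, doc in enumerate(ranked_docs):
        # Ranks 1, 3, 5, ... fill from the start; ranks 2, 4, ... fill from the end.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]


order = reorder_for_long_context(["d1", "d2", "d3", "d4", "d5"])
print(order)  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The best document (`d1`) opens the context, the runner-up (`d2`) closes it, and the weakest (`d5`) ends up in the middle, where the model is least likely to use it anyway.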
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation above. This section gives the full walkthrough.
Key Equation
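RAG augments generation with retrieved documents, marginalizing over them (in practice, the sum runs over the top-k retrieved documents):
p(y|x) \approx \sum_{d \in \text{top-}k(x)} p(d|x) \cdot p(y|x, d)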
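The marginalization over retrieved documents can be sketched in a few lines of Python. `rag_sequence_prob` is a hypothetical helper, and the probabilities in the usage example are made up; in a real system p(d|x) comes from the retriever and p(y|x, d) from the generator.

```python
def rag_sequence_prob(retrieval_probs: list[float], gen_probs: list[float]) -> float:
    """p(y|x) = sum_d p(d|x) * p(y|x, d), summed over the retrieved documents.

    retrieval_probs[i] is p(d_i|x), renormalized over the retrieved set.
    gen_probs[i] is p(y|x, d_i), the generator's likelihood of y given document i.
    """
    if len(retrieval_probs) != len(gen_probs):
        raise ValueError("need one generator probability per retrieved document")
    if abs(sum(retrieval_probs) - 1.0) > 1e-9:
        raise ValueError("p(d|x) must sum to 1 over the retrieved documents")
    return sum(p_d * p_y for p_d, p_y in zip(retrieval_probs, gen_probs))


# Made-up numbers: the top document strongly supports y, the others barely do.
p = rag_sequence_prob([0.7, 0.2, 0.1], [0.9, 0.3, 0.05])
print(f"{p:.3f}")  # 0.695
```

Because p(y|x) is a weighted average of the per-document likelihoods, a document that receives retrieval mass but does not support y pulls the total down.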
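Retrieval uses embedding similarity, scoring each document by the inner product of its embedding with the query embedding:
p(d|x) \propto \exp\left(E_q(x)^\top E_d(d)\right)
where E_q and E_d are the query and document encoders (often shared, e.g., BERT, Contriever).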
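A minimal sketch of the retrieval step, assuming the encoders have already been run: the toy vectors below are hypothetical stand-ins for E_q(x) and E_d(d). Scores are inner products, and the softmax is renormalized over the top-k documents only, since only those are passed to the generator. `retrieve` is a hypothetical helper name.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    """Inner product E_q(x)^T E_d(d)."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(q_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Return (doc_index, p(d|x)) for the top-k documents, best first.

    p(d|x) is a softmax of the inner-product scores, renormalized over the top-k only.
    """
    scores = [dot(q_vec, d) for d in doc_vecs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = scores[top[0]]  # largest score; subtracting it keeps exp() from overflowing
    exps = [math.exp(scores[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


# Hypothetical precomputed embeddings standing in for E_q(x) and E_d(d).
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
results = retrieve(query_vec, doc_vecs, k=2)
print(results)  # documents 0 and 2, with probabilities summing to 1
```

Production systems replace the linear scan over `doc_vecs` with an approximate nearest-neighbor index, which is where vector databases come in.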
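Generation conditions on retrieved context, factorizing autoregressively over the output tokens:
p(y|x, d) = \prod_{t=1}^{T} p(y_t | x, d, y_{<t})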
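The chain-rule factorization of the generator can be sketched as a sum of per-token log-probabilities. `toy_next_token_prob` is a hypothetical generator over a five-word vocabulary that upweights tokens appearing in the retrieved document; a real LLM would also condition on x and the prefix y_{<t}, which this toy ignores.

```python
import math
from typing import Callable, Sequence

VOCAB = ["paris", "london", "is", "the", "capital"]


def toy_next_token_prob(x: str, d: str, prefix: Sequence[str], token: str) -> float:
    """Hypothetical generator: a distribution over VOCAB giving tokens found in d 3x weight.

    A real LLM would also condition on the query x and the prefix; this toy ignores them.
    """
    doc_tokens = set(d.split())
    weights = {w: (3.0 if w in doc_tokens else 1.0) for w in VOCAB}
    return weights[token] / sum(weights.values())


def sequence_log_prob(
    next_token_prob: Callable[[str, str, Sequence[str], str], float],
    x: str, d: str, y: Sequence[str],
) -> float:
    """log p(y|x, d) = sum_t log p(y_t | x, d, y_<t)."""
    return sum(math.log(next_token_prob(x, d, y[:t], tok)) for t, tok in enumerate(y))


x = "what is the capital of france"
y = ["paris", "is", "the", "capital"]
lp_grounded = sequence_log_prob(toy_next_token_prob, x, "paris is the capital of france", y)
lp_off_topic = sequence_log_prob(toy_next_token_prob, x, "london is large", y)
print(lp_grounded > lp_off_topic)  # True: the supporting document raises p(y|x, d)
```

Working in log space avoids underflow when T is large, since the product of many per-token probabilities quickly becomes too small to represent.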