Legacy Concept Lab

Iterated Amplification

Concrete proposal for scalable oversight when AI exceeds human capability

Concept 87 of 100 · Scaling & Alignment · Phase 12



Phase 12: Advanced alignment & safety research

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

$$A' = \arg\min_\pi \mathrm{KL}\big(\text{Amp}(H,A) \,\|\, \pi\big)$$

Amplify the human $H$ with assistants $A$, then distill the amplified system into a single policy $\pi$:

$$A' \approx \arg\min_\pi \mathbb{E}_x\left[\mathrm{KL}\big(\text{Amp}(H,A)(\cdot|x) \,\|\, \pi(\cdot|x)\big)\right]$$
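One way to read this objective, using a standard identity rather than anything specific to this page: write $p(\cdot|x)$ as shorthand (an assumption of this note) for $\text{Amp}(H,A)(\cdot|x)$. The KL term then splits into a cross-entropy and an entropy that does not depend on $\pi$:

$$\mathrm{KL}\big(p(\cdot|x) \,\|\, \pi(\cdot|x)\big) = -\mathbb{E}_{y \sim p(\cdot|x)}\big[\log \pi(y|x)\big] - \mathcal{H}\big(p(\cdot|x)\big)$$

Dropping the $\pi$-independent entropy term, distillation reduces to maximum-likelihood imitation of samples from the amplified system:

$$A' \approx \arg\max_\pi \mathbb{E}_x\, \mathbb{E}_{y \sim \text{Amp}(H,A)(\cdot|x)}\big[\log \pi(y|x)\big]$$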

Then iterate: $A \leftarrow A'$.

Recursion: as $A$ improves, $\text{Amp}(H,A)$ becomes more capable, so each distillation step trains against a stronger target than the last.
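The amplify, distill, iterate loop above can be sketched on a toy task. Everything in this example is an illustrative assumption, not the setup from the cited paper: the task (summing a tuple of integers), the names `initial_assistant`, `amplify`, `distill`, and `run_ida`, and the use of a lookup table as a stand-in for training $\pi$ by KL minimization. The "human" here can only split a question in half and add two numbers; the assistant starts out able to answer only single-number questions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Question = Tuple[int, ...]                      # a tuple of integers to be summed
Assistant = Callable[[Question], Optional[int]]  # returns None when it cannot answer


def initial_assistant(q: Question) -> Optional[int]:
    """A_0 (hypothetical): a weak assistant that only answers one-number questions."""
    return q[0] if len(q) == 1 else None


def amplify(assistant: Assistant) -> Assistant:
    """Amp(H, A): the human splits the question, delegates both halves to A,
    and performs only the final addition."""
    def amplified(q: Question) -> Optional[int]:
        direct = assistant(q)
        if direct is not None:
            return direct
        if len(q) < 2:
            return None
        mid = len(q) // 2
        left, right = assistant(q[:mid]), assistant(q[mid:])
        if left is None or right is None:
            return None          # the subquestions are still too hard for A
        return left + right      # the one step the human does unaided
    return amplified


def distill(amplified: Assistant, questions: List[Question]) -> Assistant:
    """Tabular stand-in for distillation: memorise Amp(H, A) on the training
    questions. For a deterministic target, matching it exactly gives zero KL
    on those questions. A real system would train a model that generalises."""
    table: Dict[Question, int] = {}
    for q in questions:
        answer = amplified(q)
        if answer is not None:
            table[q] = answer
    return lambda q: table.get(q)


def run_ida(data: Question, rounds: int) -> Tuple[Assistant, List[int]]:
    """Iterate A <- distill(Amp(H, A)) and record the longest question answered."""
    # Training questions: every contiguous slice, so halves of a question
    # are themselves training questions.
    questions = [data[i:j] for i in range(len(data))
                 for j in range(i + 1, len(data) + 1)]
    assistant: Assistant = initial_assistant
    longest: List[int] = []
    for _ in range(rounds):
        assistant = distill(amplify(assistant), questions)
        longest.append(max(len(q) for q in questions if assistant(q) is not None))
    return assistant, longest


DATA = (3, 1, 4, 1, 5, 9, 2, 6)
final_assistant, longest_per_round = run_ida(DATA, rounds=3)
print(longest_per_round)      # [2, 4, 8]
print(final_assistant(DATA))  # 31
```

In this sketch the longest answerable question doubles each round, a small-scale picture of the recursion: the distilled assistant from round $k$ makes the amplified system in round $k+1$ strictly more capable, even though the human's own contribution never changes.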

Christiano et al., 2018, arXiv

Explore this concept from different angles — like a mathematician would.