Legacy Concept Lab

Iterated Amplification

Concrete proposal for scalable oversight when AI exceeds human capability

Concept 87 of 100 (IDA), Scaling & Alignment, Phase 12: Advanced alignment & safety research

Why It Matters for Modern Models

  • Concrete proposal for scalable oversight when AI exceeds human capability
  • A human decomposes a task, assistants solve the subtasks, and the combined result is distilled back into a single model
  • Foundational to modern AI safety research
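
The decompose, delegate, combine pattern can be sketched on a toy task. This is only an illustration under stated assumptions: the task (summing a list), the length-limited assistant, and the names make_weak_assistant and amplify are all hypothetical and do not come from the paper.

```python
from typing import Callable, List

Task = List[int]
Solver = Callable[[Task], int]

def make_weak_assistant(max_len: int) -> Solver:
    """Hypothetical assistant A: can only sum lists of up to max_len items."""
    def assistant(task: Task) -> int:
        if len(task) > max_len:
            raise ValueError("task too hard for this assistant")
        return sum(task)
    return assistant

def amplify(assistant: Solver) -> Solver:
    """Toy Amp(H, A): the human splits the task, delegates, and combines."""
    def amplified(task: Task) -> int:
        mid = len(task) // 2
        left, right = task[:mid], task[mid:]       # human decomposes the task
        return assistant(left) + assistant(right)  # assistants solve, human combines
    return amplified

weak = make_weak_assistant(max_len=2)
team = amplify(weak)
print(team([1, 2, 3, 4]))  # 10, although weak alone rejects a 4-item list
```

In this sketch the human contributes only the split and the final addition, yet the team handles inputs twice as long as the assistant can handle alone.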

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Like teaching: break hard problems into pieces students can help with
  • Distillation compresses the amplified procedure into a single model
  • Each iteration enables supervision of harder tasks
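
To make the distillation bullet concrete, here is a minimal sketch that fits a softmax "student" to one output distribution by gradient descent on the KL divergence. The target distribution, learning rate, and step count are made-up values for illustration, and a real system would train a neural network over many inputs rather than a single vector of logits.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(target: List[float], steps: int = 500, lr: float = 0.5) -> List[float]:
    """Fit a softmax student to target by gradient descent on KL(target || student)."""
    logits = [0.0] * len(target)
    for _ in range(steps):
        probs = softmax(logits)
        # The gradient of KL(target || softmax(logits)) w.r.t. the logits is probs - target.
        logits = [z - lr * (q - p) for z, q, p in zip(logits, probs, target)]
    return softmax(logits)

# Made-up answer distribution of the amplified system on a single input.
amplified_answers = [0.7, 0.2, 0.1]
student = distill(amplified_answers)
print(kl(amplified_answers, student))  # close to zero after training
```

The student ends up answering in one step what the amplified system needed a human and several assistant calls to produce, which is the sense in which distillation "compresses" the procedure.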

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
$$A' = \arg\min_\pi \mathrm{KL}\big(\text{Amp}(H,A) \,\|\, \pi\big)$$

Amplify human $H$ with assistants $A$, then distill:

$$A' \approx \arg\min_\pi \mathbb{E}_x\left[\mathrm{KL}\big(\text{Amp}(H,A)(\cdot|x) \,\|\, \pi(\cdot|x)\big)\right]$$
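
One step the objective leaves implicit is why it amounts to ordinary supervised learning. The short derivation below uses the shorthand $p = \text{Amp}(H,A)$ and assumes a discrete output space indexed by $y$. Both are notational assumptions introduced here, not part of the original statement.

```latex
\begin{aligned}
\mathrm{KL}\big(p(\cdot|x) \,\|\, \pi(\cdot|x)\big)
  &= \sum_y p(y|x)\log p(y|x) \;-\; \sum_y p(y|x)\log \pi(y|x), \\
\arg\min_\pi \mathbb{E}_x\Big[\mathrm{KL}\big(p(\cdot|x) \,\|\, \pi(\cdot|x)\big)\Big]
  &= \arg\max_\pi \mathbb{E}_x\,\mathbb{E}_{y \sim p(\cdot|x)}\big[\log \pi(y|x)\big].
\end{aligned}
```

The first sum does not depend on $\pi$, so minimizing the KL term is the same as maximizing the log-likelihood of answers sampled from the amplified system. In practice this means collecting input and answer pairs from $\text{Amp}(H,A)$ and training $\pi$ on them with a standard cross-entropy loss.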

Then iterate: $A \leftarrow A'$.

Recursion: as $A$ improves, $\text{Amp}(H,A)$ becomes more capable.
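
The recursion can be traced end to end on a toy problem. The sketch below is an assumption-laden illustration, not the paper's setup: the task is the parity of a bit string, the assistant is a plain lookup table, and distillation is exact tabular imitation (for a deterministic target, copying its answers attains zero KL). The names amplify and distill are hypothetical.

```python
from itertools import product
from typing import Dict

Table = Dict[str, int]  # assistant A: lookup table from bit string to parity

def amplify(table: Table, bits: str) -> int:
    """Toy Amp(H, A): the human splits the string, asks A about each half, and XORs."""
    if bits in table:
        return table[bits]
    mid = len(bits) // 2
    return table[bits[:mid]] ^ table[bits[mid:]]

def distill(table: Table, max_len: int) -> Table:
    """Build A' by imitating Amp(H, A) on every bit string of length 1..max_len."""
    new_table: Table = {}
    for n in range(1, max_len + 1):
        for combo in product("01", repeat=n):
            bits = "".join(combo)
            new_table[bits] = amplify(table, bits)
    return new_table

assistant: Table = {"0": 0, "1": 1}  # A_0 only knows single bits
max_len = 1
for _ in range(3):
    max_len *= 2                             # Amp(H, A) reaches inputs twice as long
    assistant = distill(assistant, max_len)  # A <- A'

print(max_len, assistant["10110100"])  # 8 0
```

Each round doubles the input length the single distilled table can answer directly (1, 2, 4, 8), while the human's job, one split and one XOR, never gets harder.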

Canonical Papers

Supervising strong learners by amplifying weak experts

Christiano et al., 2018, arXiv
