Legacy Concept Lab

Sandwiching Evaluations

Makes scalable oversight empirically testable today

Concept 94 of 100 | Scaling & Alignment | Phase 12



Phase 12: Advanced alignment & safety research

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

\text{Score} = \frac{P_{H+A} - P_H}{P_E - P_H}

The sandwich score measures how much of the performance gap between a non-expert and an expert is closed by AI assistance:

\text{SandwichScore} = \frac{P_{H+A} - P_H}{P_E - P_H}

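Here P_H is the performance of the non-expert human alone, P_{H+A} is the performance of the same non-expert with AI assistance, and P_E is the performance of the expert. A score of 1.0 means the assisted non-expert matches the expert; a score of 0 means assistance gave no improvement over the non-expert alone.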
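
As a rough illustration of the formula above, the sketch below computes the score in Python. The function name `sandwich_score`, its parameter names, and the accuracy values are hypothetical, chosen only to make the arithmetic concrete; they are not results from Bowman et al.

```python
def sandwich_score(p_assisted: float, p_nonexpert: float, p_expert: float) -> float:
    """Fraction of the non-expert-to-expert gap closed by AI assistance.

    p_assisted  corresponds to P_{H+A} (non-expert with AI assistance)
    p_nonexpert corresponds to P_H     (non-expert alone)
    p_expert    corresponds to P_E     (expert)
    """
    gap = p_expert - p_nonexpert
    if gap == 0:
        # The denominator vanishes when expert and non-expert perform equally,
        # so the score is undefined in that case.
        raise ValueError("P_E equals P_H; the sandwich score is undefined")
    return (p_assisted - p_nonexpert) / gap


# Made-up accuracies, for illustration only.
score = sandwich_score(p_assisted=0.75, p_nonexpert=0.5, p_expert=1.0)
print(score)  # (0.75 - 0.5) / (1.0 - 0.5) = 0.5
```

With these illustrative numbers, assistance closes half of the gap to the expert. Nothing in the formula bounds the result: a value above 1.0 would mean the assisted non-expert outperformed the expert, and a negative value would mean assistance made the non-expert worse.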

Bowman et al., 2022 (Anthropic)
