Legacy Concept Lab
Sandwiching Evaluations
Makes scalable oversight empirically testable today
#94 | Sandwich | Scaling & Alignment
key equation
\text{Score} = \frac{P_{H+A} - P_H}{P_E - P_H}
Phase 12: Advanced alignment & safety research
Concept 94 of 100
Why It Matters for Modern Models
- Makes scalable oversight empirically testable today
- Works on tasks where experts can judge answers reliably but non-experts struggle
- Serves as a proxy for the future problem of overseeing models more capable than their overseers
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- The "sandwich": the unaided non-expert is the bottom slice, the expert is the top slice, and the AI-assisted non-expert sits in the middle
- The test: can weaker oversight plus AI assistance match stronger oversight?
- Foundational benchmark for alignment research progress
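The three-layer setup above can be sketched as a small data structure. This is a minimal illustration, assuming each condition is summarized by a single accuracy in [0, 1] on the same task. The names `SandwichConditions` and `is_valid_sandwich` and the `min_gap` threshold are hypothetical, chosen for this example rather than taken from any library or paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandwichConditions:
    """Accuracy of each condition on the same task (hypothetical container)."""

    non_expert: float  # bottom slice: unaided non-expert
    assisted: float    # middle: non-expert working with the AI
    expert: float      # top slice: domain expert


def is_valid_sandwich(c: SandwichConditions, min_gap: float = 0.05) -> bool:
    """Return True if experts beat unaided non-experts by at least min_gap.

    Without such a gap there is nothing for AI assistance to close, so the
    task is a poor fit for a sandwiching evaluation. The 0.05 default is an
    arbitrary illustrative threshold, not a published standard.
    """
    return (c.expert - c.non_expert) >= min_gap
```

A task where experts score 0.9 and non-experts 0.5 would pass this check, while one where both groups score around 0.8 would not, since the two "bread slices" are too close together to measure anything in between.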
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
Sandwich score measures AI-assisted oversight:
\text{Score} = \frac{P_{H+A} - P_H}{P_E - P_H}
- P_H: non-expert performance
- P_{H+A}: non-expert + AI assistance
- P_E: expert performance
Score = 1.0 means assisted non-expert matches expert.
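As a rough sketch, the score can be computed directly from the three performance numbers. The function name `sandwich_score` and its argument names are assumptions made for this example. It also assumes all three values are measured on the same task with the same metric, and it rejects the case where the expert does not outperform the non-expert, since the denominator would then be zero or negative.

```python
def sandwich_score(p_h: float, p_ha: float, p_e: float) -> float:
    """Fraction of the expert/non-expert gap closed by AI assistance.

    p_h  -- unaided non-expert performance
    p_ha -- non-expert + AI assistance performance
    p_e  -- expert performance
    """
    gap = p_e - p_h
    if gap <= 0:
        # No headroom between non-expert and expert: the score is undefined.
        raise ValueError("expert performance must exceed non-expert performance")
    return (p_ha - p_h) / gap


# Illustrative numbers only (not from any study):
# non-expert 0.50, assisted 0.70, expert 0.90 closes about half of the gap.
score = sandwich_score(0.50, 0.70, 0.90)
print(f"sandwich score: {score:.2f}")
```

Under this definition, a score of 0 means assistance gave no improvement over the unaided non-expert, and values above 1.0 are possible if the assisted non-expert outperforms the expert.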