Legacy Concept Lab

Model-Graded Evaluations

Enables scalable safety testing without human bottleneck

Concept 92 of 100 | LLM-as-Judge | Scaling & Alignment | Phase 12: Advanced alignment & safety research

Why It Matters for Modern Models

  • Enables scalable safety testing without human bottleneck
  • Fast iteration loops for alignment research
  • Powers modern benchmarks such as MT-Bench and AlpacaEval; Chatbot Arena's human votes provide the reference for validating judges

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Rubric defines what "good" means: helpfulness, truthfulness, safety
  • Calibration: does the model-graded score match human judgment?
  • Position bias: judge models tend to prefer the first option shown, so randomize the order
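
The position-bias point can be made concrete with a small sketch. Everything here is illustrative: `judge_pair` is a hypothetical stand-in for a real judge-model call (a toy that is deliberately biased toward the first answer), and the function names are assumptions, not part of any library. Alongside plain order randomization, the sketch shows a common variant that queries both orders and counts a win only when the two verdicts agree.

```python
import random

def judge_pair(prompt, first, second):
    """Hypothetical stand-in for a judge-model call; returns 'first' or 'second'.

    This toy judge is deliberately position-biased: it picks the first
    answer unless the second one is more than twice as long.
    """
    return "second" if len(second) > 2 * len(first) else "first"

def randomized_preference(prompt, answer_a, answer_b, judge=judge_pair, rng=random):
    """One judge call with the presentation order chosen at random."""
    if rng.random() < 0.5:
        return "A" if judge(prompt, answer_a, answer_b) == "first" else "B"
    return "B" if judge(prompt, answer_b, answer_a) == "first" else "A"

def swapped_preference(prompt, answer_a, answer_b, judge=judge_pair):
    """Two judge calls, one per order; inconsistent verdicts become a tie."""
    winner_ab = "A" if judge(prompt, answer_a, answer_b) == "first" else "B"
    winner_ba = "B" if judge(prompt, answer_b, answer_a) == "first" else "A"
    return winner_ab if winner_ab == winner_ba else "tie"
```

Randomizing spreads the bias evenly across candidates over many samples; swapping costs twice as many judge calls but exposes the bias on each individual comparison.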

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} E(x_i, y_i; R)

Evaluator model E scores outputs against rubric R:

s = E(x, y; R)
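
One way the scoring step s = E(x, y; R) might look in code is sketched below. The rubric wording, the prompt template, the 1-10 scale, and the `call_judge` callable are all assumptions made for illustration; a real setup would plug in an actual model client and its own rubric.

```python
import re

# Illustrative rubric R; the criteria names follow the ones listed earlier,
# the descriptions are assumed wording.
RUBRIC = {
    "helpfulness": "Does the response address the user's request?",
    "truthfulness": "Are the factual claims accurate?",
    "safety": "Does the response avoid harmful content?",
}

def build_judge_prompt(x, y, rubric):
    """Render input x, output y, and rubric R into a single judge prompt."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    return (
        "Rate the response against the rubric on a 1-10 scale.\n"
        f"Rubric:\n{criteria}\n\n"
        f"Prompt: {x}\nResponse: {y}\n"
        "End your reply with 'Score: <number>'."
    )

def parse_score(reply, lo=1, hi=10):
    """Extract the integer after 'Score:'; return None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if lo <= score <= hi else None

def evaluate(x, y, rubric, call_judge):
    """s = E(x, y; R): call_judge is any function mapping a prompt to reply text."""
    return parse_score(call_judge(build_judge_prompt(x, y, rubric)))
```

Returning None for unparseable replies keeps malformed judge output out of the aggregate instead of silently counting it as a low score.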

Aggregate:

\hat{\mu} = \frac{1}{n} \sum_{i=1}^n s_i

Validate by correlating s_i with human ratings. Track regressions across model versions.
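
A minimal sketch of the aggregate, the validation step, and a regression check follows. The scores are made-up illustrative numbers, Pearson correlation is one choice among several (rank correlation is also common), and the `tolerance` threshold is an assumed parameter rather than a recommended value.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def regressed(old_scores, new_scores, tolerance=0.5):
    """Flag a regression when the new mean falls more than `tolerance` below the old."""
    return mean(new_scores) < mean(old_scores) - tolerance

# Illustrative data only: judge scores s_i and human ratings for the same items.
judge_scores = [7, 4, 9, 6, 3]
human_scores = [8, 5, 9, 6, 2]

mu_hat = mean(judge_scores)                   # the aggregate (1/n) * sum of s_i
agreement = pearson(judge_scores, human_scores)
```

A high correlation on a held-out, human-rated subset is what justifies trusting the aggregate on unrated data; if agreement drops after a judge or rubric change, the earlier scores are no longer comparable.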

Canonical Papers

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Zheng et al., 2023, NeurIPS
