Legacy Concept Lab

Model-Graded Evaluations

Enables scalable safety testing without human bottleneck

Concept 92 of 100Scaling & AlignmentPhase 12

#92LLM-as-JudgeScaling & Alignment

key equation\hat{\mu} = \frac{1}{n} \sum_i E(x_i, y_i; R)

Phase 12: Advanced alignment & safety researchConcept 92 of 100

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

\hat{\mu} = \frac{1}{n} \sum_i E(x_i, y_i; R)

Evaluator model $E$ scores outputs against rubric $R$ :

s = E(x, y; R)

Aggregate:

\hat{\mu} = \frac{1}{n} \sum_{i=1}^n s_i

Validate by correlating $s_i$ with human ratings. Track regressions across model versions.

Zheng et al.2023NeurIPS

Explore this concept from different angles — like a mathematician would.