Legacy Concept Lab

RLAIF: AI Feedback

Removes human labeling bottleneck

Concept 82 of 100 | Scaling & Alignment | Phase 7: Alignment & RLHF

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

$r(y) = \text{LLM}(y | \text{criteria})$

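RLAIF replaces the human preference labels of RLHF with preferences from an AI judge, where $r(y)$ is the reward assigned to a response $y$:

RLHF: $r(y) = \text{Human}(y)$ → expensive, since every label requires a human annotator
RLAIF: $r(y) = \text{LLM}(y | \text{criteria})$ → scalable, since labels come from a model prompted with written criteria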

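AI judge: given a prompt $x$, two candidate responses $y_1$ and $y_2$, and a rubric, the judge returns the probability that $y_1$ is preferred: $P(y_1 \succ y_2) = \text{LLM}(x, y_1, y_2, \text{rubric})$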
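A minimal sketch of how such a pairwise judge could be wired up, assuming the judge LLM can return scores (logits or log-probabilities) for answering "A" or "B". The `JudgeFn` interface, the prompt template, and `toy_judge` are hypothetical stand-ins for illustration, not an API from any particular library. Averaging over both presentation orders is an added design choice to cancel position bias, not something the text above prescribes.

```python
import math
from typing import Callable, Tuple

# Hypothetical judge interface: takes a prompt string and returns
# scores (logits or log-probs) for the answers "A" and "B".
JudgeFn = Callable[[str], Tuple[float, float]]

# Illustrative template; a real rubric prompt would be more detailed.
TEMPLATE = (
    "Rubric: {rubric}\n"
    "Prompt: {x}\n"
    "Response A: {a}\n"
    "Response B: {b}\n"
    "Which response better satisfies the rubric? Answer A or B."
)


def prob_first_preferred(judge: JudgeFn, x: str, a: str, b: str, rubric: str) -> float:
    """Probability that the response shown as 'A' is preferred."""
    score_a, score_b = judge(TEMPLATE.format(rubric=rubric, x=x, a=a, b=b))
    # Softmax over two options equals a sigmoid of the score difference.
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def preference(judge: JudgeFn, x: str, y1: str, y2: str, rubric: str) -> float:
    """Estimate P(y1 > y2), averaged over both presentation orders."""
    p_forward = prob_first_preferred(judge, x, y1, y2, rubric)
    p_swapped = 1.0 - prob_first_preferred(judge, x, y2, y1, rubric)
    return 0.5 * (p_forward + p_swapped)


def toy_judge(prompt: str) -> Tuple[float, float]:
    """Toy stand-in for an LLM: favors longer responses, with a bias toward 'A'."""
    a = prompt.split("Response A: ")[1].split("\nResponse B: ")[0]
    b = prompt.split("Response B: ")[1].split("\nWhich response")[0]
    return 0.01 * len(a) + 0.2, 0.01 * len(b)


if __name__ == "__main__":
    p = preference(
        toy_judge,
        x="Explain RLAIF in one sentence.",
        y1="RLAIF trains a policy using preference labels produced by an AI judge.",
        y2="It is a method.",
        rubric="Prefer the more informative answer.",
    )
    print(f"P(y1 > y2) = {p:.3f}")
```

Because the estimate averages the forward and swapped orderings, `preference(judge, x, y1, y2, rubric)` and `preference(judge, x, y2, y1, rubric)` sum to 1, and the toy judge's built-in bias toward position "A" cancels out.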

AI-judge preferences correlate at roughly 0.85 with human preferences on many tasks.
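One way to check a figure like this on your own data is to score the same items with both the AI judge and human annotators, then compute a correlation. The sketch below assumes Pearson correlation, since the text does not say which statistic the figure refers to. The score lists are made-up illustrative values, not results from any study.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-item scores (e.g. mean preference for response 1).
human_scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
ai_scores = [0.8, 0.3, 0.6, 0.5, 0.9, 0.2]

if __name__ == "__main__":
    print(f"AI-human correlation: {pearson(human_scores, ai_scores):.2f}")
```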

Lee et al., 2023, arXiv.
