Legacy Concept Lab

Weak-to-Strong Generalization

Directly studies "how do we supervise something smarter than us?"

Concept 88 of 100 | Scaling & Alignment | Phase 12

Phase 12: Advanced alignment & safety research

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

$$\mathbb{E}[\ell(S(x), y)] - \mathbb{E}[\ell(S(x), \tilde{y})]$$

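A strong model $S$ is trained on weak labels $\tilde{y} = W(x)$ produced by a weaker supervisor $W$. The weak labels are modeled as a noisy version of the true labels $y$: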

$$\tilde{y} \sim p(\tilde{y} \mid y), \quad y \sim p(y \mid x)$$
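As a rough illustration of the noise model $p(\tilde{y} \mid y)$, the sketch below draws weak labels by flipping binary true labels with a fixed probability. The helper `sample_weak_labels` and the `flip_prob` value are hypothetical choices made for this example. Symmetric noise that does not depend on $x$ is a simplifying assumption, since a real weak supervisor $W(x)$ would make errors that depend on the input.

```python
import numpy as np

def sample_weak_labels(y, flip_prob, rng):
    """Draw y_tilde ~ p(y_tilde | y): each binary label flips with probability flip_prob.

    Assumption for illustration only: the noise is symmetric and independent of x.
    """
    flips = rng.random(y.shape[0]) < flip_prob
    return np.where(flips, 1 - y, y)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)            # stand-in for true labels y ~ p(y | x)
y_tilde = sample_weak_labels(y, flip_prob=0.2, rng=rng)

# Fraction of weak labels that match the true labels (about 1 - flip_prob).
agreement = float(np.mean(y == y_tilde))
print(f"weak/true label agreement: {agreement:.3f}")
```

With this kind of channel, the agreement rate plays the role of the weak supervisor's accuracy.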

Training minimizes $\mathbb{E}[\ell(S(x), \tilde{y})]$, but we care about $\mathbb{E}[\ell(S(x), y)]$.

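Gap = loss against the true labels $y$ minus loss against the weak labels $\tilde{y}$ (the key equation above). Because both terms are losses, a negative gap means $S$ fits the true labels better than the weak labels it was trained on.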
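The gap can be estimated empirically on a toy problem. The following is a minimal sketch, not the setup used by Burns et al.: it assumes a logistic regression as a stand-in for the strong model $S$, synthetic linearly separable data, log loss for $\ell$, and weak labels produced by flipping 20% of the true labels. All variable names and hyperparameters are illustrative.

```python
import numpy as np

def log_loss(p, labels):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 5000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)             # true labels
flips = rng.random(n) < 0.2
y_tilde = np.where(flips, 1 - y, y)            # weak labels (assumed 20% symmetric noise)

# Train S on the weak labels only, by gradient descent on the log loss.
w = np.zeros(d)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.5 * X.T @ (p - y_tilde) / n

p = sigmoid(X @ w)
loss_true = log_loss(p, y)                     # estimate of E[l(S(x), y)]
loss_weak = log_loss(p, y_tilde)               # estimate of E[l(S(x), y_tilde)]
gap = loss_true - loss_weak

acc_true = float(np.mean((p > 0.5) == (y == 1)))
acc_weak = float(np.mean((p > 0.5) == (y_tilde == 1)))
print(f"loss vs true: {loss_true:.3f}, loss vs weak: {loss_weak:.3f}, gap: {gap:.3f}")
print(f"accuracy vs true: {acc_true:.3f}, accuracy vs weak: {acc_weak:.3f}")
```

In this toy setting the noise is unstructured, so the model averages it out and the gap comes out negative: the model agrees with the true labels more often than with the weak labels it was trained on. Structured errors from a real weak supervisor need not behave this way.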

Burns et al., 2023 (OpenAI)
