Legacy Concept Lab

Weak-to-Strong Generalization

Directly studies "how do we supervise something smarter than us?"

Concept 88 of 100 · Scaling & Alignment · Phase 12: Advanced alignment & safety research

Why It Matters for Modern Models

  • Directly studies the question "how do we supervise something smarter than us?"
  • Turns alignment into a measurable ML generalization problem
  • Strong models can recover capability beyond what their weak labels provide

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • The intuition is a student who learns from a flawed teacher but still gets the answer right
  • The strong model can internalize the underlying pattern and generalize beyond the noisy labels
  • Confidence-based losses help filter out weak-label errors
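
The last point can be made concrete with a small sketch. The function below mixes the usual cross-entropy against the weak labels with a cross-entropy against the strong model's own hardened (argmax) predictions, so that confident disagreements with the weak label are penalized less. It is only loosely in the spirit of the auxiliary confidence loss discussed by Burns et al.; the names `cross_entropy`, `confidence_weighted_loss`, and `alpha`, the argmax hardening, and the fixed mixing weight are all simplifying assumptions made here, not the paper's exact formulation.

```python
import numpy as np

def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and target distributions.

    Both arrays have shape (n_examples, n_classes).
    """
    return float(-np.mean(np.sum(targets * np.log(probs + eps), axis=1)))

def confidence_weighted_loss(strong_probs, weak_probs, alpha=0.5):
    """Hypothetical confidence-based loss (illustrative, not the paper's exact form).

    (1 - alpha) * CE(strong, weak labels) + alpha * CE(strong, strong's own argmax).
    """
    n_classes = strong_probs.shape[1]
    # One-hot encoding of the strong model's own most likely class.
    hardened = np.eye(n_classes)[np.argmax(strong_probs, axis=1)]
    weak_term = cross_entropy(strong_probs, weak_probs)
    self_term = cross_entropy(strong_probs, hardened)
    return (1.0 - alpha) * weak_term + alpha * self_term

# Two examples; the weak label disagrees with the strong model on the second one.
strong_probs = np.array([[0.9, 0.1], [0.8, 0.2]])
weak_probs = np.array([[1.0, 0.0], [0.0, 1.0]])

loss_weak_only = confidence_weighted_loss(strong_probs, weak_probs, alpha=0.0)
loss_mixed = confidence_weighted_loss(strong_probs, weak_probs, alpha=0.5)
print(loss_weak_only, loss_mixed)
```

With `alpha=0.0` this reduces to ordinary training on weak labels. Raising `alpha` lets the strong model trust its own confident prediction on the second example, which lowers the penalty for disagreeing with a possibly wrong weak label. In a real training loop the hardened targets would be treated as constants when taking gradients.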

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
$$\mathbb{E}[\ell(S(x), y)] - \mathbb{E}[\ell(S(x), \tilde{y})]$$

Strong model $S$ trained on weak labels $\tilde{y} = W(x)$:

$$\tilde{y} \sim p(\tilde{y} \mid y), \quad y \sim p(y \mid x)$$

Training minimizes $\mathbb{E}[\ell(S(x), \tilde{y})]$, but we care about $\mathbb{E}[\ell(S(x), y)]$.
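
A toy simulation can make the two expectations tangible. The sketch below samples true labels, corrupts them with random flips to get weak labels, trains a small logistic-regression model on the weak labels only, and then estimates both losses on held-out data. Everything here is an assumption for illustration: the linear ground truth, the 20% flip rate, 0-1 loss as $\ell$, and logistic regression standing in for the strong model $S$. The helper names (`make_data`, `train_logistic`, `zero_one_loss`) are hypothetical.

```python
import numpy as np

def make_data(n, flip_rate, rng):
    """Sample inputs x, true labels y, and weak labels (y with random flips)."""
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(float)  # assumed ground-truth rule
    flips = rng.random(n) < flip_rate          # which labels the weak supervisor gets wrong
    y_weak = np.where(flips, 1.0 - y, y)
    return x, y, y_weak

def train_logistic(x, targets, steps=500, lr=0.5):
    """Fit logistic regression with plain gradient descent on cross-entropy."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - targets
        w -= lr * (x.T @ grad) / len(targets)
        b -= lr * grad.mean()
    return w, b

def zero_one_loss(w, b, x, targets):
    """Fraction of examples where the thresholded prediction differs from targets."""
    pred = (x @ w + b > 0).astype(float)
    return float(np.mean(pred != targets))

rng = np.random.default_rng(0)
x_train, _, y_weak_train = make_data(5000, 0.2, rng)   # true labels never used in training
x_test, y_test, y_weak_test = make_data(5000, 0.2, rng)

w, b = train_logistic(x_train, y_weak_train)
loss_true = zero_one_loss(w, b, x_test, y_test)        # estimates E[l(S(x), y)]
loss_weak = zero_one_loss(w, b, x_test, y_weak_test)   # estimates E[l(S(x), y~)]
gap = loss_true - loss_weak
print(f"true-label loss={loss_true:.3f}  weak-label loss={loss_weak:.3f}  gap={gap:.3f}")
```

In this setup the model never sees a true label, yet its loss against the true labels comes out lower than its loss against the weak labels it was trained on, because the random flips are not learnable from $x$. Real weak supervisors make systematic rather than random errors, so this should be read as a best-case illustration.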

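Gap = expected loss against the true labels minus expected loss against the weak labels (the key equation above). A negative gap means $S$ agrees with the truth more closely than with its weak supervision.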
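
A worked special case may help fix the sign of the gap. Assume, purely for illustration (the source does not specify a noise model this concrete), binary labels, 0-1 loss, and weak labels that flip the true label with a fixed probability $\varepsilon$, independently of $x$ and of $S(x)$. Writing $\mathrm{err} = \Pr[S(x) \neq y]$:

$$
\begin{aligned}
\mathbb{E}[\ell(S(x), \tilde{y})] &= \Pr[S(x) \neq y]\,(1 - \varepsilon) + \Pr[S(x) = y]\,\varepsilon \\
&= \mathrm{err} + \varepsilon\,(1 - 2\,\mathrm{err}), \\
\mathbb{E}[\ell(S(x), y)] - \mathbb{E}[\ell(S(x), \tilde{y})] &= -\,\varepsilon\,(1 - 2\,\mathrm{err}).
\end{aligned}
$$

For instance, with $\varepsilon = 0.2$ and $\mathrm{err} = 0.05$, the weak-label loss is $0.05 + 0.2 \cdot 0.9 = 0.23$ and the gap is $-0.18$. Under these assumptions the gap is negative whenever $\mathrm{err} < 1/2$, so the weak-label loss overstates how badly the strong model is actually doing.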

Canonical Papers

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Burns et al., 2023, OpenAI
