Legacy Concept Lab
Weak-to-Strong Generalization
Directly studies "how do we supervise something smarter than us?"
#88 · Weak→Strong · Scaling & Alignment
key equation
\mathbb{E}[\ell(S(x), y)] - \mathbb{E}[\ell(S(x), \tilde{y})]
Phase 12: Advanced alignment & safety research
Concept 88 of 100
Why It Matters for Modern Models
- Directly studies "how do we supervise something smarter than us?"
- Turns alignment into a measurable ML generalization problem
- Strong models can recover capability beyond what the weak labels alone provide
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- The analogy: a student learns from a flawed teacher yet still gets the answer right
- The strong model internalizes the underlying patterns and generalizes beyond the noisy labels
- Confidence-based losses help the model filter out weak-label errors
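To make the last point concrete, here is a minimal sketch of one possible confidence-based loss for binary labels. It mixes the usual loss against the weak labels with a loss against the strong model's own hardened predictions, so a confident strong model is penalized less for disagreeing with a weak label. The function names, the mixing weight `alpha`, and the `threshold` are illustrative assumptions, not something this page specifies.

```python
import numpy as np

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Per-example binary cross-entropy between probabilities and 0/1 targets."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))

def confidence_mixed_loss(strong_probs, weak_labels, alpha=0.5, threshold=0.5):
    """Hypothetical confidence-based loss (assumed form, for illustration only).

    (1 - alpha) * loss against the weak labels
      + alpha   * loss against the strong model's own hardened predictions.
    In gradient-based training the hardened targets would be treated as constants.
    """
    hardened = (strong_probs > threshold).astype(float)
    loss_weak = binary_cross_entropy(strong_probs, weak_labels)
    loss_self = binary_cross_entropy(strong_probs, hardened)
    return float(np.mean((1.0 - alpha) * loss_weak + alpha * loss_self))

# Toy batch: the first weak label (0) disagrees with a confident strong model (0.9).
strong_probs = np.array([0.9, 0.2, 0.8])
weak_labels = np.array([0.0, 0.0, 1.0])

plain = confidence_mixed_loss(strong_probs, weak_labels, alpha=0.0)  # weak labels only
mixed = confidence_mixed_loss(strong_probs, weak_labels, alpha=0.5)  # half self-confidence
print(plain, mixed)
```

With `alpha=0.0` this reduces to ordinary training on the weak labels. Raising `alpha` lowers the penalty on examples where the strong model confidently disagrees with the weak label, which is the filtering effect described in the bullet above.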
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
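Strong model $S$ trained on weak labels $\tilde{y}$:
Training minimizes $\mathbb{E}[\ell(S(x), \tilde{y})]$, but we care about $\mathbb{E}[\ell(S(x), y)]$.
Gap = expected loss on the true labels $y$ minus expected loss on the weak labels $\tilde{y}$.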
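As a rough illustration, the gap can be estimated on a small labeled batch by replacing each expectation with a sample mean. The sketch below assumes binary labels and uses cross-entropy for the loss; the toy arrays and function names are made up for the example.

```python
import numpy as np

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Per-example binary cross-entropy between probabilities and 0/1 targets."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))

def weak_to_strong_gap(strong_probs, true_labels, weak_labels):
    """Empirical estimate of E[l(S(x), y)] - E[l(S(x), y_tilde)]."""
    true_loss = binary_cross_entropy(strong_probs, true_labels).mean()
    weak_loss = binary_cross_entropy(strong_probs, weak_labels).mean()
    return float(true_loss - weak_loss)

# Toy data (assumed): weak labels flip the true label at positions 2 and 5.
true_labels = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
weak_labels = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
# Strong model outputs that side with the true labels on the flipped examples.
strong_probs = np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.3])

gap = weak_to_strong_gap(strong_probs, true_labels, weak_labels)
print(gap)
```

In this toy batch the gap comes out negative: the strong model's loss against the true labels is lower than its loss against the weak labels it would have been trained on, which is the "generalizes beyond noisy labels" behavior in numeric form. A positive gap would mean the model fits the weak labels better than the truth.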