Legacy Concept Lab
Instruction Tuning
Unlocks instruction-following: base models don't understand "summarize"
#84 | Instruct | Scaling & Alignment
Key equation
\mathcal{L} = -\sum_t \log p(y_t \mid \text{instr}, y_{<t})
Phase 7: Alignment & RLHF | Concept 84 of 100
Why It Matters for Modern Models
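- Unlocks instruction-following: a base model treats "summarize" as text to continue, not as a command to carry out
- FLAN showed that fine-tuning on many tasks phrased as instructions substantially improves zero-shot performance on held-out task types
- First stage of the RLHF pipeline: instruction tuning (supervised fine-tuning) → reward model (RM) → PPO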
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- Base models predict text; instruct models follow commands
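- Diversity matters: more task types in the training mix tends to give better generalization to unseen tasks
- Including chain-of-thought (CoT) examples in the mix teaches step-by-step reasoning as just another instruction to follow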
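To make the points about diversity and CoT concrete, here is a minimal sketch of a training mix in which chain-of-thought data is simply one more task type. The pools, the task names, the weights, and the `sample_mixture` helper are all hypothetical and chosen for illustration; real mixes are far larger and their weighting schemes vary.

```python
import random
from collections import Counter

# Hypothetical pools of (instruction, response) pairs, one pool per task type.
POOLS = {
    "summarization": [
        ("Summarize: The cat sat on the mat all day.", "A cat stayed on a mat."),
    ],
    "translation": [
        ("Translate to French: Good morning.", "Bonjour."),
    ],
    # Chain-of-thought examples are just another task type whose responses
    # spell out intermediate steps before the final answer.
    "cot_arithmetic": [
        ("Solve step by step: Ann has 3 apples and buys 4 more. How many now?",
         "She starts with 3. She adds 4. 3 + 4 = 7. The answer is 7."),
    ],
}

def sample_mixture(pools, weights, n, seed=0):
    """Draw n training examples, picking a task type by weight each time."""
    rng = random.Random(seed)
    tasks = list(pools)
    chosen = rng.choices(tasks, weights=[weights[t] for t in tasks], k=n)
    return [(task, rng.choice(pools[task])) for task in chosen]

# Equal weights here; changing them changes how often each task type appears.
weights = {"summarization": 1.0, "translation": 1.0, "cot_arithmetic": 1.0}
batch = sample_mixture(POOLS, weights, n=300)
counts = Counter(task for task, _ in batch)
print(counts)
```

Adding a new task type only means adding a pool and a weight, which is why broadening the mix is cheap compared with changing the training objective.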
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
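Fine-tune on (instruction, response) pairs:
\mathcal{L} = -\sum_t \log p(y_t \mid \text{instr}, y_{<t})
Here y_t is the t-th token of the response and instr is the instruction text.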
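The loss can be sketched in plain Python as follows. This is an illustrative toy, not a training loop: the function names, the 3-token vocabulary, and the `is_response` mask are assumptions, and the mask reflects one common reading of the equation in which only response tokens are summed while the instruction serves as conditioning context.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def instruction_tuning_loss(logits_per_step, target_ids, is_response):
    """Sum of -log p(y_t | instr, y_<t) over response positions only.

    logits_per_step[t] holds the logits used to predict target_ids[t];
    is_response[t] is True when target_ids[t] belongs to the response.
    """
    loss = 0.0
    for logits, target, keep in zip(logits_per_step, target_ids, is_response):
        if keep:  # instruction tokens are context, not prediction targets
            loss -= log_softmax(logits)[target]
    return loss

# Toy sequence: 4 positions, vocabulary of 3 tokens, last 2 are the response.
logits_per_step = [
    [2.0, 0.0, 0.0],  # instruction position (masked out)
    [0.0, 2.0, 0.0],  # instruction position (masked out)
    [0.0, 0.0, 0.0],  # response position, uniform: p = 1/3
    [0.0, 0.0, 0.0],  # response position, uniform: p = 1/3
]
target_ids = [0, 1, 2, 0]
is_response = [False, False, True, True]

loss = instruction_tuning_loss(logits_per_step, target_ids, is_response)
print(round(loss, 4))  # 2 * ln(3) → 2.1972
```

In practice the sum is usually averaged over response tokens and batches, which rescales the loss but does not change what is being optimized.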
Multi-task format:
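One way to picture a multi-task format, as a sketch under assumptions: every task is rendered through a natural-language template into the same (instruction, response) shape, so a single model and a single loss cover all of them. The template wording, the `TEMPLATES` table, and the `to_pair` helper below are hypothetical and are not taken from any particular dataset.

```python
# Hypothetical templates: each task type is phrased as a plain instruction.
TEMPLATES = {
    "summarize": "Summarize the following text.\n\n{text}",
    "translate": "Translate the following text to {language}.\n\n{text}",
    "sentiment": "Is the sentiment of this review positive or negative?\n\n{text}",
}

def to_pair(task, fields, response):
    """Render one raw example as an (instruction, response) record."""
    return {
        "instruction": TEMPLATES[task].format(**fields),
        "response": response,
    }

raw_examples = [
    ("summarize", {"text": "The meeting ran long and ended without a decision."},
     "The meeting ended with no decision."),
    ("translate", {"language": "French", "text": "Thank you."},
     "Merci."),
    ("sentiment", {"text": "I loved every minute of it."},
     "positive"),
]

pairs = [to_pair(task, fields, response) for task, fields, response in raw_examples]
for pair in pairs:
    print(repr(pair["instruction"]), "->", repr(pair["response"]))
```

Because every record has the same two fields, the training code never needs to know which task an example came from.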
Instruction-tuned models can then generalize zero-shot to tasks they did not see during fine-tuning.