Legacy Concept Lab

Instruction Tuning

Unlocks instruction following: a base model tends to continue a prompt like "summarize" as text rather than carry it out.

Concept 84 of 100, Scaling & Alignment, Phase 7

Phase 7: Alignment & RLHF

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

\mathcal{L} = -\sum_t \log p(y_t \mid \text{instruction}, y_{<t})

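Fine-tune on (instruction, response) pairs by minimizing the negative log-likelihood of each response token y_t, conditioned on the instruction and the preceding response tokens y_{<t}:

\mathcal{L} = -\sum_{t} \log p(y_t \mid \text{instruction}, y_{<t})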
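
The loss sums only over response tokens; the instruction tokens appear as context but contribute no loss terms. A minimal sketch of that sum in plain Python follows. The function names (`log_softmax`, `instruction_tuning_loss`) and the toy logits are illustrative assumptions standing in for a real model's outputs, not part of any particular library.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def instruction_tuning_loss(logits, token_ids, num_instruction_tokens):
    """Sum of -log p(y_t | instruction, y_<t) over response tokens only.

    logits[i] holds the (hypothetical) model scores for the token at
    position i + 1, given tokens 0..i (the usual next-token shift).
    Assumes num_instruction_tokens >= 1.
    """
    loss = 0.0
    for pos in range(num_instruction_tokens, len(token_ids)):
        log_probs = log_softmax(logits[pos - 1])
        loss -= log_probs[token_ids[pos]]
    return loss

# Toy sequence: 2 instruction tokens followed by 2 response tokens,
# over a vocabulary of 3 token ids.
token_ids = [0, 1, 2, 1]
uniform = [0.0, 0.0, 0.0]               # equal score for every token
logits = [uniform, uniform, uniform]    # predictions for positions 1..3
loss = instruction_tuning_loss(logits, token_ids, num_instruction_tokens=2)
print(loss)  # two response tokens, each costing log(3)
```

With uniform predictions each response token costs log(3), so the total is 2 log(3). Raising the logit of the correct token at a response position lowers the loss, while changing predictions at instruction positions leaves it unchanged.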

Multi-task format:

\text{Task: } \langle\text{desc}\rangle \quad \text{Input: } x \quad \text{Output: } y
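
The template above can be sketched as a small formatting helper. The function name `format_example`, the newline separators, and the two toy tasks are assumptions made for illustration; the source specifies only the Task / Input / Output ordering.

```python
def format_example(task_description, input_text, output_text=None):
    """Render one example as 'Task: ... Input: ... Output: ...'.

    With output_text=None the string ends at 'Output:', which is the
    prompt form used at inference time.
    """
    prompt = f"Task: {task_description}\nInput: {input_text}\nOutput:"
    if output_text is None:
        return prompt
    return f"{prompt} {output_text}"

# Hypothetical toy tasks; different tasks share the same template.
training_examples = [
    ("Translate English to French.", "cheese", "fromage"),
    ("Classify the sentiment as positive or negative.", "I loved this film.", "positive"),
]

for task, x, y in training_examples:
    print(format_example(task, x, y))
    print("---")

# Inference on a new instruction: the model would fill in the output.
print(format_example("Summarize the text in one sentence.", "<document text>"))
```

Because every task is rendered through the same template, a new task at inference time differs only in its description, which is what lets the model be prompted with tasks absent from the tuning mix.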

Models instruction-tuned on a broad mix of tasks generalize zero-shot to tasks that were held out from tuning.

Wei et al., 2022, ICLR
