Legacy Concept Lab

Test-Time Compute & Inference Scaling

The paradigm behind o1: spend more compute at inference for harder problems

Concept 74 of 100 · Scaling & Alignment · Phase 11: Frontier research & scaling


Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

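Train-time and test-time compute are partial substitutes: within limits, you can trade a larger model for more inference compute, and vice versa
Verifiers (reward models) score candidate solutions, which lets you search over many samples and keep the best one
Tree search over reasoning steps explores the space of possible derivations instead of committing to a single chain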


Key Equation


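Test-time scaling trades compute for quality at inference. As an empirical rule of thumb rather than an exact law, quality grows roughly with the logarithm of inference compute: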

\text{Quality} \sim \log(\text{inference compute})
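
To make the logarithmic relationship concrete, here is a minimal sketch. It assumes a hypothetical law of the form quality = a + b * log2(compute); the coefficients `a` and `b` are invented for illustration and are not fitted values from any paper. Under that assumption, every doubling of inference compute adds the same fixed increment `b`, which is why returns diminish per unit of compute.

```python
import math

def quality(compute: float, a: float = 0.30, b: float = 0.05) -> float:
    """Hypothetical log-law: quality = a + b * log2(compute).

    a and b are illustrative assumptions, not measured constants.
    """
    return a + b * math.log2(compute)

def doubling_gains(budgets):
    """Quality gained between each pair of consecutive budgets."""
    return [quality(hi) - quality(lo) for lo, hi in zip(budgets, budgets[1:])]

# Each budget is double the previous one.
budgets = [1, 2, 4, 8, 16]
gains = doubling_gains(budgets)

for budget, gain in zip(budgets[1:], gains):
    print(f"compute x{budget:>2}: +{gain:.3f} quality over the previous budget")
```

Going from 8 to 16 units costs eight times as much extra compute as going from 1 to 2, yet under this assumed law both steps buy the same gain.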

Best-of-N sampling: generate N candidate responses y_1, ..., y_N and select the one the verifier V scores highest:

y^* = \arg\max_{y \in \{y_1, ..., y_N\}} V(y)
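
The selection rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `generate` and `verifier` are hypothetical callables standing in for a sampling model and a reward model, and the toy versions below exist only so the example runs.

```python
import random
from typing import Callable, Tuple

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],    # hypothetical sampler: prompt -> candidate
    verifier: Callable[[str], float],  # hypothetical verifier V(y): higher is better
    n: int,
) -> Tuple[str, float]:
    """Draw n candidates and return the one with the highest verifier score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    scores = [verifier(y) for y in candidates]
    best = max(range(n), key=scores.__getitem__)  # argmax over candidate indices
    return candidates[best], scores[best]

# Toy stand-ins (assumptions for illustration only).
rng = random.Random(0)

def toy_generate(prompt: str) -> str:
    # Pretend the model guesses an integer answer somewhere near the truth.
    return str(rng.randint(38, 46))

def toy_verifier(y: str) -> float:
    # Pretend the verifier prefers answers close to 42.
    return -abs(int(y) - 42)

answer, score = best_of_n("17 + 25 = ?", toy_generate, toy_verifier, n=16)
print(answer, score)
```

Cost grows linearly in N because every candidate is generated and scored in full, so the quality of the verifier determines how much of that extra compute turns into better answers.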

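Process Reward Models (PRMs) score intermediate steps. The score of a solution with steps s_1, ..., s_k is the product of the per-step correctness probabilities:

R_{\text{PRM}}(s_1, \ldots, s_k) = \prod_{i=1}^{k} P(\text{correct} \mid s_1, \ldots, s_i)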
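
A small sketch of this aggregation follows. It assumes the per-step probabilities have already been produced by some PRM; the function name `prm_solution_score` and the example numbers are made up for illustration. The product is computed in log space, which is a common way to avoid underflow on long chains.

```python
import math
from typing import Sequence

def prm_solution_score(step_probs: Sequence[float]) -> float:
    """Product of per-step correctness probabilities, computed in log space.

    step_probs[i] is assumed to be P(correct | s_1, ..., s_{i+1}) from a PRM.
    """
    if any(p < 0.0 or p > 1.0 for p in step_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(p == 0.0 for p in step_probs):
        return 0.0  # a step judged certainly wrong zeroes the whole solution
    return math.exp(sum(math.log(p) for p in step_probs))

# Two hypothetical three-step solutions.
steady = prm_solution_score([0.9, 0.8, 0.5])      # about 0.36
one_bad = prm_solution_score([0.99, 0.99, 0.2])   # about 0.196

print(f"steady: {steady:.3f}, one weak step: {one_bad:.3f}")
```

Because the score is a product, one weak step drags down an otherwise confident chain, and longer chains tend to score lower. Other aggregations, such as taking the minimum step score, are sometimes used instead.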

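Monte Carlo Tree Search for reasoning picks the next step s to explore by its upper confidence bound, where V(s) is the estimated value of s, N_s is its visit count, N_parent is the visit count of its parent, and c is the exploration constant:

UCB(s) = V(s) + c\sqrt{\frac{\log N_{\text{parent}}}{N_s}}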
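
The selection step of the search can be sketched as below. This covers only the UCB rule, not expansion, rollout, or backpropagation. The `Node` class, the default `c = 1.4`, and the convention of giving unvisited children an infinite score are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A reasoning step in the search tree (hypothetical minimal structure)."""
    name: str
    value_sum: float = 0.0   # sum of values backed up through this node
    visits: int = 0          # N_s
    children: List["Node"] = field(default_factory=list)

    @property
    def value(self) -> float:
        """V(s): mean backed-up value, 0.0 if the node was never visited."""
        return self.value_sum / self.visits if self.visits else 0.0

def ucb(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB(s) = V(s) + c * sqrt(log(N_parent) / N_s)."""
    if child.visits == 0:
        return math.inf  # assumed convention: try unvisited children first
    return child.value + c * math.sqrt(math.log(parent_visits) / child.visits)

def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCB score."""
    return max(parent.children, key=lambda ch: ucb(ch, parent.visits, c))

# Child A looks better on average, but B has been explored far less.
a = Node("A", value_sum=5.6, visits=8)   # V = 0.7
b = Node("B", value_sum=1.0, visits=2)   # V = 0.5
root = Node("root", visits=10, children=[a, b])

print(select_child(root).name)         # exploration bonus favors B
print(select_child(root, c=0.0).name)  # pure exploitation favors A
```

With c = 0 the rule reduces to greedy selection on V(s); larger c shifts visits toward rarely explored steps.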

Snell et al., 2024, arXiv

Lightman et al., 2023, arXiv
