Legacy Concept Lab

Test-Time Compute & Inference Scaling

The paradigm behind o1: spend more compute at inference for harder problems

Concept 74 of 100 | Scaling & Alignment | Phase 11: Frontier research & scaling
Migrated: view the updated version in /domains. This /foundations page is legacy during migration.

Why It Matters for Modern Models

  • The paradigm behind o1: spend more compute at inference for harder problems
  • Enables adaptive compute: easy questions are fast, hard ones "think longer"
  • May be more efficient than pure pretraining scaling for reasoning tasks
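
A minimal sketch of adaptive compute, assuming a sampler and a verifier are available as plain callables. The names `generate`, `verify`, `threshold`, and `max_samples` are hypothetical, chosen for illustration only. It keeps drawing candidates until one scores high enough or the budget runs out, so easy questions stop after a sample or two and hard ones use the full budget.

```python
from typing import Callable, Optional, Tuple


def adaptive_sample(
    generate: Callable[[], str],      # hypothetical: draws one candidate answer
    verify: Callable[[str], float],   # hypothetical: scores a candidate, higher is better
    threshold: float = 0.9,
    max_samples: int = 16,
) -> Tuple[str, float, int]:
    """Sample until a candidate reaches `threshold` or the budget is spent.

    Returns (best candidate, its score, number of samples used).
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")

    best: Optional[str] = None
    best_score = float("-inf")
    used = 0
    for used in range(1, max_samples + 1):
        candidate = generate()
        score = verify(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= threshold:
            break  # good enough: stop spending compute on this question
    assert best is not None  # holds because max_samples >= 1
    return best, best_score, used
```

The third return value is the compute actually spent, which is what varies between easy and hard inputs.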

What Tutorials Skip

What is still poorly explained in textbooks and papers:

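  • Train-time and test-time compute are partial substitutes: within limits you can trade one for the other, e.g. a smaller model given more inference compute can match a larger one on easy and medium problems, while the hardest problems still favor more pretraining (Snell et al.)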
  • Verifiers (reward models) let you search through many candidate solutions
  • Tree search over reasoning steps explores the space of possible derivations
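
To make the train-time versus test-time trade concrete, here is a back-of-the-envelope sketch. It assumes the common approximation that a dense transformer forward pass costs about 2 FLOPs per parameter per token (attention cost is ignored), and the model sizes are hypothetical. Under that assumption, a model with 10x fewer parameters can draw 10 samples of equal length for the inference cost of one sample from the larger model.

```python
def forward_flops(n_params: float, n_tokens: float) -> float:
    """Approximate forward-pass FLOPs: ~2 FLOPs per parameter per token (assumption)."""
    return 2.0 * n_params * n_tokens


def matched_samples(large_params: float, small_params: float) -> float:
    """Equal-length samples the small model can draw for the cost of one large-model sample."""
    return large_params / small_params


if __name__ == "__main__":
    large, small, tokens = 70e9, 7e9, 500  # hypothetical model sizes and answer length
    n = matched_samples(large, small)
    print(f"one large-model sample: {forward_flops(large, tokens):.2e} FLOPs")
    print(f"{n:.0f} small-model samples: {n * forward_flops(small, tokens):.2e} FLOPs")
```

Whether those extra samples actually close the quality gap depends on the task and on how good the verifier is; the arithmetic only says the budgets match.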

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation

Test-time scaling trades compute for quality at inference:

$$\text{Quality} \sim \log(\text{inference compute})$$

That is, quality improves roughly linearly in the logarithm of inference compute, so each doubling of compute buys a similar increment.
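
One way to build intuition for diminishing returns is a toy model, not taken from the source: assume each sample is independently correct with probability p and the verifier is perfect. Sampling N times then succeeds with probability 1 - (1 - p)^N. This is not the log law itself, but it shows that each additional sample adds less than the one before.

```python
def best_of_n_success(p: float, n: int) -> float:
    """P(at least one of n independent samples is correct), assuming a perfect verifier."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.2  # hypothetical per-sample success rate
    for n in (1, 2, 4, 8, 16, 32):
        print(f"N={n:>2}  success={best_of_n_success(p, n):.3f}")
```

With an imperfect verifier the curve flattens earlier, because the selector starts picking wrong answers that merely score well.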

Best-of-N sampling: generate N candidate responses, score each with a verifier V, and keep the highest-scoring one:

$$y^* = \arg\max_{y \in \{y_1, \ldots, y_N\}} V(y)$$
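
The selection rule above can be sketched in a few lines. The `generate` and `verifier` callables are hypothetical stand-ins for a model sampler and a reward model; nothing here is tied to a specific library.

```python
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[], str],        # hypothetical: draws one candidate y_i
    verifier: Callable[[str], float],   # hypothetical: V(y), higher is better
    n: int,
) -> Tuple[str, float]:
    """Draw n candidates and return (argmax_y V(y), max V(y))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    scores = [verifier(y) for y in candidates]
    # max() returns the first index on ties, so the earliest top-scoring candidate wins.
    best_index = max(range(n), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]
```

Cost grows linearly with n, since every candidate is generated and scored before anything is selected.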

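Process Reward Models (PRMs) score intermediate steps rather than only the final answer. The score of a k-step solution is the product of the per-step correctness probabilities: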

$$R_{\text{PRM}}(s_1, \ldots, s_k) = \prod_{i=1}^{k} P(\text{correct} \mid s_1, \ldots, s_i)$$
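
A small sketch of the product, assuming the per-step probabilities have already been produced by some PRM (the input list is hypothetical). Summing logs avoids underflow on long chains.

```python
import math
from typing import Sequence


def prm_score(step_probs: Sequence[float]) -> float:
    """Product of per-step correctness probabilities, accumulated in log space."""
    total_log = 0.0
    for p in step_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"step probability out of range: {p}")
        if p == 0.0:
            return 0.0  # one step judged certainly wrong zeroes the whole chain
        total_log += math.log(p)
    return math.exp(total_log)


if __name__ == "__main__":
    print(prm_score([0.9, 0.8, 0.5]))     # three reasonably confident steps
    print(prm_score([0.99, 0.99, 0.10]))  # one weak step dominates the score
```

Because the score is a product, a single low-probability step pulls the whole solution down, which is the behavior you want when one bad step invalidates the derivation.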

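Monte Carlo Tree Search (MCTS) for reasoning picks which step s to expand using an upper confidence bound, where V(s) is the estimated value of s, N_s is its visit count, N_parent is the visit count of its parent, and c sets the exploration strength: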

$$\mathrm{UCB}(s) = V(s) + c\sqrt{\frac{\log N_{\text{parent}}}{N_s}}$$
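
The selection step can be sketched as follows. This covers only the UCB rule, not a full MCTS loop, and the `(value, visits)` tuple layout and the default `c` are assumptions made for illustration. Unvisited children get an infinite score so each is tried at least once.

```python
import math
from typing import List, Tuple


def ucb_score(value: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """UCB(s) = V(s) + c * sqrt(log(N_parent) / N_s)."""
    if visits == 0:
        return math.inf  # always explore a child that has never been visited
    return value + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children: List[Tuple[float, int]], parent_visits: int, c: float = 1.4) -> int:
    """Return the index of the child (value, visits) with the highest UCB score."""
    if not children:
        raise ValueError("no children to select from")
    return max(
        range(len(children)),
        key=lambda i: ucb_score(children[i][0], children[i][1], parent_visits, c),
    )


if __name__ == "__main__":
    # Hypothetical children: a well-explored strong step and a barely explored weaker one.
    children = [(0.9, 50), (0.6, 1)]
    print(select_child(children, parent_visits=100, c=1.0))  # exploration bonus matters
    print(select_child(children, parent_visits=100, c=0.0))  # pure exploitation
```

Raising `c` shifts the search toward rarely visited steps; setting it to zero reduces selection to greedy choice on V(s).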

Canonical Papers

Scaling LLM Test-Time Compute Optimally
Snell et al., 2024, arXiv

Let's Verify Step by Step
Lightman et al., 2023, arXiv
