Legacy Concept Lab

Tree Search over Thoughts

Turns inference into planning instead of one-pass text completion

Concept 96 of 100 | #96 MCTS-LLM | Scaling & Alignment | Phase 13: Cutting-edge 2024-2025 research

Why It Matters for Modern Models

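  • Treats inference as planning over intermediate reasoning steps rather than a single left-to-right completion
  • Enables systematic exploration of alternative reasoning paths, including backing out of dead ends
  • Commonly described as a foundation for o1-style "System 2" thinking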

What Tutorials Skip

What is still poorly explained in textbooks and papers:

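  • Each node is a partial solution: the sequence of thoughts generated so far
  • A verifier scores nodes, and those value estimates are backed up along the path to the root ("backpropagation" in the MCTS sense, not gradient descent)
  • Trade-off: exploration (trying rarely visited paths) vs. exploitation (revisiting the paths with the highest estimated value)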
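The three points above can be made concrete with a small data structure. This is a minimal sketch, not taken from any particular paper: the names `ThoughtNode`, `value_sum`, and `backup` are hypothetical, and the verifier score is assumed to be a plain float handed in by the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity; parent/child links are cyclic
class ThoughtNode:
    """One tree node: a partial solution reached by a sequence of thoughts."""
    thought: str                                   # text of this reasoning step
    parent: Optional["ThoughtNode"] = None
    children: List["ThoughtNode"] = field(default_factory=list)
    visits: int = 0                                # how often a backup passed through
    value_sum: float = 0.0                         # sum of backed-up verifier scores

    @property
    def q(self) -> float:
        """Mean backed-up value; 0.0 for a node that was never visited."""
        return self.value_sum / self.visits if self.visits else 0.0

    def add_child(self, thought: str) -> "ThoughtNode":
        child = ThoughtNode(thought=thought, parent=self)
        self.children.append(child)
        return child

    def partial_solution(self) -> List[str]:
        """Thoughts from the root down to this node."""
        path, node = [], self
        while node is not None:
            path.append(node.thought)
            node = node.parent
        return path[::-1]


def backup(leaf: ThoughtNode, score: float) -> None:
    """Push a verifier score from a leaf up to the root (no gradients involved)."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += score
        node = node.parent
```

Calling `backup` on two different leaves that share an ancestor averages their scores at that ancestor, which is what lets the search compare whole branches rather than single steps.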

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
a^* = \arg\max_a \left( Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \right)

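Run MCTS over reasoning states. At each state s, choose the next action with the UCB rule:

a^* = \arg\max_a \left( Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \right)

Here Q(s,a) is the mean backed-up value of taking action a in state s, N(s) is the visit count of s, N(s,a) is the number of times a was taken from s, and c sets the strength of the exploration bonus.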
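As a rough illustration of the UCB rule, the sketch below scores each candidate action and picks the maximizer. The function names, the default c = 1.4, and the example statistics are assumptions made for this example. It also assumes N(s) equals the sum of the per-action counts and gives unvisited actions an infinite score so each is tried once; both are common conventions, not something the equation itself prescribes.

```python
import math
from typing import Dict, Tuple


def ucb_score(q: float, n_parent: int, n_child: int, c: float = 1.4) -> float:
    """Q(s,a) + c * sqrt(ln N(s) / N(s,a)); unvisited actions score +inf."""
    if n_child == 0:
        return math.inf
    return q + c * math.sqrt(math.log(n_parent) / n_child)


def select_action(stats: Dict[str, Tuple[float, int]], c: float = 1.4) -> str:
    """stats maps action -> (Q(s,a), N(s,a)); returns the action maximizing UCB."""
    n_parent = sum(n for _, n in stats.values())  # assumption: N(s) = sum of N(s,a)
    return max(stats, key=lambda a: ucb_score(stats[a][0], n_parent, stats[a][1], c))


# Hypothetical statistics for two candidate thoughts at one state.
stats = {"algebra": (0.70, 8), "guess-and-check": (0.50, 2)}
explore_choice = select_action(stats, c=1.4)  # bonus favors the rarely tried action
greedy_choice = select_action(stats, c=0.0)   # no bonus: pure exploitation of Q
```

With these numbers the exploration bonus outweighs the gap in Q, so c = 1.4 selects the less-visited "guess-and-check", while c = 0 falls back to the higher-valued "algebra".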

Expand leaves with candidate actions drawn from the LM policy prior \pi_\theta(a|s). Back up values Q from rollouts or a verifier.
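The select, expand, evaluate, and back-up cycle can be sketched end to end on a toy problem. Everything here is a stand-in chosen for illustration: `policy_prior` is a fixed distribution in place of a language model, `verifier` is a hand-written heuristic in place of a learned or programmatic verifier, and the task (pick three numbers from {1, 2, 3} that sum to 7) is hypothetical. The sketch scores leaves with the verifier directly rather than running rollouts, and returns the best complete state seen.

```python
import math
from typing import Dict, List, Optional, Tuple

State = Tuple[int, ...]
TARGET, MAX_DEPTH = 7, 3  # toy task: choose 3 numbers from {1, 2, 3} summing to 7


def policy_prior(state: State) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for pi_theta(a|s): (action, probability) pairs."""
    return [(1, 0.5), (2, 0.3), (3, 0.2)]


def verifier(state: State) -> float:
    """Hypothetical verifier in [0, 1]; guesses each remaining step adds 2."""
    remaining = MAX_DEPTH - len(state)
    return 1.0 - abs(TARGET - (sum(state) + 2 * remaining)) / TARGET


class Node:
    def __init__(self, state: State, parent: Optional["Node"] = None) -> None:
        self.state, self.parent = state, parent
        self.children: Dict[int, "Node"] = {}
        self.n = 0      # visit count N
        self.w = 0.0    # sum of backed-up values, so Q = w / n


def ucb(parent: Node, child: Node, c: float) -> float:
    if child.n == 0:
        return math.inf  # try every proposed action once
    return child.w / child.n + c * math.sqrt(math.log(parent.n) / child.n)


def search(iterations: int = 200, c: float = 1.0, top_k: int = 3) -> State:
    root = Node(())
    best_state, best_score = root.state, -1.0
    for _ in range(iterations):
        # 1. Selection: follow UCB down to a node with no children yet.
        node = root
        while node.children:
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch, c))
        # 2. Evaluation: score the (partial) solution with the verifier.
        value = verifier(node.state)
        terminal = len(node.state) == MAX_DEPTH
        if terminal and value > best_score:
            best_state, best_score = node.state, value
        # 3. Expansion: add the top-k actions proposed by the policy prior.
        if not terminal:
            ranked = sorted(policy_prior(node.state), key=lambda p: p[1], reverse=True)
            for action, _prob in ranked[:top_k]:
                node.children[action] = Node(node.state + (action,), parent=node)
        # 4. Backup: propagate the value to every ancestor.
        while node is not None:
            node.n += 1
            node.w += value
            node = node.parent
    return best_state


result = search()
```

Returning the best verified terminal state is one readout choice; picking the most-visited child at each level is another common one. With a real model, `policy_prior` would sample or rank candidate thoughts, and `top_k` would bound the branching factor.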

Key insight: Inference becomes planning, not just generation.

Canonical Papers

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Zhou et al., 2024, ICML
