Legacy Concept Lab

Tree Search over Thoughts

Treats inference as planning rather than plain text completion

Concept 96 of 100 | Scaling & Alignment | Phase 13

#96 | MCTS-LLM | Scaling & Alignment

Phase 13: Cutting-edge 2024-2025 research

Why It Matters for Modern Models

What is still poorly explained in textbooks and papers:

Key Equation

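Monte Carlo tree search (MCTS) over reasoning states: each node is a state $s$ (a partial chain of thoughts) and each edge is an action $a$ (a candidate next thought). Actions are selected with the upper confidence bound (UCB) rule, where $Q(s,a)$ is the mean backed-up value of taking $a$ in $s$, $N(s)$ and $N(s,a)$ are visit counts, and $c$ sets the exploration strength: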

a^* = \arg\max_a \left( Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \right)
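The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `ucb_select`, the default `c=1.4`, the choice to compute $N(s)$ as the sum of the child visit counts, and the convention of trying unvisited actions first are all assumptions made for the example.

```python
import math

def ucb_select(q, n_sa, c=1.4):
    """Return the index of the action maximizing Q(s,a) + c*sqrt(ln N(s) / N(s,a)).

    q[i]    : mean backed-up value of action i
    n_sa[i] : visit count of action i
    N(s) is taken as sum(n_sa), an assumption of this sketch.
    """
    n_s = sum(n_sa)
    best, best_score = None, -math.inf
    for i, (q_i, n_i) in enumerate(zip(q, n_sa)):
        if n_i == 0:
            # The bonus is unbounded for an unvisited action, so try it first.
            return i
        score = q_i + c * math.sqrt(math.log(n_s) / n_i)
        if score > best_score:
            best, best_score = i, score
    return best
```

With `q = [0.6, 0.5]` and `n_sa = [10, 1]`, setting `c=0.0` picks the action with the higher mean value, while `c=1.0` favors the rarely visited action because its exploration bonus dominates.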

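Expand a selected node by sampling candidate next thoughts from the LM policy prior $\pi_\theta(a|s)$. Back up value estimates $Q$, obtained from rollouts or a verifier, along the path from the evaluated node to the root.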
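The select, expand, evaluate, and back-up cycle can be sketched on a toy problem. Everything here is a stand-in chosen for illustration: `propose` is a hypothetical substitute for sampling from $\pi_\theta(a|s)$, `verify` is a hypothetical substitute for a rollout or verifier score, and the task (pick up to three digits from 1 to 3 whose sum is close to 7) is invented so the example runs without a model.

```python
import math

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # partial reasoning trace (here: a tuple of steps)
        self.parent = parent
        self.children = []      # expanded child nodes
        self.n = 0              # visit count
        self.q = 0.0            # running mean of backed-up values

def propose(state):
    """Hypothetical stand-in for sampling next thoughts from the LM policy."""
    return [state + (d,) for d in (1, 2, 3)] if len(state) < 3 else []

def verify(state):
    """Hypothetical stand-in for a rollout or verifier score in [0, 1]."""
    return 1.0 - abs(7 - sum(state)) / 7.0

def ucb(child, c):
    if child.n == 0:
        return math.inf         # unvisited children are tried first
    return child.q + c * math.sqrt(math.log(child.parent.n) / child.n)

def search(root_state, iterations=200, c=1.0):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB until reaching a node with no children.
        while node.children:
            node = max(node.children, key=lambda ch: ucb(ch, c))
        # Expansion: once a leaf has been visited, add the proposed children.
        if node.n > 0:
            node.children = [Node(s, node) for s in propose(node.state)]
            if node.children:
                node = node.children[0]
        # Evaluation: score the reached state with the verifier stub.
        value = verify(node.state)
        # Backup: update visit counts and running means up to the root.
        while node is not None:
            node.n += 1
            node.q += (value - node.q) / node.n
            node = node.parent
    return root

def best_path(root):
    """Follow the most-visited child at each level, stopping at unvisited nodes."""
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.n)
        if node.n == 0:
            break
        path.append(node.state[-1])
    return path
```

Calling `best_path(search(()))` returns the sequence of steps the search visited most often. In a real system the two stubs would be replaced by model calls, and the final answer is commonly read off by visit count, as here, or by value; which one to use is a design choice this sketch does not settle.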

Key insight: Inference becomes planning, not just generation.

Zhou et al., 2024, ICML
