Legacy Concept Lab
Tree Search over Thoughts
Treats inference as planning rather than plain text completion
#96 · MCTS-LLM · Scaling & Alignment
key equation
a^* = \arg\max_a \left( Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \right)
Phase 13: Cutting-edge 2024-2025 research
Concept 96 of 100
Why It Matters for Modern Models
- Treats inference as planning rather than plain text completion
- Enables systematic exploration of alternative reasoning paths instead of committing to a single sampled chain
- Often cited as a foundation for o1-style "System 2" thinking
What Tutorials Skip
What is still poorly explained in textbooks and papers:
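- Each node is a partial solution: the sequence of thoughts produced so far
- A verifier provides the value estimates that are backed up the tree (MCTS backpropagation, not gradient backpropagation)
- Trade-off: exploration (trying new paths) vs exploitation (revisiting the best paths found so far)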
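The three points above can be made concrete with a small data structure. This is a minimal sketch, not a reference implementation: the class name `ThoughtNode`, the helper `backpropagate`, the example thoughts, and the verifier scores are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ThoughtNode:
    """One tree node: a partial solution, i.e. the thoughts produced so far."""

    thoughts: tuple[str, ...]
    parent: Optional["ThoughtNode"] = field(default=None, repr=False)
    children: list["ThoughtNode"] = field(default_factory=list)
    visits: int = 0          # how many backups passed through this node
    value_sum: float = 0.0   # sum of verifier scores backed up through it

    @property
    def q(self) -> float:
        """Mean backed-up value (the exploitation term)."""
        return self.value_sum / self.visits if self.visits else 0.0

    def add_child(self, thought: str) -> "ThoughtNode":
        """Extend this partial solution by one more thought."""
        child = ThoughtNode(self.thoughts + (thought,), parent=self)
        self.children.append(child)
        return child


def backpropagate(leaf: ThoughtNode, value: float) -> None:
    """Add a verifier score to every node from the leaf up to the root."""
    node: Optional[ThoughtNode] = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent


root = ThoughtNode(thoughts=())
a = root.add_child("Let x be the unknown.")
b = root.add_child("Try small cases first.")
backpropagate(a, 0.9)  # hypothetical verifier scores
backpropagate(b, 0.2)
backpropagate(a, 0.7)
print(root.visits, round(a.q, 2), round(b.q, 2))
```

Branch `a` ends up with the higher mean value, so a purely exploitative search would keep choosing it; the visit counts are what let a search rule also give rarely tried branches such as `b` another look.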
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
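MCTS over reasoning states. UCB action selection:
a^* = \arg\max_a \left( Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \right)
Here Q(s,a) is the mean backed-up value of taking step a in state s, N(s) and N(s,a) are visit counts, and c sets the exploration weight.
Expand new nodes with the LM policy prior. Back up values from rollouts or the verifier.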
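A small numeric example may help show how the UCB rule trades off value against visit counts. The sketch below is illustrative only: the function names, the statistics in `stats`, and the default c = 1.4 are assumptions, and N(s) is approximated as the sum of the child visit counts.

```python
import math


def ucb_score(q: float, n_parent: int, n_child: int, c: float = 1.4) -> float:
    """Q(s,a) + c * sqrt(ln N(s) / N(s,a)); unvisited actions score infinity."""
    if n_child == 0:
        return math.inf
    return q + c * math.sqrt(math.log(n_parent) / n_child)


def select_action(stats: dict[str, tuple[float, int]], c: float = 1.4) -> str:
    """Pick the action with the highest UCB score.

    stats maps each action to (mean value Q, visit count N(s,a)).
    Assumption: N(s) is taken to be the sum of the child visit counts.
    """
    n_parent = sum(n for _, n in stats.values())
    return max(stats, key=lambda a: ucb_score(stats[a][0], n_parent, stats[a][1], c))


# Hypothetical statistics: step_A looks better but is already well explored.
stats = {"step_A": (0.80, 20), "step_B": (0.60, 2)}
print(select_action(stats, c=1.4))  # exploration bonus favours step_B
print(select_action(stats, c=0.0))  # pure exploitation favours step_A
```

With c = 1.4 the bonus for the rarely visited `step_B` (about 1.74) outweighs its lower mean value, while c = 0 reduces the rule to greedy selection on Q.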
Key insight: Inference becomes planning, not just generation.
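To make the planning view concrete, here is a minimal sketch of the full select / expand / evaluate / back up loop on a toy task. Everything task-specific is an assumption made for illustration: `policy_prior` stands in for an LM policy, `verifier` stands in for a learned verifier, a "thought" is just a number from 1 to 3, and the prior is used only to decide which step to expand first.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

STEPS = (1, 2, 3)  # toy "thoughts": each step appends one number
DEPTH = 3          # a complete solution has three steps
TARGET = 7         # toy goal: the steps should sum to 7


def policy_prior(state):
    """Hypothetical stand-in for an LM policy: next steps with prior weights."""
    return {1: 0.2, 2: 0.3, 3: 0.5}


def verifier(state):
    """Hypothetical stand-in for a verifier: 1.0 when the sum hits TARGET."""
    return 1.0 - abs(TARGET - sum(state)) / TARGET


@dataclass(eq=False)
class Node:
    state: tuple
    parent: Optional["Node"] = field(default=None, repr=False)
    children: dict = field(default_factory=dict)  # action -> Node
    untried: list = field(default_factory=list)   # actions not yet expanded
    visits: int = 0
    value_sum: float = 0.0


def make_node(state, parent=None):
    node = Node(state, parent)
    if len(state) < DEPTH:
        prior = policy_prior(state)
        # Use the prior only to order expansion: likeliest step first.
        node.untried = sorted(prior, key=prior.get, reverse=True)
    return node


def ucb_child(node, c):
    def score(child):
        q = child.value_sum / child.visits
        return q + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def search(iterations=200, c=1.4, seed=0):
    rng = random.Random(seed)
    root = make_node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB while the node is fully expanded.
        while not node.untried and node.children:
            node = ucb_child(node, c)
        # 2. Expansion: add one child for the next untried step.
        if node.untried:
            action = node.untried.pop(0)
            child = make_node(node.state + (action,), node)
            node.children[action] = child
            node = child
        # 3. Evaluation: random rollout to full depth, scored by the verifier.
        state = node.state
        while len(state) < DEPTH:
            state += (rng.choice(STEPS),)
        value = verifier(state)
        # 4. Backup: update statistics from the new node up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Read out the most-visited path as the final answer.
    path, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(action)
    return tuple(path)


best = search()
print(best, verifier(best))
```

Reading out the most-visited path is one common choice; picking the path with the highest mean value is another. In a real system the rollout and verifier calls would be model calls, which is where most of the inference cost goes.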