Domain Neighborhood
Scaling
How loss and capability change with parameters, data, and compute; how to allocate a training budget; and why some abilities appear suddenly at scale.
Recommended Route
Start here, then follow the prerequisites forward.
This sequence is ordered for learning rather than inventory: lower difficulty, fewer prerequisites, and more central concepts come first.
- 01 Overparameterization & Generalization (Double Descent)
Test error can peak at the interpolation threshold, then fall again as models get larger: why modern overparameterized nets still generalize.
16 min | code | demo | after Loss Landscapes, Sharpness & Flat Minima
Check Loss Landscapes, Sharpness & Flat Minima first if the symbols feel slippery.
- 02 Scaling Laws & Emergent Abilities
Empirical power laws that predict how loss and capability improve with parameters, data, and compute, and how to choose compute-optimal training runs.
18 min | code | demo | after Scaled Dot-Product Attention & Transformer Layers; Overparameterization & Generalization (Double Descent)
Why this follows: Scaling Laws & Emergent Abilities uses Overparameterization & Generalization (Double Descent) directly.
All Published Notebooks
Browse the territory.
Overparameterization & Generalization (Double Descent)
Test error can peak at the interpolation threshold, then fall again as models get larger: why modern overparameterized nets still generalize.
Scaling Laws & Emergent Abilities
Empirical power laws that predict how loss and capability improve with parameters, data, and compute, and how to choose compute-optimal training runs.
In Progress