Domain Neighborhood

Scaling

How loss and capability change with parameters, data, and compute; how to allocate a training budget; and why some abilities appear suddenly at scale.

6 concepts · 2 published · 6 demos

Start with Overparameterization & Generalization (Double Descent).

Recommended Route

Start here, then follow the prerequisites forward.

This sequence is ordered for learning rather than inventory: lower difficulty, fewer prerequisites, and more central concepts come first.

01
Overparameterization & Generalization (Double Descent)
Test error can peak at the interpolation threshold and then fall again as models get larger: why modern overparameterized nets still generalize.
16 min · code · demo · after Loss Landscapes, Sharpness & Flat Minima
Check Loss Landscapes, Sharpness & Flat Minima first if the symbols feel slippery.
02
Scaling Laws & Emergent Abilities
Empirical power laws that predict how loss and capability improve with parameters, data, and compute, and how to choose compute-optimal training runs.
18 min · code · demo · after Scaled Dot-Product Attention & Transformer Layers, Overparameterization & Generalization (Double Descent)
Why this follows: Scaling Laws & Emergent Abilities builds directly on Overparameterization & Generalization (Double Descent).
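As a taste of the compute-optimal budgeting covered in Scaling Laws & Emergent Abilities, here is a minimal sketch of splitting a training budget. It assumes the common approximation that training compute is C ≈ 6·N·D (N parameters, D training tokens) and a fixed tokens-per-parameter ratio; the default of 20 is a Chinchilla-style rule of thumb used here as an assumption, not a value from this page. The function name `compute_optimal_split` is hypothetical.

```python
import math


def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into a parameter count N and a token count D.

    Assumptions (illustrative, not authoritative):
      * training compute C is approximately 6 * N * D
      * D = tokens_per_param * N (default ratio 20 is an assumed heuristic)
    """
    if compute_flops <= 0 or tokens_per_param <= 0:
        raise ValueError("compute_flops and tokens_per_param must be positive")
    # Substitute D = r * N into C = 6 * N * D, giving C = 6 * r * N^2,
    # so N = sqrt(C / (6 * r)).
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    # Tokens follow from the assumed fixed ratio.
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    n, d = compute_optimal_split(6e23)
    print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

Because both N and D grow as the square root of C under these assumptions, doubling the budget multiplies each by about 1.41 rather than putting all of it into a bigger model.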

All Published Notebooks

Browse the territory.

Overparameterization & Generalization (Double Descent)

Test error can peak at the interpolation threshold and then fall again as models get larger: why modern overparameterized nets still generalize.

Level 3 · 16 min · demo

Scaling Laws & Emergent Abilities

Empirical power laws that predict how loss and capability improve with parameters, data, and compute, and how to choose compute-optimal training runs.

Level 3 · 18 min · demo

In Progress

Notebooks still below the publish bar.

Neural Tangent Kernel (NTK) & Infinite-Width Limits
Pretraining Data Mixtures: Designing the Token Distribution
Tree Search Reasoning: Allocating Inference Budget Across Prefixes
Test-Time Compute: Spending Inference Budget on Search