Domain Neighborhood

Scaling

How loss and capability change with parameters, data, and compute; how to allocate a training budget; and why some abilities appear suddenly at scale.

6 concepts · 2 published · 6 demos

Recommended Route

This sequence is ordered for learning rather than inventory: lower difficulty, fewer prerequisites, and more central concepts come first.

  1. Overparameterization & Generalization (Double Descent)

    Test error can peak at the interpolation threshold and then fall again as models grow larger, which helps explain why modern overparameterized networks still generalize.

    16 min · code · demo · after Loss Landscapes, Sharpness & Flat Minima

    Check Loss Landscapes, Sharpness & Flat Minima first if the symbols feel slippery.

  2. Scaling Laws & Emergent Abilities

    Empirical power laws that predict how loss and capability improve with parameters, data, and compute, and how to choose compute-optimal training runs.

    18 min · code · demo · after Scaled Dot-Product Attention & Transformer Layers; Overparameterization & Generalization (Double Descent)

    Why this follows: Scaling Laws & Emergent Abilities uses Overparameterization & Generalization (Double Descent) directly.

All Published Notebooks

Browse the territory.

In Progress

Notebooks still below the publish bar.

Neural Tangent Kernel (NTK) & Infinite-Width Limits
Pretraining Data Mixtures: Designing the Token Distribution
Tree Search Reasoning: Allocating Inference Budget Across Prefixes
Test-Time Compute: Spending Inference Budget on Search