Domain Neighborhood

Optimization

How we train models: gradients, learning rates, curvature, and the practical tricks that make deep nets converge.

10 concepts · 4 published · 4 demos

Recommended Route

This sequence is ordered for learning rather than inventory: lower difficulty, fewer prerequisites, and more central concepts come first.

  1. Gradient Descent

    Gradient descent turns local slope information into an iterative update rule for reducing a loss.

    12 min · code · demo · after Derivatives

    Check Derivatives first if the symbols feel slippery.

  2. Loss Landscapes, Sharpness & Flat Minima

    How 2D loss slices, Hessian curvature, SAM-style neighborhood loss, and a toy 2/eta stability line expose local sensitivity during optimization.

    16 min · code · demo · after Adam Optimizer

    Why this follows: both pages keep the optimization thread active.

  3. Adam Optimizer

    Adam is an adaptive optimizer that combines momentum (EMA of gradients) with per-parameter RMS normalization (EMA of squared gradients).

    16 min · code · demo · after Derivatives, Norms

    Why this follows: both pages keep the optimization thread active.

  4. Learning Rate Schedules: Warmup, Decay & Cycling

    Schedule shapes that change the scalar learning-rate scale over training, with sourced CLR/range-test and SGDR cosine-restart examples plus caveated warmup/decay teaching patterns.

    14 min · code · demo · after Adam Optimizer, Loss Landscapes, Sharpness & Flat Minima

    Why this follows: this page uses Adam Optimizer directly.

All Published Notebooks

Browse the territory.

In Progress

Notebooks still below the publish bar.

Label Smoothing & Soft Targets
Batch Normalization
SGD & Momentum: The Workhorses of Optimization
Gradient Clipping & Explosion Prevention
Weight Decay & AdamW: Decoupled Regularization
Weight Initialization: Xavier, He & muP