Domain Neighborhood

Optimization

How we train models: gradients, learning rates, curvature, and the practical tricks that make deep nets converge.

10 concepts · 4 published · 4 demos


Recommended Route

Start here, then follow the prerequisites forward.

This sequence is ordered for learning rather than inventory: lower difficulty, fewer prerequisites, and more central concepts come first.

01
Gradient Descent
Gradient descent turns local slope information into an iterative update rule for reducing a loss.
12 min · code · demo · after Derivatives
Check Derivatives first if the symbols feel slippery.
02
Loss Landscapes, Sharpness & Flat Minima
How 2D loss slices, Hessian curvature, SAM-style neighborhood loss, and a toy 2/eta stability line expose local sensitivity during optimization.
16 min · code · demo · after Adam Optimizer
Why this follows: both pages keep the optimization thread active.
03
Adam Optimizer
Adam is an adaptive optimizer that combines momentum (EMA of gradients) with per-parameter RMS normalization (EMA of squared gradients).
16 min · code · demo · after Derivatives, Norms
Why this follows: both pages keep the optimization thread active.
04
Learning Rate Schedules: Warmup, Decay & Cycling
Schedule shapes that vary the scalar learning-rate multiplier over training, with sourced examples of cyclical learning rates (CLR), the LR range test, and SGDR cosine restarts, plus warmup and decay patterns taught with caveats.
14 min · code · demo · after Adam Optimizer, Loss Landscapes, Sharpness & Flat Minima
Why this follows: Learning Rate Schedules: Warmup, Decay & Cycling uses Adam Optimizer directly.

All Published Notebooks

Browse the territory.

Gradient Descent

Gradient descent turns local slope information into an iterative update rule for reducing a loss.

Level 2 · 12 min · demo
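
As a rough illustration of the update rule described above, here is a minimal sketch in plain Python. The function name `gradient_descent`, the toy loss f(x) = (x - 3)^2, and the learning rate and step count are all assumptions chosen for illustration, not taken from the notebook.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Hypothetical toy loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # approaches the minimizer x = 3
```

On this quadratic each step shrinks the distance to the minimizer by a constant factor (1 - 2 * lr), which is why a fixed step size is enough here.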

Loss Landscapes, Sharpness & Flat Minima

How 2D loss slices, Hessian curvature, SAM-style neighborhood loss, and a toy 2/eta stability line expose local sensitivity during optimization.

Level 3 · 16 min · demo
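
One common way to see where a 2/eta line comes from is the 1D quadratic loss f(x) = 0.5 * h * x^2, where plain gradient descent multiplies x by (1 - eta * h) at every step and so diverges once the curvature h exceeds 2/eta. The sketch below assumes that quadratic model; whether the notebook's toy uses the same setup is an assumption, and `run_quadratic` is a hypothetical helper name.

```python
def run_quadratic(curvature, lr, steps=50, x0=1.0):
    """Gradient descent on f(x) = 0.5 * curvature * x^2; returns the final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # gradient of the quadratic is curvature * x
    return abs(x)

lr = 0.1  # stability threshold on curvature is 2 / lr = 20
below = run_quadratic(curvature=19.0, lr=lr)  # per-step factor |1 - 1.9| = 0.9, iterates shrink
above = run_quadratic(curvature=21.0, lr=lr)  # per-step factor |1 - 2.1| = 1.1, iterates grow
print(below, above)
```

Both runs overshoot the minimum on every step (the factor is negative), but only the one above the threshold grows in magnitude.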

Adam Optimizer

Adam is an adaptive optimizer that combines momentum (EMA of gradients) with per-parameter RMS normalization (EMA of squared gradients).

Level 3 · 16 min · demo
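
The two moving averages named above can be sketched in a few lines of plain Python. This follows the standard Adam update with bias correction and the commonly used default hyperparameters; the function name `adam_step` and the list-of-floats parameter layout are illustrative assumptions, not the notebook's code.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of scalar parameters; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # EMA of gradients (momentum)
        v_i = beta2 * v_i + (1 - beta2) * g * g  # EMA of squared gradients
        m_hat = m_i / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Two parameters with very different gradient magnitudes (hypothetical values).
params, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
params, m, v = adam_step(params, grads=[0.5, -200.0], m=m, v=v, t=1)
print(params)
```

On the first step both parameters move by roughly `lr` in magnitude despite the 400x difference in gradient size, which is the per-parameter RMS normalization at work.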

Learning Rate Schedules: Warmup, Decay & Cycling

Schedule shapes that vary the scalar learning-rate multiplier over training, with sourced examples of cyclical learning rates (CLR), the LR range test, and SGDR cosine restarts, plus warmup and decay patterns taught with caveats.

Level 3 · 14 min · demo
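
As one example of a schedule shape, the sketch below combines linear warmup with a single cosine decay and returns a multiplier to apply to a base learning rate. The function name `lr_scale` and the values of `warmup_steps`, `total_steps`, and `min_scale` are assumptions for illustration; this shows one cosine cycle without restarts, not the SGDR or CLR schedules as published.

```python
import math

def lr_scale(step, warmup_steps=100, total_steps=1000, min_scale=0.0):
    """Multiplier for the base learning rate at a given 0-based step."""
    if step < warmup_steps:
        # Linear warmup: ramps from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Cosine decay from 1.0 down to min_scale over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_scale + (1.0 - min_scale) * cosine

base_lr = 3e-4  # hypothetical base learning rate
for step in (0, 99, 550, 1000):
    print(step, base_lr * lr_scale(step))
```

Keeping the schedule as a pure function of the step makes it easy to plot before training and to swap for a different shape.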

In Progress

Notebooks still below the publish bar.

Label Smoothing & Soft Targets
Batch Normalization
SGD & Momentum: The Workhorses of Optimization
Gradient Clipping & Explosion Prevention
Weight Decay & AdamW: Decoupled Regularization
Weight Initialization: Xavier, He & muP