Domain Neighborhood
Optimization
How we train models: gradients, learning rates, curvature, and the practical tricks that make deep nets converge.
Recommended Route
Start here, then follow the prerequisites forward.
This sequence is ordered for learning rather than inventory: lower difficulty, fewer prerequisites, and more central concepts come first.
- 01Gradient Descent
Gradient descent turns local slope information into an iterative update rule for reducing a loss.
12 mincodedemoafter DerivativesCheck Derivatives first if the symbols feel slippery.
- 02Loss Landscapes, Sharpness & Flat Minima
How 2D loss slices, Hessian curvature, SAM-style neighborhood loss, and a toy 2/eta stability line expose local sensitivity during optimization.
16 mincodedemoafter Adam OptimizerWhy this follows: both pages keep the optimization thread active.
- 03Adam Optimizer
Adam is an adaptive optimizer that combines momentum (EMA of gradients) with per-parameter RMS normalization (EMA of squared gradients).
16 mincodedemoafter Derivatives, NormsWhy this follows: both pages keep the optimization thread active.
- 04Learning Rate Schedules: Warmup, Decay & Cycling
Schedule shapes that change the scalar learning-rate scale over training, with sourced CLR/range-test and SGDR cosine-restart examples plus caveated warmup/decay teaching patterns.
14 mincodedemoafter Adam Optimizer, Loss Landscapes, Sharpness & Flat MinimaWhy this follows: Learning Rate Schedules: Warmup, Decay & Cycling uses Adam Optimizer directly.
All Published Notebooks
Browse the territory.
Gradient Descent
Gradient descent turns local slope information into an iterative update rule for reducing a loss.
Loss Landscapes, Sharpness & Flat Minima
How 2D loss slices, Hessian curvature, SAM-style neighborhood loss, and a toy 2/eta stability line expose local sensitivity during optimization.
Adam Optimizer
Adam is an adaptive optimizer that combines momentum (EMA of gradients) with per-parameter RMS normalization (EMA of squared gradients).
Learning Rate Schedules: Warmup, Decay & Cycling
Schedule shapes that change the scalar learning-rate scale over training, with sourced CLR/range-test and SGDR cosine-restart examples plus caveated warmup/decay teaching patterns.
In Progress