Legacy Concept Lab

Label Smoothing & Soft Targets

Used in most vision models and LLMs—simple trick with consistent improvements

Why It Matters for Modern Models

  • Standard in many vision training recipes and used in some language models: a cheap trick that often gives small but consistent gains
  • Discourages overconfidence, which tends to improve calibration and sometimes generalization
  • Knowledge distillation uses the same idea: train on soft targets from a teacher model
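
The distillation bullet above can be sketched in a few lines. This is a rough illustration, not a reference implementation: the `temperature` parameter, the function names, and the logit values are assumptions added here, and only the idea of training on a teacher's soft targets comes from the text.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student against the teacher's soft targets."""
    targets = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student))

# Hypothetical logits for a 4-class problem.
teacher_logits = [0.5, 1.0, 4.0, 0.1]
student_logits = [0.0, 0.2, 2.0, 0.0]

print(softmax(teacher_logits, temperature=2.0))  # soft targets, not one-hot
print(soft_target_loss(teacher_logits, student_logits, temperature=2.0))
```

Unlike label smoothing, which spreads the same small mass over every wrong class, the teacher's soft targets differ per example and per class.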

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Hard targets say "100% sure it's class 3"—but that's almost never true in real data
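  • Label smoothing implicitly regularizes: the loss is minimized at a finite logit gap, so the model gains nothing by driving logits toward ±∞
  • Connects to calibration: smoothed models often give better-calibrated confidence, with predicted probabilities closer to observed accuracy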
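
To make the "finite logit gap" point concrete, here is a small numerical sketch. It assumes a simplified setup that is not in the original text: the true-class logit is `gap`, every other logit is 0, and the best gap is found by scanning a grid. The helper names are hypothetical.

```python
import math

def loss_at_gap(gap, alpha, k):
    """Smoothed cross-entropy when the true-class logit is `gap` and the other k-1 logits are 0."""
    log_z = math.log(math.exp(gap) + (k - 1))
    log_p_true = gap - log_z
    log_p_other = -log_z
    t_true = 1 - alpha + alpha / k   # smoothed target for the true class
    t_other = alpha / k              # smoothed target for each other class
    return -(t_true * log_p_true + (k - 1) * t_other * log_p_other)

def best_gap(alpha, k, gaps):
    """Grid point with the lowest loss."""
    return min(gaps, key=lambda g: loss_at_gap(g, alpha, k))

gaps = [i / 100 for i in range(0, 2001)]  # 0.00 to 20.00

smoothed_best = best_gap(0.1, 4, gaps)         # alpha = 0.1: interior minimum
hard_best = best_gap(0.0, 4, gaps)             # alpha = 0: keeps improving up to the grid edge
closed_form = math.log(4 * (1 - 0.1) / 0.1 + 1)  # log of target ratio 0.925 / 0.025

print(smoothed_best, closed_form, hard_best)
```

With hard targets the best grid point is always the largest gap tried, which is the "logits run off to infinity" behavior. With smoothing the minimum sits where the softmax output equals the smoothed target.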

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation. Come back here for the full walkthrough.

Key Equation
Instead of hard targets $y = [0, 0, 1, 0]$, use soft targets:

$$y_{smooth} = (1 - \alpha) y + \frac{\alpha}{K}$$

For $\alpha = 0.1$ and $K = 4$ classes: $[0.025, 0.025, 0.925, 0.025]$
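
A minimal sketch of the key equation in plain Python, reproducing the worked numbers above. The function name `smooth_labels` is chosen here for illustration.

```python
def smooth_labels(one_hot, alpha):
    """Return (1 - alpha) * y + alpha / K for a one-hot list y of length K."""
    k = len(one_hot)
    return [(1.0 - alpha) * y + alpha / k for y in one_hot]

smoothed = smooth_labels([0, 0, 1, 0], alpha=0.1)
print([round(v, 3) for v in smoothed])  # [0.025, 0.025, 0.925, 0.025]
```

The smoothed vector still sums to 1, so it remains a valid target distribution for cross-entropy.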

Effect on cross-entropy:

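$$\mathcal{L}_{smooth} = (1-\alpha)\, \mathcal{L}_{CE}(p, y) + \alpha\, \mathcal{L}_{CE}(p, u)$$

where $p$ is the model's predicted distribution and $u$ is the uniform distribution over the $K$ classes. The second term penalizes overconfidence: it grows without bound as any predicted probability approaches 0, so the loss is minimized at finite logits rather than by pushing them toward infinity.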
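
The decomposition holds because cross-entropy is linear in the target distribution. A quick numerical check, using made-up logits and helper names that are not part of the original page:

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    """-sum_k target_k * log(probs_k)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

alpha = 0.1
logits = [1.0, -0.5, 3.0, 0.2]   # illustrative values only
k = len(logits)

y = [0.0, 0.0, 1.0, 0.0]         # hard target
u = [1.0 / k] * k                # uniform distribution
y_smooth = [(1 - alpha) * yi + alpha / k for yi in y]

p = softmax(logits)
direct = cross_entropy(y_smooth, p)
mixed = (1 - alpha) * cross_entropy(y, p) + alpha * cross_entropy(u, p)

print(direct, mixed)  # the two values agree up to floating-point error
```

In practice this means label smoothing can be implemented either by modifying the targets or by adding a weighted uniform cross-entropy term to the usual loss.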

Canonical Papers

Rethinking the Inception Architecture for Computer Vision

Szegedy et al., 2016, CVPR

When Does Label Smoothing Help?

Müller, Kornblith, Hinton, 2019, NeurIPS
