Legacy Concept Lab
Label Smoothing & Soft Targets
Key equation:
y_{smooth} = (1 - \alpha) y + \frac{\alpha}{K}
Phase 3: Optimization & generalization · Concept 60 of 100
Why It Matters for Modern Models
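- Widely used when training image classifiers and sequence models (Inception-v3 and the original Transformer both used α = 0.1). It is a one-line change that often gives small accuracy gains
- Discourages overconfidence, which typically improves calibration and sometimes generalization
- Knowledge distillation builds on the same idea: the student also trains on soft targets, but they come from a teacher model's predicted distribution instead of a fixed uniform mix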
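To make the contrast with label smoothing concrete, here is a minimal sketch of training against a teacher's soft targets for a single example. The logits are hypothetical, and the temperature T = 2.0 is an assumption (temperature-softened teacher outputs are a common distillation choice, not something stated above). The sketch also omits the hard-label loss term that distillation setups often combine with this one.

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; larger temperatures give flatter distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target: List[float], probs: List[float]) -> float:
    """H(target, probs) = -sum_k target_k * log(probs_k)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

teacher_logits = [4.0, 2.5, 0.5, -1.0]  # hypothetical teacher output for one example
student_logits = [3.0, 1.0, 0.0, 0.0]   # hypothetical student output for the same example
T = 2.0                                 # assumed distillation temperature

# The teacher's distribution replaces the one-hot label. Unlike label smoothing,
# the non-target mass is not uniform: classes the teacher finds plausible get more.
soft_targets = softmax(teacher_logits, T)
loss = cross_entropy(soft_targets, softmax(student_logits, T))

print([round(p, 3) for p in soft_targets])
print(round(loss, 4))
```

Label smoothing is the special case where this "teacher" distribution is fixed to (1 - α) on the labeled class plus α/K everywhere, regardless of the input.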
What Tutorials Skip
What is still poorly explained in textbooks and papers:
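- A hard (one-hot) target says "100% sure it's class 3", but real data rarely justifies that certainty: images can be ambiguous and labels can be wrong
- Label smoothing acts as an implicit regularizer: with α > 0 the loss has a finite optimum, so the model gains nothing by pushing the correct-class logit infinitely far above the others
- It connects to calibration: smoothed models often produce predicted probabilities that track their actual accuracy more closely, i.e. more honest uncertainty estimates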
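Why the logits stay finite can be checked directly. The sketch below uses a simplified, hypothetical setup: the true class has logit `gap` and the other K - 1 logits are 0. It scans a grid of gaps and reports where the cross-entropy against the smoothed target is lowest, with and without smoothing. K = 10 and α = 0.1 are illustrative values.

```python
import math

def smoothed_loss(gap: float, K: int, alpha: float) -> float:
    """Cross-entropy against the smoothed target when the true-class logit is `gap`
    and the other K - 1 logits are 0 (a simplified, hypothetical setup)."""
    log_denom = math.log(math.exp(gap) + (K - 1))
    log_p_true = gap - log_denom          # log softmax of the true class
    log_p_other = -log_denom              # log softmax of each other class
    on = 1 - alpha + alpha / K            # smoothed target for the true class
    off = alpha / K                       # smoothed target for each other class
    return -(on * log_p_true + (K - 1) * off * log_p_other)

K = 10
gaps = [0.5 * i for i in range(41)]       # 0.0, 0.5, ..., 20.0

for alpha in (0.0, 0.1):
    losses = [smoothed_loss(g, K, alpha) for g in gaps]
    best = gaps[losses.index(min(losses))]
    print(f"alpha={alpha}: lowest loss on this grid at gap={best}")

# Closed form for alpha > 0: the softmax odds e^gap must equal the target odds on / off.
alpha = 0.1
print(round(math.log((1 - alpha + alpha / K) / (alpha / K)), 4))
```

With hard targets (alpha = 0.0) the lowest loss sits at the edge of the grid, gap = 20.0, because the loss keeps shrinking as the gap grows. With alpha = 0.1 it bottoms out at gap = 4.5, next to the closed-form optimum ln 91 ≈ 4.5109.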
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation at the top of this page. Come back here for the full walkthrough.
Key Equation
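Instead of a one-hot hard target y, use the soft target:
y_{smooth} = (1 - \alpha) y + \frac{\alpha}{K}
For smoothing strength \alpha and K classes, the true class gets 1 - \alpha + \frac{\alpha}{K} and every other class gets \frac{\alpha}{K}, so the target still sums to 1.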
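The per-class values follow directly from the key equation. A minimal pure-Python sketch; the helper name `smooth_labels` is hypothetical, and α = 0.1, K = 10, true class 3 are illustrative values rather than ones given in the text:

```python
from typing import List

def smooth_labels(true_class: int, num_classes: int, alpha: float) -> List[float]:
    """Return (1 - alpha) * one_hot(true_class) + alpha / num_classes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    off_value = alpha / num_classes          # every class receives alpha / K from the uniform part
    on_value = 1.0 - alpha + off_value       # the true class keeps 1 - alpha plus its uniform share
    return [on_value if k == true_class else off_value for k in range(num_classes)]

targets = smooth_labels(true_class=3, num_classes=10, alpha=0.1)
print([round(t, 4) for t in targets])
# [0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
print(round(sum(targets), 10))  # 1.0
```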
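Effect on cross-entropy, with p the model's softmax output and H(q, p) = -\sum_{k=1}^{K} q_k \log p_k:
\mathcal{L} = H(y_{smooth}, p) = (1 - \alpha)\, H(y, p) + \alpha\, H(u, p)
where u is the uniform distribution over the K classes (u_k = 1/K). The \alpha\, H(u, p) term grows without bound as any p_k approaches 0, so it penalizes overconfidence: the logits cannot diverge to infinity, and the loss is minimized at a finite logit gap.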
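Because cross-entropy is linear in its target argument and α/K = α·u_k, the smoothed loss splits as H(y_smooth, p) = (1 - α) H(y, p) + α H(u, p). A minimal pure-Python check of that identity; K = 4, α = 0.1, and the logits are arbitrary illustrative values:

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target: List[float], probs: List[float]) -> float:
    """H(target, probs) = -sum_k target_k * log(probs_k)."""
    return -sum(t * math.log(p) for t, p in zip(target, probs))

K, alpha, true_class = 4, 0.1, 0    # illustrative values
logits = [2.0, 0.5, -1.0, 0.0]      # arbitrary example logits
p = softmax(logits)

y = [1.0 if k == true_class else 0.0 for k in range(K)]   # one-hot target
u = [1.0 / K] * K                                           # uniform distribution
y_smooth = [(1 - alpha) * yk + alpha / K for yk in y]       # key equation

lhs = cross_entropy(y_smooth, p)
rhs = (1 - alpha) * cross_entropy(y, p) + alpha * cross_entropy(u, p)
print(abs(lhs - rhs) < 1e-12)  # True
```

In practice frameworks fold this into the loss. For example, recent PyTorch versions accept `torch.nn.CrossEntropyLoss(label_smoothing=alpha)`, which mixes the target with the uniform distribution in the same way.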