Legacy Concept Lab
Adversarial Examples & Robustness
Reveals that neural networks can be "right for the wrong reasons": decision boundaries are brittle, and tiny input changes can flip predictions
#44 Adversarial Theory
key equation
x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x L)
Phase 8: Scaling, theory & multimodal
Concept 44 of 100
Why It Matters for Modern Models
- Foundation for understanding model robustness, jailbreaks, and AI safety
- Adversarial training remains the most reliable empirical defense; robust models can also generalize better to some kinds of distribution shift, though the effect is not uniform across shifts
What Tutorials Skip
What is still poorly explained in textbooks and papers:
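- High dimensionality is a key reason adversarial examples exist: an input has so many coordinates that there are many directions along which a tiny change can cross a decision boundary
- Linear hypothesis: even linear models are vulnerable, because moving each of n inputs by only ε in the direction sign(w) shifts the dot product w·x by ε·||w||_1, which grows in proportion to n when the weights have a typical magnitude
- Robustness-accuracy tradeoff: adversarial training typically lowers clean accuracy, by roughly 2-10 percentage points depending on the dataset and the perturbation budget ε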
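The high-dimensionality and linear-hypothesis points above can be checked with a few lines of arithmetic. This is a minimal sketch assuming a plain linear score w·x with randomly drawn, hypothetical weights; the helper name `max_logit_shift` is illustrative and not from any library.

```python
import random

def sign(v):
    # -1, 0 or +1, matching sign() in the FGSM equation
    return (v > 0) - (v < 0)

def max_logit_shift(w, eps):
    """Change in w·x when x is perturbed by eta = eps * sign(w).

    Every coordinate moves by at most eps (an L-infinity budget), yet
    w·eta = eps * sum(|w_i|) = eps * ||w||_1, which grows with the dimension n.
    """
    eta = [eps * sign(wi) for wi in w]
    return sum(wi * ei for wi, ei in zip(w, eta))

random.seed(0)
eps = 0.01  # tiny per-coordinate change
for n in (10, 1_000, 100_000):
    # Hypothetical weights drawn uniformly from [-1, 1], so E|w_i| = 0.5
    # and the expected shift is about eps * 0.5 * n.
    w = [random.uniform(-1.0, 1.0) for _ in range(n)]
    print(n, round(max_logit_shift(w, eps), 2))
```

The per-coordinate budget stays at 0.01 throughout, but the expected shift in the score grows from about 0.05 at n = 10 to about 500 at n = 100,000, which is more than enough to cross almost any linear decision boundary.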
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation at the top of the page, then come back here for the full walkthrough.
Key Equations
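FGSM (Fast Gradient Sign Method) generates an adversarial example in a single step:
x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x L(\theta, x, y))
where L(θ, x, y) is the loss of the model with parameters θ on input x with true label y, and ε bounds how far each input coordinate may move (an L∞ budget).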
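A minimal sketch of FGSM, assuming a binary logistic-regression model p = σ(w·x + b) with cross-entropy loss. That model is chosen only because ∇_x L has the closed form (p − y)·w; a real network would get this gradient from autodiff. All names (`fgsm`, `loss_grad_x`, `predict`) and the weights and input are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # -1, 0 or +1 (as an int)
    return (v > 0) - (v < 0)

def predict(w, b, x):
    """P(y = 1 | x) for a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_x(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the INPUT x: (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(grad_x L)."""
    g = loss_grad_x(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical 100-dimensional model and an input it classifies correctly (y = 1).
w = [0.2, -0.2] * 50
b = 0.0
x = [0.05 * sign(wi) for wi in w]  # w·x = 1.0
y = 1
x_adv = fgsm(w, b, x, y, eps=0.1)  # each coordinate moves by exactly 0.1
print(predict(w, b, x), predict(w, b, x_adv))  # about 0.73, then about 0.27
```

Each coordinate moves by only 0.1, but the logit drops by ε·||w||_1 = 0.1 · 20 = 2, from +1 to −1, so the prediction flips. Implementations often also clip x_adv to the valid input range (e.g. [0, 1] for images), which this sketch omits.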
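PGD (Projected Gradient Descent) iterates smaller signed-gradient steps of size α, projecting back onto the ε-ball B_ε(x) = {x' : ||x' − x||_∞ ≤ ε} after each step, starting from x^{(0)} = x or a random point in the ball:
x^{(t+1)} = \Pi_{B_\epsilon(x)}\left(x^{(t)} + \alpha \cdot \text{sign}(\nabla_x L(\theta, x^{(t)}, y))\right)
Despite the name, the attack ascends the loss; "descent" refers to the projected-gradient optimization scheme it borrows.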
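A minimal sketch of L∞ PGD on the same kind of hypothetical logistic-regression model, where projection onto the ε-ball reduces to clipping each coordinate to [x_i − ε, x_i + ε]. This sketch starts from the clean input (no random start), and the step size α = 0.025 and 10 steps are arbitrary illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_grad_x(w, b, x, y):
    # dL/dx for binary cross-entropy with p = sigmoid(w·x + b)
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def pgd(w, b, x0, y, eps, alpha, steps):
    """L-inf PGD: signed-gradient ascent steps on the loss, each followed by
    projection (coordinate-wise clipping) back into [x0 - eps, x0 + eps]."""
    x = list(x0)  # start at the clean input; no random start in this sketch
    for _ in range(steps):
        g = loss_grad_x(w, b, x, y)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]        # ascent step
        x = [min(max(xi, c - eps), c + eps) for xi, c in zip(x, x0)]  # projection
    return x

# Hypothetical model and correctly classified input (y = 1).
w = [0.2, -0.2] * 50
b = 0.0
x0 = [0.05 * sign(wi) for wi in w]
y = 1
x_pgd = pgd(w, b, x0, y, eps=0.1, alpha=0.025, steps=10)
p_clean = sigmoid(sum(wi * xi for wi, xi in zip(w, x0)) + b)
p_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_pgd)) + b)
print(p_clean, p_adv)
```

For a linear model the sign of the gradient is the same at every iterate, so PGD ends at the same corner of the ε-ball as FGSM. On nonlinear networks the gradient direction changes between steps, which is why the iterated attack is typically stronger than a single FGSM step.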
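Adversarial training min-max objective: the inner max finds the worst-case perturbation δ inside the ε-ball (approximated in practice by PGD), and the outer min updates θ to fit those worst-case inputs:
\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_\infty \le \epsilon} L(\theta, x + \delta, y) \right]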
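A minimal sketch of the min-max loop, assuming a hypothetical two-dimensional logistic-regression model trained by SGD on made-up Gaussian blobs. For this linear model a single FGSM step solves the inner max over the L∞ ball exactly (the loss depends only on the margin, and FGSM moves every coordinate against it); deep networks would use PGD there instead. The learning rate, epoch count, ε, and all function names are illustrative choices, not values from the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # grad_x L = (p - y) * w for logistic regression; for this linear model
    # the single signed step is the exact inner maximizer over the L-inf ball.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def train(data, eps, lr=0.1, epochs=20, seed=0):
    """SGD on binary cross-entropy; eps > 0 trains on worst-case inputs."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    data = list(data)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps) if eps > 0 else x  # inner max
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
            # outer min: BCE gradient w.r.t. (w, b), evaluated at x_adv
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
            b -= lr * (p - y)
    return w, b

def accuracy(w, b, data, eps=0.0):
    """Clean accuracy (eps = 0) or accuracy under the FGSM attack (eps > 0)."""
    correct = 0
    for x, y in data:
        x_eval = fgsm(w, b, x, y, eps) if eps > 0 else x
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x_eval)) + b)
        correct += (p > 0.5) == (y == 1)
    return correct / len(data)

# Hypothetical toy data: two Gaussian blobs centred at (1, 1) and (-1, -1).
gen = random.Random(42)
data = [([c + gen.gauss(0, 0.5), c + gen.gauss(0, 0.5)], y)
        for y, c in [(1, 1), (0, -1)] for _ in range(100)]

eps = 0.3
w_std, b_std = train(data, eps=0.0)
w_adv, b_adv = train(data, eps=eps)
for name, (w, b) in [("standard", (w_std, b_std)), ("adversarial", (w_adv, b_adv))]:
    print(name, accuracy(w, b, data), accuracy(w, b, data, eps))
```

Each line prints clean accuracy followed by accuracy under attack. Because the attack is exact for a linear model, an input that survives the attack is also classified correctly when clean, so the second number can never exceed the first here.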
Small perturbations, bounded by ε, can cause large changes in model predictions.