
Adversarial Examples & Robustness

Reveals that neural networks can be "right for the wrong reasons": their decision boundaries are brittle

Concept 44 of 100 | Theory | Phase 8: Scaling, theory & multimodal

Why It Matters for Modern Models

  • Foundation for understanding model robustness, jailbreaks, and AI safety
  • Adversarial training remains one of the most reliable empirical defenses, and robust models can also hold up better under some kinds of distribution shift

What Tutorials Skip

What is still poorly explained in textbooks and papers:

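  • Adversarial examples exist largely because of high dimensionality: with many input dimensions, there are many directions in which a small perturbation can push an input across a decision boundary
  • Linear hypothesis: even linear models are vulnerable, because a perturbation of at most ε per coordinate, aligned with sign(w), changes the dot product w·x by ε·‖w‖₁, which grows with the input dimension
  • Robustness-accuracy tradeoff: adversarial training typically lowers clean accuracy, often by roughly 2-10 percentage points depending on the dataset and the budget ε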
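The linear hypothesis can be checked numerically. The sketch below assumes a plain linear score w·x with random weights and inputs (all values hypothetical) and moves each coordinate by at most ε in the direction sign(w). The score then shifts by exactly ε‖w‖₁, so the same tiny per-coordinate budget produces a larger shift as the dimension n grows.

```python
import numpy as np

def score_shift(w, x, eps):
    """Change in the linear score w . x when x moves to x + eps * sign(w)."""
    eta = eps * np.sign(w)           # no coordinate changes by more than eps
    return w @ (x + eta) - w @ x     # equals eps * ||w||_1 up to rounding

rng = np.random.default_rng(0)
eps = 0.01                           # hypothetical per-coordinate budget
for n in (10, 1_000, 100_000):
    w, x = rng.normal(size=n), rng.normal(size=n)  # hypothetical weights and input
    print(f"n={n:>7}  score shift={score_shift(w, x, eps):.2f}  "
          f"(eps*||w||_1={eps * np.abs(w).sum():.2f})")
```

For standard-normal weights, E|w_i| = √(2/π) ≈ 0.8, so the shift grows roughly like 0.8·ε·n while no single coordinate changes by more than ε.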


Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation, then come back here for the full walkthrough.

Key Equation
$$x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x L)$$

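FGSM (Fast Gradient Sign Method) generates an adversarial example in a single step, moving every input coordinate by ε in the direction that increases the loss L(θ, x, y) of a model with parameters θ on input x with label y: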

$$x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x L(\theta, x, y))$$
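A minimal FGSM sketch, assuming a logistic-regression model θ = (w, b) as a stand-in for a neural network, binary cross-entropy as L, and labels y ∈ {0, 1}. For this model the input gradient has the closed form (p - y)·w, so no autodiff library is needed. The weights, input, and ε are hypothetical, and the result is not clipped to any valid input range.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    """Binary cross-entropy L(theta, x, y) for a logistic model theta = (w, b), y in {0, 1}."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(w, b, x, y):
    """Analytic input gradient of the loss above: dL/dx = (p - y) * w."""
    return (sigmoid(w @ x + b) - y) * w

def fgsm(w, b, x, y, eps):
    """x_adv = x + eps * sign(grad_x L): one step, every coordinate moves by eps."""
    return x + eps * np.sign(grad_x(w, b, x, y))

# Hypothetical model and input, chosen only for illustration.
w, b = np.array([2.0, -1.0, 0.5]), 0.0
x, y = np.array([0.3, 0.1, 0.2]), 1
x_adv = fgsm(w, b, x, y, eps=0.1)
print(x_adv, bce_loss(w, b, x, y), bce_loss(w, b, x_adv, y))
```

With these values x_adv is [0.2, 0.2, 0.1]: the score w·x falls from 0.6 to 0.25, a drop of exactly ε‖w‖₁ = 0.35, and the loss on the true label rises.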

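PGD (Projected Gradient Descent, run here as ascent on the loss) takes repeated steps of size α, re-evaluating the gradient at the current iterate and projecting back onto the ε-ball around the original x after each step: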

$$x^{(t+1)} = \Pi_{\mathcal{B}_\epsilon(x)} \left( x^{(t)} + \alpha \cdot \text{sign}\big(\nabla_x L(\theta, x^{(t)}, y)\big) \right)$$
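A sketch of the PGD update for the L∞ ball, where the sign step matches the FGSM step and the projection Π onto B_ε(x) reduces to a per-coordinate clip. The `grad_fn` interface and the small tanh network are hypothetical; the network is nonlinear so that re-evaluating the gradient at each x^(t) actually matters. Common variants also start from a random point inside the ball, which is omitted here.

```python
import numpy as np

def pgd_linf(grad_fn, x, eps, alpha, steps):
    """L-infinity PGD: repeat x_t <- clip(x_t + alpha * sign(grad), x - eps, x + eps)."""
    x_t = x.copy()
    for _ in range(steps):
        x_t = x_t + alpha * np.sign(grad_fn(x_t))   # ascent step on the loss
        x_t = np.clip(x_t, x - eps, x + eps)        # projection onto B_eps(x) in L-infinity
    return x_t

# Hypothetical toy classifier: one tanh hidden layer, sigmoid output, true label y = 1.
rng = np.random.default_rng(0)
W1 = 0.5 * rng.normal(size=(8, 4))
w2 = rng.normal(size=8)

def loss(x):
    """BCE for label 1: -log sigmoid(z), computed stably as log(1 + exp(-z))."""
    return np.logaddexp(0.0, -(w2 @ np.tanh(W1 @ x)))

def grad_x(x):
    """Backpropagated dL/dx for the toy model."""
    h = np.tanh(W1 @ x)
    p = 1.0 / (1.0 + np.exp(-(w2 @ h)))
    # Chain rule: dL/dz = p - 1, dz/dh = w2, dh/dpre = 1 - h^2, dpre/dx = W1.
    return W1.T @ ((p - 1.0) * w2 * (1.0 - h ** 2))

x = rng.normal(size=4)
x_adv = pgd_linf(grad_x, x, eps=0.1, alpha=0.025, steps=10)
print(loss(x), loss(x_adv), np.abs(x_adv - x).max())
```

Each step moves a coordinate only in the direction its gradient sign indicates (or leaves it in place once it sits on the boundary), so the loss should rise while x_adv never leaves the box |x_adv,i - x_i| ≤ ε.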

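Adversarial training folds the attack into learning as a min-max objective: the inner max finds the worst perturbation δ within the budget ε, and the outer min fits θ to those worst cases: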

$$\min_\theta \mathbb{E}_{(x,y)} \left[ \max_{\|\delta\| \leq \epsilon} L(\theta, x + \delta, y) \right]$$
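A small sketch of the min-max objective, assuming logistic regression, an L∞ budget, and hypothetical synthetic data with one strongly predictive feature and 20 weakly predictive ones. For a linear model the inner max has a closed form: the FGSM point is the exact worst case, because the loss is monotone in the score w·x + b. The outer min is ordinary full-batch gradient descent on the adversarial batch; setting eps = 0 gives standard training for comparison.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, eps, lr=0.1, epochs=300):
    """Gradient descent on mean_i max_{||delta_i||_inf <= eps} BCE(theta, x_i + delta_i, y_i)
    for logistic regression with y in {0, 1}. eps = 0 is standard training."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        # Inner max: grad_x BCE = (p - y) * w, so the FGSM point is the exact
        # L-infinity worst case for a linear model.
        p = sigmoid(X @ w + b)
        X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])
        # Outer min: gradient step on the mean BCE at the adversarial points
        # (X_adv treated as a constant).
        err = sigmoid(X_adv @ w + b) - y
        w = w - lr * X_adv.T @ err / n
        b = b - lr * err.mean()
    return w, b

def robust_accuracy(w, b, X, y, eps):
    """Accuracy after each point gets its exact worst-case L-infinity shift (eps = 0: clean)."""
    z = X @ w + b - np.where(y == 1, 1.0, -1.0) * eps * np.abs(w).sum()
    return np.mean((z > 0) == (y == 1))

# Hypothetical data: one strong feature (mean +/-2) and 20 weak ones (mean +/-0.3).
rng = np.random.default_rng(0)
n, d_weak = 1000, 20
y = rng.integers(0, 2, size=n).astype(float)
means = np.concatenate([[2.0], np.full(d_weak, 0.3)])
X = (2 * y - 1)[:, None] * means + rng.normal(size=(n, 1 + d_weak))

w_std, b_std = adversarial_train(X, y, eps=0.0)
w_adv, b_adv = adversarial_train(X, y, eps=0.5)
for name, (w, b) in [("standard", (w_std, b_std)), ("adversarial", (w_adv, b_adv))]:
    print(name, "clean:", robust_accuracy(w, b, X, y, 0.0),
          "robust@0.5:", robust_accuracy(w, b, X, y, 0.5))
```

On data like this the adversarially trained model is expected to lean on the strong feature and shrink the weak ones, giving up a little clean accuracy for much higher robust accuracy, which mirrors the robustness-accuracy tradeoff described earlier. For deep networks the inner max has no closed form and is usually approximated with PGD.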

Small perturbations δ can cause large changes in model predictions.

Canonical Papers

Explaining and Harnessing Adversarial Examples

Goodfellow, Shlens, Szegedy. ICLR 2015.
