Legacy Concept Lab
Pruning: Removing Unnecessary Weights
Lottery ticket hypothesis changed how we think about overparameterization
#64 Pruning Efficiency
Key equation
W_{pruned} = W \odot M, \quad M_{ij} = \mathbf{1}[|W_{ij}| > \theta]
Phase 6: Modern efficiency & inference
Concept 64 of 100
Why It Matters for Modern Models
- Lottery ticket hypothesis changed how we think about overparameterization
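- SparseGPT can prune 50% of the weights of a 175B-parameter GPT-family model (OPT-175B) in one shot, without retraining, with minimal quality loss
- Structured pruning shrinks the dense matrices themselves, so it yields real speedups on ordinary hardware; unstructured sparsity needs sparse kernels or hardware support to pay off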
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- Small weights ≠ unimportant; magnitude pruning is a heuristic, not optimal
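- Unstructured 90% sparsity sounds great, but dense GPU kernels still multiply the zeros, so on its own it doesn't speed up standard GPUs
- Iterative pruning (prune, retrain, repeat) usually preserves accuracy much better than one-shot magnitude pruning, especially at high sparsity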
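The first point, that a small weight is not necessarily an unimportant one, can be illustrated with a toy example. The sketch below assumes a hypothetical two-feature linear model in which the small weight multiplies a large-scale input; the feature scales and weight values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-feature linear model y = w[0]*x0 + w[1]*x1.
# x0 has a large scale, so its small weight still carries most of the signal.
n = 10_000
X = np.column_stack([
    rng.normal(0.0, 100.0, size=n),  # large-scale feature
    rng.normal(0.0, 1.0, size=n),    # unit-scale feature
])
w = np.array([0.05, 1.0])
y = X @ w


def mse_after_pruning(index: int) -> float:
    """Mean squared error after zeroing a single weight."""
    w_pruned = w.copy()
    w_pruned[index] = 0.0
    return float(np.mean((X @ w_pruned - y) ** 2))


smallest = int(np.argmin(np.abs(w)))  # the weight magnitude pruning would remove
damage = [mse_after_pruning(i) for i in range(len(w))]
print(f"magnitude pruning removes w[{smallest}]")
print(f"MSE after removing w[0]: {damage[0]:.2f}, w[1]: {damage[1]:.2f}")
```

Magnitude pruning picks w[0], yet removing it costs roughly 25 times more error than removing the larger w[1], because the error depends on the weight times the scale of its input, not on the weight alone.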
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
Magnitude pruning: Remove the weights with the smallest |W_{ij}|:
W_{pruned} = W \odot M, \quad M_{ij} = \mathbf{1}[|W_{ij}| > \theta]
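As a minimal sketch of magnitude pruning, the function below builds the binary mask M from a threshold on |W_ij| and applies it elementwise. The function name `magnitude_prune` and the choice to derive the threshold from a target sparsity are assumptions for illustration, not part of the original text.

```python
import numpy as np


def magnitude_prune(W: np.ndarray, sparsity: float):
    """Zero out roughly the `sparsity` fraction of entries with smallest |W_ij|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * W.size)  # number of weights to remove
    if k == 0:
        return W.copy(), np.ones_like(W, dtype=bool)
    # theta is the k-th smallest magnitude; ties at theta are pruned too
    theta = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    M = np.abs(W) > theta  # M_ij = 1[|W_ij| > theta]
    return W * M, M


W = np.array([[0.9, -0.1, 0.4],
              [-0.05, 0.7, -0.3]])
W_pruned, M = magnitude_prune(W, sparsity=0.5)
print(W_pruned)
```

Note that the pruned matrix has the same shape as the original: the zeros are still stored and still multiplied by a dense matmul.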
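Structured pruning: Remove entire neurons or attention heads by masking whole groups of weights W^{(g)} at once. A common choice scores each group by its norm:
W_{pruned}^{(g)} = m_g \, W^{(g)}, \quad m_g = \mathbf{1}[s_g > \theta], \quad s_g = \|W^{(g)}\|_2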
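A minimal sketch of structured pruning for a two-layer MLP follows. It assumes neurons are scored by the L2 norm of their incoming weights, which is one common heuristic among several; the function name `prune_neurons` and the layer shapes are hypothetical.

```python
import numpy as np


def prune_neurons(W1, b1, W2, keep_fraction):
    """Drop the hidden neurons of a 2-layer MLP with the smallest row L2 norms in W1.

    Shapes: W1 (hidden, in), b1 (hidden,), W2 (out, hidden).
    """
    hidden = W1.shape[0]
    n_keep = max(1, int(round(keep_fraction * hidden)))
    scores = np.linalg.norm(W1, axis=1)            # one score per neuron
    keep = np.sort(np.argsort(scores)[-n_keep:])   # indices of the strongest neurons
    # Removing neuron i deletes row i of W1 and b1, and column i of W2.
    return W1[keep], b1[keep], W2[:, keep], keep


rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
b1 = rng.normal(size=8)
W2 = rng.normal(size=(3, 8))
W1_s, b1_s, W2_s, keep = prune_neurons(W1, b1, W2, keep_fraction=0.5)
print(W1_s.shape, W2_s.shape)
```

Unlike an elementwise mask, the result is a pair of smaller dense matrices, which is why this form of pruning speeds up ordinary dense kernels.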
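Lottery Ticket: A dense network with random initialization W_0 contains a sparse subnetwork, defined by a mask M, that trains from the same initialization to match the dense network's accuracy:
\exists\, M \in \{0,1\}^{|W_0|},\ \|M\|_0 \ll |W_0| : \quad \mathrm{acc}\big(\mathrm{train}(M \odot W_0)\big) \ge \mathrm{acc}\big(\mathrm{train}(W_0)\big)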
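Such subnetworks are usually searched for with iterative magnitude pruning: train, prune a fraction of the surviving weights, rewind the rest to their initial values, and repeat. The sketch below runs that loop on a toy linear regression rather than a neural network, so it only illustrates the procedure. The task, the 20% per-round pruning rate, and all hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task whose true solution is sparse (an assumption made so
# that a small subnetwork can fit the data at all).
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ w_true


def train(w_init, mask, steps=500, lr=0.05):
    """Gradient descent on MSE; masked weights stay at zero throughout."""
    w = w_init * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / n
        w = (w - lr * grad) * mask
    return w


def mse(w):
    return float(np.mean((X @ w - y) ** 2))


w0 = rng.normal(scale=0.1, size=d)  # the initialization we rewind to
mask = np.ones(d)
dense_loss = mse(train(w0, mask))

for _ in range(5):  # prune, rewind, retrain, repeat
    w = train(w0, mask)  # always restart from w0, restricted to the mask
    alive = np.flatnonzero(mask)
    n_prune = max(1, int(0.2 * alive.size))  # 20% of the surviving weights
    drop = alive[np.argsort(np.abs(w[alive]))[:n_prune]]
    mask[drop] = 0.0

sparse_loss = mse(train(w0, mask))  # retrain the sparse "ticket" from w0
print(f"kept {int(mask.sum())}/{d} weights")
print(f"dense MSE {dense_loss:.2e}, sparse MSE {sparse_loss:.2e}")
```

Each round removes a fraction of what is left, so sparsity grows geometrically: five rounds at 20% take this model from 20 weights down to 8.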
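OBS/OBD criterion (second-order): Prune the weight w_q whose removal increases the loss least, estimated from the Hessian H of the loss. OBD uses only the diagonal of H; OBS uses the full inverse and also adjusts the remaining weights:
\text{OBD: } s_q = \tfrac{1}{2} H_{qq} w_q^2, \qquad \text{OBS: } s_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad q^* = \arg\min_q s_q
\text{OBS update: } \delta w = -\frac{w_q}{[H^{-1}]_{qq}} H^{-1} e_q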
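The second-order criterion can be checked numerically on a quadratic loss, where the formulas are exact. The sketch below assumes the standard OBD saliency (1/2) H_qq w_q^2 and OBS saliency w_q^2 / (2 [H^{-1}]_qq), together with the OBS compensating update for the surviving weights; the random Hessian and weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setting: quadratic loss L(w) = 0.5 * (w - w_star)^T H (w - w_star),
# evaluated at its minimum w = w_star, where the OBS formulas are exact.
d = 5
A = rng.normal(size=(d, d))
H = A @ A.T + 0.1 * np.eye(d)  # symmetric positive-definite Hessian
w_star = rng.normal(size=d)
H_inv = np.linalg.inv(H)


def loss(w):
    diff = w - w_star
    return 0.5 * diff @ H @ diff


# Saliency of each weight: predicted loss increase if it is removed.
obd = 0.5 * np.diag(H) * w_star**2        # OBD: diagonal of the Hessian only
obs = w_star**2 / (2.0 * np.diag(H_inv))  # OBS: full inverse Hessian

q = int(np.argmin(obs))  # cheapest weight to remove under OBS
# OBS compensating update: sets w[q] to zero and adjusts the other weights
delta = -(w_star[q] / H_inv[q, q]) * H_inv[:, q]
w_new = w_star + delta

print(f"pruned index {q}, w_new[q] = {w_new[q]:.1e}")
print(f"predicted increase {obs[q]:.6f}, actual {loss(w_new):.6f}")
```

The OBS saliency is never larger than the OBD one for a positive-definite Hessian, because OBS lets the surviving weights move to absorb part of the damage. For real networks the Hessian is far too large to invert directly, so practical methods rely on approximations.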