Pruning: Removing Unnecessary Weights

The lottery ticket hypothesis changed how we think about overparameterization

Concept 64 of 100 | Efficiency | Phase 6: Modern efficiency & inference

Why It Matters for Modern Models

  • The lottery ticket hypothesis changed how we think about overparameterization
  • SparseGPT can prune 50% of the weights of OPT-175B in one shot with minimal quality loss
  • Structured pruning enables actual speedups; unstructured sparsity needs sparse kernels or hardware support to pay off

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Small weights ≠ unimportant; magnitude pruning is a heuristic, not optimal
  • Unstructured 90% sparsity sounds great, but dense GPU kernels still multiply the zeros, so on its own it doesn't speed up standard GPUs
  • Iterative pruning (prune, retrain, repeat) works much better than one-shot
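
The first point can be made concrete with a two-weight linear model. This is a minimal sketch with made-up numbers (the weights and feature scales are assumptions chosen for illustration): a small weight attached to a large-scale feature changes the output far more when removed than a large weight attached to a small-scale feature.

```python
import numpy as np

# Hypothetical linear model y = w . x with two features on very different scales.
w = np.array([0.05, 1.0])
second_moment = np.array([100.0**2, 1.0**2])  # assumed E[x_i^2] for each feature

# Zeroing weight i alone changes the output by w_i * x_i, so the mean squared
# change relative to the dense model is w_i^2 * E[x_i^2].
loss_increase = w**2 * second_moment

magnitude_choice = int(np.argmin(np.abs(w)))       # what magnitude pruning removes
loss_aware_choice = int(np.argmin(loss_increase))  # what actually hurts least

print("magnitude pruning removes weight", magnitude_choice)
print("least damaging removal is weight", loss_aware_choice)
print("loss increase per weight:", loss_increase)
```

Here magnitude pruning picks weight 0, yet removing it costs about 25 times more than removing weight 1. Magnitude works as a heuristic mainly when inputs and activations are on comparable scales.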

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
$$W_{pruned} = W \odot M, \quad M_{ij} = \mathbf{1}[|W_{ij}| > \theta]$$

Magnitude pruning: Remove the weights with the smallest magnitude $|w|$:

$$W_{pruned} = W \odot M, \quad M_{ij} = \mathbf{1}[|W_{ij}| > \theta]$$
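
As a rough illustration of the mask equation, the sketch below picks the threshold θ from a target sparsity and applies the mask elementwise. The helper name `magnitude_mask` and the example matrix are assumptions for illustration, not part of any library.

```python
import numpy as np

def magnitude_mask(W, sparsity):
    """Return a 0/1 mask M with M_ij = 1 where |W_ij| > theta.

    theta is set to the k-th smallest magnitude, where k is the number of
    weights to drop. With ties at theta, slightly more than k may be dropped.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * W.size))  # number of weights to drop
    if k == 0:
        return np.ones_like(W)
    theta = np.sort(np.abs(W), axis=None)[k - 1]
    return (np.abs(W) > theta).astype(W.dtype)

W = np.array([[0.9, -0.1, 0.4],
              [-0.05, 0.7, -0.3]])
M = magnitude_mask(W, sparsity=0.5)
W_pruned = W * M  # elementwise product, W ⊙ M

print(M)
print(W_pruned)
```

Note that `W_pruned` has the same shape as `W`; the zeros are still stored and still multiplied by a dense matmul.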

Structured pruning: Remove entire neurons/attention heads:

$$W_{pruned} = W[:, \text{keep\_indices}]$$
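
A minimal sketch of the column-slicing form, assuming a two-layer ReLU MLP in which hidden units are the columns of `W1` and the rows of `W2`. The scoring rule (L2 norm of incoming weights) and the function name `prune_hidden_neurons` are illustrative choices, not a prescribed method.

```python
import numpy as np

def prune_hidden_neurons(W1, W2, n_keep):
    """Keep the n_keep hidden units with the largest incoming-weight norm.

    W1: (d_in, d_hidden), W2: (d_hidden, d_out). Removing a hidden unit means
    dropping its column in W1 and the matching row in W2.
    """
    scores = np.linalg.norm(W1, axis=0)
    keep_indices = np.sort(np.argsort(scores)[-n_keep:])
    return W1[:, keep_indices], W2[keep_indices, :], keep_indices

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))
W1_small, W2_small, keep_indices = prune_hidden_neurons(W1, W2, n_keep=4)

# Compare against the same pruning expressed as a mask on the full matrices.
x = rng.normal(size=(5, 8))
column_mask = np.zeros(16)
column_mask[keep_indices] = 1.0
y_masked = np.maximum(x @ (W1 * column_mask), 0.0) @ W2
y_small = np.maximum(x @ W1_small, 0.0) @ W2_small

print(W1_small.shape, W2_small.shape)  # smaller dense matrices
print(np.allclose(y_masked, y_small))
```

The two outputs agree, but the sliced version runs on smaller dense matrices, which is why structured pruning translates into speedups on ordinary hardware.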

Lottery Ticket: There exist sparse subnetworks that, trained from the original initialization $W_0$, match the dense network:

$$\exists M: \text{train}(W_0 \odot M) \approx \text{train}(W_0)$$
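
One common way to search for such a mask is iterative magnitude pruning with rewinding: train, prune the smallest surviving weights, reset the survivors to their initial values, and repeat. The sketch below runs that loop on a toy noise-free linear regression. The problem, learning rate, step count, and pruning fraction are all assumptions for illustration and are far simpler than the experiments in the paper.

```python
import numpy as np

def train(w0, mask, X, y, lr=0.05, steps=500):
    """Gradient descent on mean squared error, keeping masked weights at zero."""
    w = w0 * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - lr * grad) * mask
    return w

def iterative_magnitude_pruning(w0, X, y, prune_frac=0.2, rounds=5):
    """Each round: train from w0 under the current mask, then drop the
    smallest-magnitude prune_frac of the surviving weights."""
    mask = np.ones_like(w0)
    for _ in range(rounds):
        w = train(w0, mask, X, y)  # always rewind to the original init w0
        alive = np.flatnonzero(mask)
        n_prune = int(np.floor(prune_frac * alive.size))
        if n_prune == 0:
            break
        drop = alive[np.argsort(np.abs(w[alive]))[:n_prune]]
        mask[drop] = 0.0
    return mask, train(w0, mask, X, y)

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:4] = [2.0, -1.5, 1.0, 3.0]  # only 4 of the 20 weights matter
y = X @ w_true
w0 = rng.normal(scale=0.1, size=d)

mask, w_sparse = iterative_magnitude_pruning(w0, X, y)
w_dense = train(w0, np.ones(d), X, y)

mse = lambda w: float(np.mean((X @ w - y) ** 2))
print("surviving weights:", int(mask.sum()), "of", d)
print("dense MSE:", mse(w_dense), "sparse MSE:", mse(w_sparse))
```

Pruning 20% of the survivors per round compounds: after n rounds roughly 0.8^n of the weights remain, so high sparsity is reached gradually rather than in one shot.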

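OBS/OBD criterion (second-order): Prune the weight whose removal increases the loss least. Optimal Brain Surgeon (OBS) uses the inverse Hessian; Optimal Brain Damage (OBD) assumes a diagonal Hessian:

$$\delta L_i^{\text{OBS}} \approx \frac{w_i^2}{2\,[H^{-1}]_{ii}}, \qquad \delta L_i^{\text{OBD}} \approx \frac{1}{2} H_{ii}\, w_i^2$$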
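
To see what the second-order criterion measures, the sketch below evaluates it on a small quadratic loss, reading the denominator as the i-th diagonal entry of the inverse Hessian (the OBS form) and comparing it with the diagonal-only OBD score. The Hessian and weights are made-up numbers. For an exactly quadratic loss the approximations become equalities, which makes them easy to check.

```python
import numpy as np

# Assumed toy setup: L(w) = 0.5 (w - w*)^T H (w - w*), trained weights at the minimum w*.
H = np.array([[2.0, 1.8],
              [1.8, 2.0]])
w_star = np.array([1.0, 1.2])

def loss(w):
    d = w - w_star
    return 0.5 * d @ H @ d

H_inv = np.linalg.inv(H)
obd = 0.5 * np.diag(H) * w_star**2           # zero w_i, leave the others fixed
obs = w_star**2 / (2.0 * np.diag(H_inv))     # zero w_i, let the others compensate

q = int(np.argmin(obs))                      # weight OBS would prune first

# OBS also prescribes how the remaining weights should move.
w_obs = w_star - (w_star[q] / H_inv[q, q]) * H_inv[:, q]
w_obs[q] = 0.0                               # exactly zero up to rounding

w_naive = w_star.copy()
w_naive[q] = 0.0

print("OBS saliency:", obs, "OBD saliency:", obd)
print("loss after OBS update:", loss(w_obs))
print("loss after plain zeroing:", loss(w_naive))
```

Because the two weights are strongly correlated through H, letting the surviving weight compensate makes the removal much cheaper than plain zeroing. Reusing this kind of compensation at scale is the idea that layer-wise methods such as SparseGPT build on.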

Canonical Papers

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Frankle & Carbin, 2019, ICLR

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

Frantar & Alistarh, 2023, ICML
