Probing, Linear Classifier Probes & Activation Analysis
Canonical Papers
Understanding Intermediate Layers using Linear Classifier Probes
BERT Rediscovers the Classical NLP Pipeline
Core Mathematics
Given the layer-$\ell$ representation $h_\ell$, train a linear probe $g(h_\ell) = W h_\ell + b$ (or a softmax over $W h_\ell + b$) on a supervised task (POS tags, parse trees, etc.) while keeping the underlying model frozen. The probe's accuracy estimates how linearly separable that information is at layer $\ell$.
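A minimal sketch of this setup in numpy: the activations below are random stand-ins for frozen layer representations $h_\ell$ (in practice they would come from a forward pass with gradients disabled), and the labels are constructed to be linearly recoverable; everything else (sizes, learning rate, step count) is an illustrative assumption, not from the papers.

```python
import numpy as np

# Sketch: train a softmax probe g(h) = softmax(W h + b) on frozen activations.
rng = np.random.default_rng(0)

n, d, k = 600, 32, 3              # samples, hidden size, number of classes
# Stand-in for frozen layer representations h_l; real use would collect
# these from the model with gradients disabled.
H = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, k))
y = (H @ true_W).argmax(axis=1)   # labels linearly recoverable from H

# Train the probe with plain gradient descent on cross-entropy loss.
W = np.zeros((d, k))
b = np.zeros(k)
lr = 0.5
for _ in range(300):
    logits = H @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), y] -= 1.0                     # dL/dlogits = p - onehot(y)
    W -= lr * (H.T @ p) / n
    b -= lr * p.mean(axis=0)

# High accuracy => the task information is linearly separable in H.
acc = ((H @ W + b).argmax(axis=1) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

Only the probe's parameters $W, b$ are trained; the representations stay fixed, which is what makes the accuracy a statement about the layer rather than about the probe.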
BERT layers roughly follow the classical NLP pipeline (POS → syntax → semantics → coreference).
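The layer-wise claim above is measured by sweeping the same probe across layers and comparing accuracies. The sketch below simulates this with synthetic activations whose noise shrinks with depth, purely as an assumption for illustration; a real sweep would substitute each layer's actual hidden states.

```python
import numpy as np

# Sketch of a layer-wise probing sweep. The "layers" here are synthetic:
# noise decreases with depth by construction, standing in for information
# becoming more linearly accessible in deeper layers.
rng = np.random.default_rng(1)
n, d, k, n_layers = 400, 16, 4, 6

base = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, k))
y = (base @ W_true).argmax(axis=1)

def probe_accuracy(H, y, steps=200, lr=0.5):
    """Train a softmax probe on H with gradient descent; return accuracy."""
    n, d = H.shape
    k = y.max() + 1
    W, b = np.zeros((d, k)), np.zeros(k)
    for _ in range(steps):
        logits = H @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(n), y] -= 1.0
        W -= lr * (H.T @ p) / n
        b -= lr * p.mean(axis=0)
    return ((H @ W + b).argmax(axis=1) == y).mean()

accs = []
for layer in range(n_layers):
    noise = 2.0 * (1 - layer / (n_layers - 1))   # deeper layer -> less noise
    H_layer = base + noise * rng.normal(size=(n, d))
    accs.append(probe_accuracy(H_layer, y))
    print(f"layer {layer}: probe accuracy {accs[-1]:.2f}")
```

Plotting probe accuracy against layer index is exactly the kind of evidence behind the POS → syntax → semantics → coreference ordering reported for BERT.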
Key Equation
$$\hat{y} = g(h_\ell) = \operatorname{softmax}(W h_\ell + b)$$
Why It Matters for Modern Models
- Probing is one of the main tools for understanding what GPT-like models know and where in the network that knowledge lives
- Used heavily for safety (probing for dangerous capabilities), robustness, and fairness analyses
Missing Intuition
What is still poorly explained in textbooks and papers:
- Clear mental model of what probes measure (information content vs ease of extraction)
- Visual, layer-by-layer maps of information flow in large LMs