Theoretical Foundations: PAC Learning, MDL & Information Bottleneck
Canonical Papers
A Theory of the Learnable (Valiant, 1984)
Modeling by Shortest Data Description (Rissanen, 1978)
Deep Learning and the Information Bottleneck Principle (Tishby & Zaslavsky, 2015)
Core Mathematics
PAC learning: a concept class is PAC-learnable if an algorithm, given m = poly(1/ε, 1/δ, d) examples, outputs a hypothesis with error ≤ ε with probability ≥ 1 − δ, where d is the VC dimension of the class.
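A minimal numeric sketch of how that statement gets used, assuming the common realizable-case form m = O((d·log(1/ε) + log(1/δ)) / ε) with unit constants; the function name is made up for illustration and different textbooks attach different constants.

```python
import math

def pac_sample_bound(vc_dim: int, eps: float, delta: float) -> int:
    """Rough sample-complexity estimate for the realizable PAC setting.

    Uses the common form m = O((d * log(1/eps) + log(1/delta)) / eps)
    with unit constants; treat the result as an order-of-magnitude
    guide, not an exact requirement.
    """
    d = vc_dim
    m = (d * math.log(1.0 / eps) + math.log(1.0 / delta)) / eps
    return math.ceil(m)

# Example: VC dimension 10, target error 5%, failure probability 1%.
print(pac_sample_bound(vc_dim=10, eps=0.05, delta=0.01))  # 692 with unit constants
```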
MDL: choose the hypothesis H that minimizes the two-part code length L(H) + L(D | H): the bits needed to describe the hypothesis plus the bits needed to describe the data given that hypothesis.
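A small sketch of the two-part code idea, assuming a crude fixed-precision cost for parameters and a Gaussian residual code for the data; both encoding choices are deliberate simplifications of the refined codes used in the MDL literature.

```python
import numpy as np

def two_part_code_length(y, y_hat, n_params, bits_per_param=32):
    """Two-part MDL score in bits: L(H) + L(D | H).

    L(H): model cost, counted as n_params at a fixed precision
    (an illustrative simplification of real MDL parameter codes).
    L(D | H): data cost, the Gaussian negative log-likelihood of the
    residuals converted to bits; for continuous data this is a
    differential code length and can be negative, so only the
    comparison between models is meaningful.
    """
    residuals = y - y_hat
    sigma2 = residuals.var() + 1e-12
    nll_nats = 0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    return n_params * bits_per_param + nll_nats / np.log(2)

# Compare a line against a degree-9 polynomial on noisy linear data:
# the extra parameters of the big model cost more bits than they save.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.1 * rng.standard_normal(50)
for degree in (1, 9):
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    print(degree, round(two_part_code_length(y, y_hat, degree + 1), 1))
```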
Information Bottleneck: learn a representation T of the input X that keeps information about the target Y while compressing away the rest of X.
Key Equation
min_{p(t|x)} I(X; T) − β I(T; Y)
Larger β puts more weight on predicting Y; smaller β puts more weight on compressing X.
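One way to evaluate this objective for a fixed discrete encoder; the joint distribution, the encoder, and β below are toy values chosen for illustration.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A; B) in bits for a joint distribution given as a 2-D array."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0
    return float((p_joint[nz] * np.log2(p_joint[nz] / (p_a @ p_b)[nz])).sum())

def ib_objective(p_xy, p_t_given_x, beta):
    """Evaluate the IB Lagrangian I(X;T) - beta * I(T;Y) for a fixed encoder.

    p_xy:        joint distribution over (x, y), shape (|X|, |Y|)
    p_t_given_x: stochastic encoder, shape (|X|, |T|), rows sum to 1
    """
    p_x = p_xy.sum(axis=1)             # p(x)
    p_xt = p_x[:, None] * p_t_given_x  # p(x, t)
    p_ty = p_t_given_x.T @ p_xy        # p(t, y), since T is independent of Y given X
    return mutual_information(p_xt) - beta * mutual_information(p_ty)

# Toy example: 4 inputs, 2 labels, and a deterministic 2-cluster encoder.
p_xy = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])
encoder = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(ib_objective(p_xy, encoder, beta=2.0))
```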
Why It Matters for Modern Models
- PAC/VC theory gives language and bounds for generalization and sample complexity
- MDL/compression perspectives underlie the intuition that good models should be both simple and predictive
- The information bottleneck motivates representation compression and offers a lens on what intermediate layers are doing
Missing Intuition
What is still poorly explained in textbooks and papers:
- How to connect formal sample-complexity bounds to modern overparameterized models that interpolate the training data
- Intuitive, visual examples of MDL in deep nets (comparing code lengths of different architectures)