Legacy Concept Lab
Calibration & Temperature Scaling
Modern deep networks are often overconfident—90% confidence doesn't mean 90% accuracy
#66CalibrationTheory
key equation
P(Y = \hat{Y} | \hat{P} = p) = pPhase 8: Scaling, theory & multimodalConcept 66 of 100
Why It Matters for Modern Models
- Modern deep networks are often overconfident—90% confidence doesn't mean 90% accuracy
- Critical for downstream decisions: medical diagnosis, autonomous driving need honest uncertainty
- LLM "hallucination confidence" is a calibration failure—model is certain about wrong things
What Tutorials Skip
What is still poorly explained in textbooks and papers:
- Neural nets maximize log-likelihood, not calibration—these are different objectives
- Temperature scaling is surprisingly effective: one scalar fixes most miscalibration
- Bigger models are often LESS calibrated—scale doesn't solve everything
Interactive Visualization
Core Math (Optional Deep Dive)
If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.
Key Equation
A model is calibrated if confidence matches accuracy:
Expected Calibration Error (ECE):
Temperature scaling: Learn a single scalar on validation set:
softens predictions (reduces overconfidence).