Legacy Concept Lab

Calibration & Temperature Scaling

Modern deep networks are often overconfident—90% confidence doesn't mean 90% accuracy


Why It Matters for Modern Models

  • Modern deep networks are often overconfident: of the predictions made with 90% confidence, noticeably fewer than 90% may be correct
  • Critical for downstream decisions: medical diagnosis and autonomous driving need honest uncertainty estimates
  • LLM "hallucination confidence" is a calibration failure: the model is certain about wrong things

What Tutorials Skip

What is still poorly explained in textbooks and papers:

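  • Neural nets are trained to maximize log-likelihood, not calibration; these are different objectives, and a low training loss does not guarantee calibrated confidences
  • Temperature scaling is surprisingly effective: a single scalar often removes most of the miscalibration (Guo et al., 2017)
  • Bigger models are often LESS calibrated: scale doesn't solve everything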

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
$$P(Y = \hat{Y} \mid \hat{P} = p) = p$$

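A model is calibrated if its confidence matches its accuracy: among all predictions made with confidence p, a fraction p are correct. With \hat{Y} the predicted label and \hat{P} the confidence assigned to it, this must hold for every p in [0, 1]:

$$P(Y = \hat{Y} \mid \hat{P} = p) = p$$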
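
The definition above can be checked empirically by grouping predictions with similar confidence and comparing each group's accuracy to its mean confidence. The following is a minimal sketch, not a reference implementation: the function name `accuracy_by_confidence`, the equal-width bins, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def accuracy_by_confidence(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins and return
    {bin index: (mean confidence, accuracy, count)} for non-empty bins."""
    bins = defaultdict(list)
    for p, c in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, c))
    table = {}
    for idx in sorted(bins):
        items = bins[idx]
        mean_conf = sum(p for p, _ in items) / len(items)
        acc = sum(1 for _, c in items if c) / len(items)
        table[idx] = (mean_conf, acc, len(items))
    return table

# Toy data (assumed): ten predictions made at 0.8 confidence, eight correct.
confidences = [0.8] * 10
correct = [True] * 8 + [False] * 2
table = accuracy_by_confidence(confidences, correct)
```

In this toy case the only non-empty bin has mean confidence and accuracy that agree, which is what calibration asks for at that confidence level. With real data, each bin needs enough samples for its accuracy estimate to be meaningful.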

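Expected Calibration Error (ECE) sorts the n predictions into M confidence bins B_1, ..., B_M and takes the bin-size-weighted average of the gap between each bin's accuracy and its mean confidence:

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|$$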
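
As a rough illustration of the ECE formula, the sketch below accumulates per-bin totals and then sums the weighted gaps. The function name, the choice of equal-width bins, and the two toy models are assumptions for illustration; other binning schemes (for example equal-mass bins) give different numbers.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of
    (bin size / n) * |accuracy(bin) - mean confidence(bin)|."""
    n = len(confidences)
    # Per bin: [sum of confidences, number correct, bin size].
    stats = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, c in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        stats[idx][0] += p
        stats[idx][1] += 1 if c else 0
        stats[idx][2] += 1
    ece = 0.0
    for conf_sum, n_correct, size in stats:
        if size == 0:
            continue  # empty bins contribute nothing
        acc = n_correct / size
        conf = conf_sum / size
        ece += (size / n) * abs(acc - conf)
    return ece

# Overconfident toy model (assumed): claims 0.95, right 6 times out of 10.
overconfident = expected_calibration_error([0.95] * 10, [True] * 6 + [False] * 4)
# Matched toy model (assumed): claims 0.75, right 3 times out of 4.
matched = expected_calibration_error([0.75] * 4, [True] * 3 + [False])
```

The overconfident toy model has a single bin whose gap is the difference between 0.95 and 0.6, while the matched one has no gap at all.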

Temperature scaling: learn a single scalar T > 0 on a held-out validation set (typically by minimizing negative log-likelihood) and divide the logits z_i by it before the softmax:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$
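
One way to fit T is sketched below, assuming the validation logits and labels are already available as plain Python lists. It minimizes validation negative log-likelihood with a golden-section search over log T. The search method, the bounds `t_min` and `t_max`, and all function names are choices made for this sketch; a gradient-based optimizer is an equally reasonable alternative.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        scaled = [z / T for z in logits]
        m = max(scaled)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
        total -= scaled[y] - log_norm
    return total / len(labels)

def fit_temperature(logit_rows, labels, t_min=0.05, t_max=20.0, iters=80):
    """Golden-section search over log T (assumed bounds t_min..t_max)."""
    a, b = math.log(t_min), math.log(t_max)
    inv_phi = (math.sqrt(5) - 1) / 2
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = nll(logit_rows, labels, math.exp(c))
    fd = nll(logit_rows, labels, math.exp(d))
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = nll(logit_rows, labels, math.exp(c))
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = nll(logit_rows, labels, math.exp(d))
    return math.exp((a + b) / 2)

# Toy validation set (assumed): identical overconfident logits, 3 of 4 correct.
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
fitted_T = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0], fitted_T)
```

On this toy set the unscaled model assigns about 0.98 to class 0 although it is right only three times out of four, so the fitted temperature comes out above 1 and the rescaled probability for class 0 lands near 0.75. The search relies on the loss having a single minimum in T, which holds because the negative log-likelihood is convex in 1/T.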

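T > 1 softens predictions (reduces overconfidence), T < 1 sharpens them, and T = 1 leaves them unchanged. Dividing every logit by the same positive T preserves their order, so the predicted class, and therefore the accuracy, does not change.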
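
The effect of T can be seen on a single logit vector. This is a small sketch with made-up logits and temperatures; the helper name `softmax_with_temperature` is hypothetical.

```python
import math

def softmax_with_temperature(logits, T):
    """Softmax of logits / T, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.0, 0.2]           # assumed example logits
temperatures = (0.5, 1.0, 2.0, 5.0)

top_probs = {}
predicted = {}
for T in temperatures:
    probs = softmax_with_temperature(logits, T)
    top_probs[T] = max(probs)                  # confidence of the prediction
    predicted[T] = probs.index(max(probs))     # index of the predicted class
```

Across these temperatures the confidence in the top class falls as T grows, while the predicted class index stays the same.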

Canonical Papers

On Calibration of Modern Neural Networks

Guo et al., 2017, ICML

Verified Uncertainty Calibration

Kumar et al., 2019, NeurIPS
