Legacy Concept Lab

Reproducing Kernel Hilbert Spaces

Unifies SVMs, Gaussian processes, kernel regression, and NTK under one framework

Concept 71 of 100 · Theory · Phase 10: Mathematical foundations & information geometry
Key equation: f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}

Why It Matters for Modern Models

  • Unifies SVMs, Gaussian processes, kernel regression, and NTK under one framework
  • Explains why "similarity functions" must be positive definite—they define inner products
  • The neural tangent kernel (NTK) shows that, in the infinite-width limit, neural networks trained by gradient descent behave like kernel regression with a fixed kernel
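
To make the positive definiteness requirement concrete, here is a minimal sketch in plain Python. It compares the Gaussian (RBF) kernel with a hypothetical similarity, the negative distance -|x - x'|, chosen here purely for illustration. The function names and test points are assumptions, not part of the original material. The quadratic form c^T K c plays the role of a squared norm, so a negative value means no inner product can reproduce that similarity.

```python
import math
import random

def gram(kernel, points):
    """Gram matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(a, b) for b in points] for a in points]

def quad_form(K, c):
    """Quadratic form c^T K c; must be >= 0 for every c if K is positive semidefinite."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel on scalars, a standard positive definite kernel."""
    return math.exp(-gamma * (a - b) ** 2)

def neg_distance(a, b):
    """A plausible-looking similarity (illustrative assumption) that is not a valid kernel."""
    return -abs(a - b)

points = [0.0, 1.0]
c = [1.0, 1.0]

rbf_value = quad_form(gram(rbf, points), c)           # 2 + 2 * exp(-1), positive
bad_value = quad_form(gram(neg_distance, points), c)  # -2, a "negative squared norm"

# Random spot check: the RBF quadratic form should never be meaningfully negative.
rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(8)]
K = gram(rbf, xs)
min_rbf = min(quad_form(K, [rng.gauss(0.0, 1.0) for _ in xs]) for _ in range(200))
```

A random spot check like this can only expose a violation. It cannot prove that a kernel is positive definite.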

What Tutorials Skip

What is still poorly explained in textbooks and papers:

  • Kernels implicitly define an (often infinite-dimensional) feature space
  • Positive definiteness is exactly the condition under which you can build a Hilbert space where the kernel is an inner product (the Moore-Aronszajn theorem)
  • Attention can be viewed as a learned, data-dependent kernel
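
The attention point can be illustrated with a small sketch, assuming a single query and plain Python lists for vectors. Softmax attention gives the same output as a Nadaraya-Watson kernel smoother whose similarity is exp(q · k / sqrt(d)). The names `softmax_attention`, `kernel_smoother`, and `exp_kernel` are hypothetical. The learned query and key projections that make the kernel data-dependent are omitted here.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax_attention(query, keys, values):
    """Scaled dot-product attention for one query (numerically stable softmax)."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def exp_kernel(q, k):
    """Exponentiated scaled dot product, used as the similarity function."""
    return math.exp(dot(q, k) / math.sqrt(len(q)))

def kernel_smoother(query, keys, values, kernel):
    """Nadaraya-Watson estimate: kernel-weighted average of the values."""
    w = [kernel(query, k) for k in keys]
    total = sum(w)
    dim = len(values[0])
    return [sum(wi * v[j] for wi, v in zip(w, values)) / total for j in range(dim)]

query = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

attn_out = softmax_attention(query, keys, values)
smooth_out = kernel_smoother(query, keys, values, exp_kernel)
```

One caveat: in a real attention layer the queries and keys come from different learned projections, so the similarity is generally not symmetric. It is therefore a kernel in the smoothing sense and not necessarily a positive definite kernel with its own RKHS.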

Interactive Visualization

Core Math (Optional Deep Dive)

If you want intuition first, start with the key equation and the visualization. Come back here for the full walkthrough.

Key Equation
f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}

Reproducing property (evaluating f at a point x equals an inner product with the kernel section k(x, ·)):

f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}
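
A short derivation may make the reproducing property more concrete. It assumes f lies in the span of finitely many kernel sections, with illustrative coefficients \alpha_i and points x_i. General elements of the space arise as limits of such sums. It also uses the fact that the inner product on \mathcal{H} is defined on kernel sections by \langle k(x_i, \cdot), k(x, \cdot) \rangle_{\mathcal{H}} = k(x_i, x).

```latex
\begin{aligned}
f &= \sum_{i=1}^n \alpha_i k(x_i, \cdot) \\
\langle f, k(x, \cdot) \rangle_{\mathcal{H}}
  &= \sum_{i=1}^n \alpha_i \langle k(x_i, \cdot), k(x, \cdot) \rangle_{\mathcal{H}} \\
  &= \sum_{i=1}^n \alpha_i k(x_i, x) \\
  &= f(x)
\end{aligned}
```

Taking f = k(x', \cdot) gives k(x', x) = \langle k(x', \cdot), k(x, \cdot) \rangle_{\mathcal{H}}, so x \mapsto k(x, \cdot) is itself a feature map into \mathcal{H}.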

Kernel trick (an inner product in feature space, computed without ever forming the features explicitly):

k(x, x') = \langle \varphi(x), \varphi(x') \rangle
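
As a small worked instance of the kernel trick, the sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, whose feature map can be written out by hand. The kernel choice, the function names, and the sample vectors are assumptions made for illustration. The point is only that both routes give the same number.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x = [1.0, 2.0]
y = [3.0, -1.0]

# Route 1: evaluate the kernel directly in input space.
kernel_value = poly2_kernel(x, y)  # (1*3 + 2*(-1))^2 = 1

# Route 2: map to feature space first, then take the ordinary dot product.
feature_value = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))
```

For kernels such as the Gaussian, the feature space is infinite-dimensional, so only the first route is available in practice.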

Mercer decomposition (spectral):

k(x, x') = \sum_{m=1}^\infty \lambda_m e_m(x) e_m(x')
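
One way to read the Mercer expansion is as an explicit feature map. This assumes the usual Mercer setting, which is not spelled out here: eigenvalues \lambda_m \ge 0 and eigenfunctions e_m that are orthonormal in L^2. Scaling each eigenfunction by \sqrt{\lambda_m} gives features in \ell^2 whose inner product recovers the kernel.

```latex
\begin{aligned}
\varphi(x) &= \left( \sqrt{\lambda_1}\, e_1(x),\ \sqrt{\lambda_2}\, e_2(x),\ \dots \right) \\
\langle \varphi(x), \varphi(x') \rangle_{\ell^2}
  &= \sum_{m=1}^\infty \sqrt{\lambda_m}\, e_m(x)\, \sqrt{\lambda_m}\, e_m(x') \\
  &= \sum_{m=1}^\infty \lambda_m e_m(x) e_m(x') \\
  &= k(x, x')
\end{aligned}
```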

Representer theorem (the minimizer of a regularized empirical risk is a linear combination of kernel evaluations at the training points):

f^*(x) = \sum_{i=1}^n \alpha_i k(x_i, x)
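
Kernel ridge regression is one common instance of the representer theorem. For squared loss with penalty \lambda \|f\|_{\mathcal{H}}^2, the coefficients solve (K + \lambda I) \alpha = y. The sketch below is a minimal plain-Python illustration under assumed choices: an RBF kernel, toy data sampled from a sine curve, and a small hand-written linear solver. It is not a reference implementation.

```python
import math

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (a - b) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_kernel_ridge(xs, ys, lam):
    """Return alpha solving (K + lam * I) alpha = y."""
    A = [[rbf(a, b) + (lam if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    return solve(A, ys)

def predict(xs, alpha, x):
    """Representer form: f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * rbf(xi, x) for a, xi in zip(alpha, xs))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [math.sin(x) for x in xs]
lam = 1e-3

alpha = fit_kernel_ridge(xs, ys, lam)
train_preds = [predict(xs, alpha, x) for x in xs]
```

The fitted function is determined by one coefficient per training point, even though the RBF feature space is infinite-dimensional. On the training points the predictions satisfy f(x_i) = y_i - \lambda \alpha_i, which follows directly from the linear system.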

Canonical Papers

Kernel Methods for Pattern Analysis

Shawe-Taylor & Cristianini, 2004, Cambridge University Press

Neural Tangent Kernel

Jacot et al., 2018, NeurIPS
