Legacy Concept Lab

Reproducing Kernel Hilbert Spaces

Concept 71 of 100 (#71, RKHS) | Theory | Phase 10: Mathematical foundations & information geometry

Why It Matters for Modern Models

Unifies SVMs, Gaussian processes, kernel regression, and NTK under one framework
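Explains why "similarity functions" must be positive definite: a positive definite kernel is exactly an inner product in some feature space
The neural tangent kernel (NTK) shows that, in the infinite-width limit, training a neural network by gradient descent behaves like kernel regression with a fixed kernel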
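
To make the positive-definiteness requirement concrete, here is a minimal sketch (assuming NumPy; the helper names `rbf_kernel` and `neg_distance` are illustrative, not from any library). It compares the Gram matrix of a Gaussian kernel with that of a plausible-looking but invalid similarity, the negative Euclidean distance. The second matrix has a zero diagonal, so its eigenvalues sum to zero and at least one must be negative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    diffs = X[:, None, :] - Y[None, :, :]
    return np.exp(-gamma * np.sum(diffs**2, axis=-1))

def neg_distance(X, Y):
    """Hypothetical 'similarity' that is NOT positive definite: -||x - y||."""
    diffs = X[:, None, :] - Y[None, :, :]
    return -np.linalg.norm(diffs, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 random points in R^3 (arbitrary choice)

# A valid kernel yields a Gram matrix with no negative eigenvalues
# (up to floating-point error); the invalid similarity does not.
min_eig_rbf = np.linalg.eigvalsh(rbf_kernel(X, X)).min()
min_eig_neg = np.linalg.eigvalsh(neg_distance(X, X)).min()

print(f"smallest eigenvalue, RBF Gram matrix:       {min_eig_rbf:.3e}")
print(f"smallest eigenvalue, -distance Gram matrix: {min_eig_neg:.3e}")
```

A negative eigenvalue means some combination of points would have a negative squared norm, so no Hilbert space can realize that similarity as an inner product.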

What is still poorly explained in textbooks and papers:

Kernels implicitly define an (often infinite-dimensional) feature space
Positive definiteness = you can build a Hilbert space where the kernel is an inner product
Attention can be viewed as a learned, data-dependent kernel
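
One common way to read the attention claim is as a kernel smoother: each output is a kernel-weighted average of the values. The sketch below (assuming NumPy; `exp_kernel` and `kernel_smoother` are illustrative names) checks that standard softmax attention matches a normalized average with weights exp(q·k/√d). Note that queries and keys usually come from different learned projections, so this "kernel" is generally not symmetric in the original inputs; the comparison is an analogy under that caveat.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention (single head, no mask)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def exp_kernel(Q, K):
    """Unnormalized similarity kappa(q, k) = exp(q . k / sqrt(d))."""
    d = Q.shape[-1]
    return np.exp(Q @ K.T / np.sqrt(d))

def kernel_smoother(Q, K, V):
    """Kernel-weighted average: sum_j kappa(q, k_j) v_j / sum_j kappa(q, k_j)."""
    Kmat = exp_kernel(Q, K)
    return (Kmat @ V) / Kmat.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 queries of dimension 8 (arbitrary sizes)
K = rng.normal(size=(6, 8))  # 6 keys
V = rng.normal(size=(6, 5))  # 6 values of dimension 5

same = np.allclose(softmax_attention(Q, K, V), kernel_smoother(Q, K, V))
print("attention equals kernel smoother:", same)
```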

Key Equation

Reproducing property: for every f in \mathcal{H} and every x, evaluating f at x is an inner product with the kernel section k(x, \cdot):

f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}
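
For functions in the span of kernel sections, f = \sum_i \alpha_i k(x_i, \cdot), the RKHS inner product reduces to a quadratic form in the Gram matrix, which makes the reproducing property easy to check numerically. The following is a minimal sketch assuming NumPy and a Gaussian kernel; `rbf` and `rkhs_inner` are illustrative helpers. It also checks one consequence via Cauchy-Schwarz: |f(x)| \le \|f\|_{\mathcal{H}} \sqrt{k(x, x)}.

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """Gaussian kernel matrix between rows of X and rows of Z."""
    diffs = X[:, None, :] - Z[None, :, :]
    return np.exp(-gamma * np.sum(diffs**2, axis=-1))

def rkhs_inner(alpha, X, beta, Z):
    """<sum_i alpha_i k(x_i, .), sum_j beta_j k(z_j, .)>_H = alpha^T K(X, Z) beta."""
    return alpha @ rbf(X, Z) @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))   # centers x_i defining f (arbitrary)
alpha = rng.normal(size=5)    # coefficients alpha_i
x = rng.normal(size=(1, 2))   # evaluation point

f_at_x = (rbf(x, X) @ alpha)[0]                    # direct evaluation f(x)
inner = rkhs_inner(alpha, X, np.array([1.0]), x)   # <f, k(x, .)>_H

norm_f = np.sqrt(rkhs_inner(alpha, X, alpha, X))   # ||f||_H
bound = norm_f * np.sqrt(rbf(x, x)[0, 0])          # ||f||_H * sqrt(k(x, x))

print("f(x)            =", f_at_x)
print("<f, k(x, .)>_H  =", inner)
print("|f(x)| <= bound :", abs(f_at_x) <= bound)
```

The bound is why convergence in RKHS norm implies pointwise convergence: point evaluation is a bounded linear functional.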

Kernel trick: the kernel evaluates an inner product in feature space without computing the feature map \varphi explicitly:

k(x, x') = \langle \varphi(x), \varphi(x') \rangle
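
As a small worked instance of the kernel trick, take the homogeneous degree-2 polynomial kernel k(x, x') = (x \cdot x')^2 on R^2, whose explicit feature map is \varphi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2). The sketch below (assuming NumPy; `phi` and `poly2_kernel` are illustrative names) confirms that both routes give the same number.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def poly2_kernel(x, y):
    """Same inner product, computed in input space: (x . y)^2."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

explicit = float(phi(x) @ phi(y))   # inner product in feature space
implicit = poly2_kernel(x, y)       # kernel evaluation, no feature map

print("explicit:", explicit)
print("implicit:", implicit)
```

Here the feature space has only three dimensions, so the saving is cosmetic; for the Gaussian kernel the feature space is infinite-dimensional and only the kernel route is computable.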

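Mercer decomposition (spectral), with eigenvalues \lambda_m \ge 0 and orthonormal eigenfunctions e_m of the kernel's integral operator: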

k(x, x') = \sum_{m=1}^\infty \lambda_m e_m(x) e_m(x')
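
On a finite sample, the eigendecomposition of the Gram matrix plays the role of the Mercer expansion. This is an empirical analogue, not Mercer's theorem itself, and the sketch below assumes NumPy and a Gaussian kernel on points drawn uniformly from [-1, 1] (`rbf` and `truncated` are illustrative helpers). It reconstructs K from all eigenpairs and shows that a few leading terms already capture most of it.

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """Gaussian kernel matrix between rows of X and rows of Z."""
    diffs = X[:, None, :] - Z[None, :, :]
    return np.exp(-gamma * np.sum(diffs**2, axis=-1))

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
K = rbf(X, X)

# eigh returns eigenvalues in ascending order; flip to descending.
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Full reconstruction: K = sum_m lambda_m u_m u_m^T
K_full = (eigvecs * eigvals) @ eigvecs.T

def truncated(r):
    """Rank-r approximation using the r largest eigenpairs."""
    return (eigvecs[:, :r] * eigvals[:r]) @ eigvecs[:, :r].T

err_rank1 = np.linalg.norm(K - truncated(1))  # Frobenius norm
err_rank5 = np.linalg.norm(K - truncated(5))

print("leading eigenvalues:", eigvals[:5])
print("rank-1 error:", err_rank1)
print("rank-5 error:", err_rank5)
```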

Representer theorem: the minimizer of a regularized empirical risk over \mathcal{H} is a finite linear combination of kernel evaluations at the n training points x_i:

f^*(x) = \sum_{i=1}^n \alpha_i k(x_i, x)
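
Kernel ridge regression is perhaps the simplest case where the representer theorem gives a closed form. Assuming squared loss with penalty \lambda \|f\|_{\mathcal{H}}^2, the coefficients solve (K + \lambda I) \alpha = y. The sketch below uses NumPy with illustrative choices throughout: a Gaussian kernel, noise-free samples of sin(x), and \lambda = 10^{-3}.

```python
import numpy as np

def rbf(X, Z, gamma=0.5):
    """Gaussian kernel matrix between rows of X and rows of Z."""
    diffs = X[:, None, :] - Z[None, :, :]
    return np.exp(-gamma * np.sum(diffs**2, axis=-1))

# Training data: noise-free samples of sin(x) (illustrative choice).
X_train = np.linspace(-3.0, 3.0, 30).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

lam = 1e-3                                  # regularization strength (assumed)
K = rbf(X_train, X_train)

# Representer theorem: f*(x) = sum_i alpha_i k(x_i, x),
# with alpha solving (K + lam * I) alpha = y for squared loss.
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict(X_new):
    """Evaluate f* at new points using only kernel evaluations."""
    return rbf(X_new, X_train) @ alpha

X_test = np.linspace(-2.5, 2.5, 11).reshape(-1, 1)
y_pred = predict(X_test)
max_err = np.max(np.abs(y_pred - np.sin(X_test).ravel()))

print("max abs error on test grid:", max_err)
```

The search over an infinite-dimensional function space has collapsed to solving an n-by-n linear system, which is the practical content of the theorem.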

Shawe-Taylor & Cristianini (2004). Cambridge University Press.

Jacot et al. (2018). NeurIPS.