Representation Learning

Autoencoders and Denoising Autoencoders

Autoencoders learn codes by reconstructing inputs; denoising autoencoders reconstruct clean targets from corrupted inputs, so the useful lesson is compression plus constraint, not magic understanding.

status: published · importance: critical · difficulty: 3/5 · math: undergraduate · read: 18m · live demo

Intuition

Build the mental picture first so the rest of the page has something to attach to.

PCA gives a clean linear story: project a centered data cloud into a smaller subspace, then reconstruct the points as well as possible from that subspace.

An autoencoder asks for a learned version of that story.

Instead of choosing an orthonormal subspace by an eigendecomposition, it trains two maps:

an encoder that turns an input into a code;
a decoder that turns the code back into a reconstruction.

The training signal is deliberately plain: reconstruct the original input. That simplicity is both the charm and the danger.

If the model can copy every input perfectly, it may not learn a useful representation at all. It may just learn a fancy identity function. An autoencoder becomes interesting only when something prevents trivial copying: a narrow bottleneck, noise, sparsity, weight decay, architecture, or some other constraint that pressures it to keep the structure shared across examples and drop details it cannot afford to carry.

A denoising autoencoder adds one extra twist:

corrupt the input, but keep the clean example as the target.

The encoder sees a damaged version of the example. The decoder is trained to reconstruct the original. This does not make the model an oracle for missing information. It learns a plausible repair rule from the training distribution. If the corruption is too severe or the input is ambiguous, the model may reconstruct the wrong clean pattern or an average-looking compromise.

So the central idea is:

an autoencoder is useful when reconstruction is made hard enough that the code must capture reusable structure.

Math

Translate the story into symbols, assumptions, and a derivation you can inspect.

Let

x \in \mathbb R^d

be one input vector. A deterministic autoencoder has an encoder

f_\theta:\mathbb R^d \to \mathbb R^m

and a decoder

g_\phi:\mathbb R^m \to \mathbb R^d.

The latent code is

h = f_\theta(x) \in \mathbb R^m,

and the reconstruction is

\widehat x = g_\phi(h) = g_\phi(f_\theta(x)).

Given training examples $x_1,\ldots,x_n$, the standard reconstruction objective is

\min_{\theta,\phi} \frac{1}{n}\sum_{i=1}^n \ell\!\left(x_i, g_\phi(f_\theta(x_i))\right).

For real-valued vectors, a common loss is squared error:

\ell(x,\widehat x)=\|x-\widehat x\|_2^2.
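The objects above can be mirrored in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the sizes, the random stand-in data, the tanh encoder, and the linear decoder are all illustrative assumptions, and the names `W_enc`, `W_dec`, `f_theta`, and `g_phi` are chosen here only to match the notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5, 6, 2                 # assumed sizes: n examples, input dim d, code dim m
X = rng.random((n, d))            # rows play the role of x_1, ..., x_n (random stand-in data)

# Hypothetical parameters: theta = (W_enc, b_enc), phi = (W_dec, b_dec).
W_enc = 0.1 * rng.standard_normal((d, m)); b_enc = np.zeros(m)
W_dec = 0.1 * rng.standard_normal((m, d)); b_dec = np.zeros(d)

def f_theta(x):
    """Encoder R^d -> R^m (tanh is an illustrative choice)."""
    return np.tanh(x @ W_enc + b_enc)

def g_phi(h):
    """Decoder R^m -> R^d (linear output for real-valued inputs)."""
    return h @ W_dec + b_dec

def squared_error(x, x_hat):
    """l(x, x_hat) = ||x - x_hat||_2^2, computed per row."""
    return np.sum((x - x_hat) ** 2, axis=-1)

H = f_theta(X)                                # codes h_i, shape (n, m)
X_hat = g_phi(H)                              # reconstructions, shape (n, d)
objective = squared_error(X, X_hat).mean()    # (1/n) * sum_i l(x_i, x_hat_i)

print("code shape:", H.shape, "reconstruction shape:", X_hat.shape)
print("reconstruction objective:", objective)
```

Training would adjust the four parameter arrays to lower `objective`; nothing here is trained, so the printed value only shows how the objective is assembled.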

If the code dimension $m$ is smaller than the input dimension $d$, the autoencoder is undercomplete. It cannot store every coordinate independently, so it must decide what information the code preserves.

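This is the bridge back to PCA. In the special case where the encoder and decoder are both linear, the data are centered as in PCA, the loss is squared reconstruction error, and the code is undercomplete, the optimal reconstructions lie in the same principal subspace that PCA finds. The learned weights need not be the orthonormal principal directions themselves, because any invertible mixing of the code coordinates gives the same reconstructions. This equivalence does not carry over to nonlinear autoencoders, which have no closed-form solution, and it does not guarantee better downstream features.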
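The linear special case can be checked numerically. The sketch below assumes synthetic centered data and bias-free linear maps; it does not train anything. It builds a linear autoencoder from the PCA directions, confirms that its error equals the Eckart-Young floor (the discarded squared singular values divided by $n$), and shows that remixing the code coordinates leaves the reconstruction unchanged. `recon_error`, `V_m`, and `A` are names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 2
# Synthetic correlated data (an assumption for illustration), then centered.
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))
Xc = X - X.mean(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V_m = Vt[:m].T                                   # top-m directions, shape (d, m)

def recon_error(W_enc, W_dec):
    """Mean squared reconstruction error of a bias-free linear autoencoder."""
    X_hat = (Xc @ W_enc) @ W_dec
    return np.mean(np.sum((Xc - X_hat) ** 2, axis=1))

pca_error = recon_error(V_m, V_m.T)              # encoder V_m, decoder V_m^T
floor = np.sum(s[m:] ** 2) / n                   # best possible rank-m error

# Same subspace, different weights: mix the code with an invertible matrix A.
A = np.array([[2.0, 1.0], [0.0, 0.5]])
mixed_error = recon_error(V_m @ A, np.linalg.inv(A) @ V_m.T)

# An arbitrary linear autoencoder of the same width cannot beat the floor.
random_error = recon_error(rng.standard_normal((d, m)), rng.standard_normal((m, d)))

print("PCA-built autoencoder error:", pca_error)
print("rank-m floor:               ", floor)
print("mixed-code error:           ", mixed_error)
print("random linear autoencoder:  ", random_error)
```

The mixed-code case is why only the subspace is pinned down: the code coordinates themselves are not unique.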

If $m \ge d$, or if the encoder and decoder are powerful enough, the model may copy inputs without learning useful structure. Overcomplete autoencoders therefore need some other pressure: noise, sparsity, weight decay, dropout, contractive penalties, architectural limits, or a task-specific objective.
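A tiny example makes the copying failure concrete. The sketch below assumes bias-free linear maps and random stand-in data; with $m = d$ the identity matrix reaches zero reconstruction error, and with $m > d$ a zero-padded identity does the same. In both cases the code is just the input again.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
X = rng.random((4, d))                      # random stand-in inputs

def recon_error(X, W_enc, W_dec):
    """Mean squared reconstruction error of a bias-free linear autoencoder."""
    X_hat = (X @ W_enc) @ W_dec
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

# Case m = d: identity encoder and decoder copy every input.
copy_error = recon_error(X, np.eye(d), np.eye(d))

# Case m > d: pad the identity with zero columns; the extra code units stay unused.
m = 8
W_enc_wide = np.hstack([np.eye(d), np.zeros((d, m - d))])   # shape (d, m)
W_dec_wide = W_enc_wide.T                                   # shape (m, d)
wide_error = recon_error(X, W_enc_wide, W_dec_wide)

print("m = d copy error:", copy_error)
print("m > d copy error:", wide_error)
print("code equals input:", np.allclose(X @ np.eye(d), X))
```

Zero error here says nothing about structure in the data, which is why the extra pressures listed above matter.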

A denoising autoencoder changes the input side of the objective. Let

\widetilde x \sim q_D(\widetilde x \mid x)

be a corrupted version of $x$. The model receives $\widetilde x$, but the loss still compares the output with the clean $x$:

\min_{\theta,\phi} \mathbb E_{x\sim p_{\mathrm{data}}} \mathbb E_{\widetilde x \sim q_D(\widetilde x \mid x)} \ell\!\left(x, g_\phi(f_\theta(\widetilde x))\right).

The target has not changed. The model is not asked to reproduce the noise. It is asked to use the visible evidence and the learned regularities of the data distribution to reconstruct the clean example.
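The double expectation can be estimated by sampling corruptions. The sketch below is one hedged reading of the objective: `corrupt` implements masking noise as an assumed choice of $q_D$, `denoising_loss` is a hypothetical helper, and the encoder and decoder are replaced by identity maps to show that plain copying no longer scores zero once the target is the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, drop_prob=0.3):
    """Masking noise, one illustrative q_D: zero each coordinate independently."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def denoising_loss(X, f_theta, g_phi, n_draws=100):
    """Monte Carlo estimate of E_x E_{x_tilde | x} ||x - g(f(x_tilde))||^2."""
    total = 0.0
    for _ in range(n_draws):
        X_tilde = corrupt(X)                       # the model sees the corrupted input
        X_hat = g_phi(f_theta(X_tilde))
        total += np.mean(np.sum((X - X_hat) ** 2, axis=1))   # target is the clean X
    return total / n_draws

def identity(z):
    """Stand-in for a model that simply copies what it receives."""
    return z

X = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)

copy_loss = denoising_loss(X, identity, identity)
# Each row has three ones, each dropped with probability 0.3,
# so the expected copy loss is 3 * 0.3 = 0.9.
print("denoising loss of a pure copier:", copy_loss)
```

A trained denoising autoencoder has to do better than this copier by filling dropped coordinates from regularities in the data.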

Three separations keep the concept honest.

First, reconstruction quality is not likelihood. A deterministic autoencoder trained with squared error does not define a prior over latent codes or a calibrated probability model over inputs.

Second, a VAE is not just an autoencoder with noise. A VAE trains a latent-variable generative model with an ELBO, a prior over latents, an approximate posterior $q_\phi(z \mid x)$, and a likelihood term. Reconstruction is only one part of a probabilistic objective.
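For orientation, the ELBO mentioned in the second point has the following standard form. The symbols follow the usual VAE convention and are an assumption relative to this page's notation: here $\theta$ parameterizes the likelihood $p_\theta(x \mid z)$ and $\phi$ parameterizes the approximate posterior, which differs from the encoder and decoder roles that $\theta$ and $\phi$ play in the deterministic autoencoder.

```latex
\log p_\theta(x) \;\ge\; \mathbb E_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

The first term plays the role of reconstruction; the KL term ties the approximate posterior to the prior $p(z)$ and has no counterpart in the squared-error objective.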

Third, denoising autoencoders are historically and conceptually related to denoising and score ideas, but they are not diffusion models. Modern diffusion needs a separate route through noise schedules, score or noise prediction, and iterative generative sampling.

Code

Keep the implementation aligned with the notation so the algorithm is legible.

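import numpy as np

rng = np.random.default_rng(0)
# Four binary patterns in two groups: active on the first three
# coordinates or on the last three.
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)
n, d = X.shape
m = 2  # undercomplete code: m < d
# Untied encoder (W1, b1) and decoder (W2, b2) parameters.
W1 = 0.4 * rng.standard_normal((d, m)); b1 = np.zeros(m)
W2 = 0.4 * rng.standard_normal((m, d)); b2 = np.zeros(d)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

for _ in range(2500):
    # Corrupt the input: drop each entry with probability 0.30 and
    # switch on each zero entry with probability 0.03.
    keep = rng.random(X.shape) > 0.30
    flip_on = (rng.random(X.shape) < 0.03) & (X == 0)
    X_noisy = np.clip(X * keep + flip_on, 0, 1)
    # Forward pass on the corrupted input.
    H = np.tanh(X_noisy @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Backward pass for the mean squared error against the clean X.
    # dY is the gradient at the decoder pre-activation (sigmoid derivative included).
    dY = 2 * (Y - X) * Y * (1 - Y) / n
    dH = (dY @ W2.T) * (1 - H ** 2)
    # Full-batch gradient step with learning rate 1.0.
    W2 -= 1.0 * (H.T @ dY); b2 -= 1.0 * dY.sum(0)
    W1 -= 1.0 * (X_noisy.T @ dH); b1 -= 1.0 * dH.sum(0)

# Denoise a hand-made corruption of the first pattern.
clean = X[0]
corrupted = np.array([1, 0, 1, 0, 0, 0], dtype=float)
h = np.tanh(corrupted @ W1 + b1)
reconstruction = sigmoid(h @ W2 + b2)

print("clean:       ", clean.astype(int))
print("corrupted:   ", corrupted.astype(int))
print("latent code: ", np.round(h, 3))
print("reconstruct: ", np.round(reconstruction, 2))
print("mse corrupted -> clean:", round(np.mean((corrupted - clean) ** 2), 3))
print("mse recon -> clean:    ", round(np.mean((reconstruction - clean) ** 2), 3))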

The key line is not the architecture; it is the target. The forward pass receives X_noisy, but the gradient is computed against X. The model is trained to reconstruct clean examples from corrupted evidence.

This snippet uses untied encoder and decoder weights. Tied weights are a legitimate autoencoder variant, but they make the shared-weight gradient easier to get wrong in a small teaching snippet. Here the code keeps the denoising objective visible.
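For readers curious about the tied-weight variant, the sketch below shows where the shared-weight gradient picks up its second term. It is an illustration under stated assumptions, not part of the page's code: the decoder matrix is fixed to `W.T`, the input is left uncorrupted to keep the check short, and `loss`, `tied_grad`, and `numeric_grad` are hypothetical helpers. The analytic gradient is compared with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)
n, d = X.shape
m = 2
W = 0.4 * rng.standard_normal((d, m))      # shared matrix: encoder W, decoder W.T
b1 = np.zeros(m); b2 = np.zeros(d)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def loss(W):
    H = np.tanh(X @ W + b1)
    Y = sigmoid(H @ W.T + b2)
    return np.sum((Y - X) ** 2) / n

def tied_grad(W):
    H = np.tanh(X @ W + b1)
    Y = sigmoid(H @ W.T + b2)
    dA2 = 2 * (Y - X) * Y * (1 - Y) / n    # gradient at the decoder pre-activation
    dA1 = (dA2 @ W) * (1 - H ** 2)         # gradient at the encoder pre-activation
    encoder_path = X.T @ dA1               # W used as the encoder, shape (d, m)
    decoder_path = dA2.T @ H               # W.T used as the decoder, transposed back
    return encoder_path + decoder_path     # the shared matrix receives both terms

def numeric_grad(W, eps=1e-6):
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W); E[i, j] = eps
            G[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)
    return G

max_gap = np.max(np.abs(tied_grad(W) - numeric_grad(W)))
print("max |analytic - numeric| gradient gap:", max_gap)
```

Dropping either `encoder_path` or `decoder_path` is the typical mistake; the finite-difference check would expose it as a large gap.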

Interactive Demo

Use direct manipulation to connect the explanation to a moving system.

The demo below is a tiny prediction-first denoising lab.

It is a toy stroke-code surrogate for the bottleneck and denoising idea, not a live neural-network trainer.

You see a corrupted input from a toy pattern family and choose a bottleneck width. Before reveal, the clean target, latent code, reconstruction error, and success label are hidden. Your prediction is whether the autoencoder will recover the most plausible clean pattern under the learned toy strokes.

After reveal, compare the clean target with the reconstruction. The lesson is not that the model discovers truth. The lesson is that bottleneck width and corruption level decide which shared structure the code can carry.

Source Grounding

Canonical references for the mechanism on this page.

book · 2016 · Deep Learning, Chapter 14: Autoencoders · Goodfellow, Bengio, and Courville

Main source for the encoder/decoder reconstruction frame, undercomplete and regularized autoencoders, denoising autoencoders, training by backpropagation, and the identity-copying caveat.

paper · 2008 · Extracting and Composing Robust Features with Denoising Autoencoders · Vincent, Larochelle, Bengio, and Manzagol

Source for the deterministic autoencoder notation, average reconstruction loss, optional tied weights, stochastic input corruption, and clean-target denoising objective.

paper · 2006 · Reducing the Dimensionality of Data with Neural Networks · Hinton and Salakhutdinov

Historical source for deep autoencoders as nonlinear dimensionality-reduction systems with a small central code, contrasted with PCA on specific datasets.

book · 2023 · Dive into Deep Learning: AutoRec · Zhang, Lipton, Li, and Smola

Practical source for an encoder/decoder reconstruction implementation pattern; used only as an applied example, not as denoising-autoencoder theory.

Claim Review

Autoencoders learn codes by reconstructing inputs; denoising autoencoders reconstruct clean targets from corrupted inputs, so the useful lesson is compression plus constraint, not magic understanding.

Status: 1 substantive review recorded

Claims without a substantive review badge still need exact source-support review.

Sources: 4 references

goodfellow-2016-autoencoders, vincent-2008-denoising-autoencoders, hinton-2006-dimensionality-autoencoders, d2l-autorec

Witnesses: 4 local objects

Use equation, code, and demo objects to check whether the source support is operational.

Substantively reviewed: A deterministic autoencoder maps an input through an encoder to a code and through a decoder to a reconstruction, training by reconstruction loss; a denoising autoencoder feeds a corrupted sample into the encoder while keeping the clean example as the reconstruction target.

Claim metadata: source checked

Goodfellow/Bengio/Courville support the core autoencoder frame, constraints, and denoising objective; Vincent et al. support corruption-to-clean reconstruction; Hinton/Salakhutdinov support the PCA contrast as historical dimensionality reduction; D2L supports a concrete encoder/decoder reconstruction implementation.

Sources: Deep Learning, Chapter 14: Autoencoders; Extracting and Composing Robust Features with Denoising Autoencoders; Reducing the Dimensionality of Data with Neural Networks; Dive into Deep Learning: AutoRec

Reconstruction quality is not likelihood, semantic understanding, causality, or downstream usefulness. Denoising autoencoders are historically related to denoising and score ideas, but this page does not teach diffusion models.

A bounded review summary is present; still check caveats and exact source scope.

Checked Goodfellow chapter 14, Vincent et al. 2008, Hinton and Salakhutdinov 2006, and D2L AutoRec for reconstruction, bottleneck, denoising, PCA-bridge, and implementation claims. GPT Pro pre-draft review required strict graph edges, no direct diffusion prerequisite, untied-weight code, and explicit PCA/VAE/diffusion separations.

Reviewer: codex-source-scope+gpt-pro-brief; reviewed 2026-06-28

Domain

Representation Learning

representation-learning, autoencoders, denoising, dimensionality-reduction, unsupervised-learning

Prerequisites

PCA as Optimization and Eigenspace Projection; Gradient Descent; Backpropagation

Leads To

Variational Autoencoders; Sparse Autoencoders: Feature Dictionaries for Mechanistic Interpretability
