Calculus

Reverse-Mode Automatic Differentiation

Reverse-mode autodiff computes gradients by sending cotangents backward through a computation graph.

status: published · importance: critical · difficulty: 3/5 · math: undergraduate · read: 14 min · live demo


Editorial autodiff illustration of a forward tape and reverse cotangent sweep through computation nodes.

Concept Structure

Reverse-Mode Automatic Differentiation

01 Intuition

Start with the picture, metaphor, or geometric mechanism.

02 Math

Make the objects explicit and connect them with notation.

03 Code

Mirror the equations with runnable implementation details.

04 Interactive Demo

Manipulate the mechanism and watch the idea respond.



Intuition

Build the mental picture first so the rest of the page has something to attach to.


Suppose one scalar loss depends on millions of parameters. Do you need to run one derivative computation for each parameter to know how to change them all?

Reverse-mode automatic differentiation is the bookkeeping trick that makes the answer no.

The previous idea, computation graphs, makes dependencies visible. Reverse-mode AD adds an execution rule: during the forward pass, record the primitive operations and the intermediate values they will need later. During the reverse pass, start from the final question, "how much does the loss change if this output changes?", and walk backward through the recorded operations. Each local backward rule converts an output sensitivity into input sensitivities.

The key advantage is shape. If one scalar loss depends on many parameters, reverse mode can compute all parameter gradients in one backward sweep through the graph. Forward mode would instead ask, one input direction at a time, how the output changes, so assembling the full gradient of $p$ parameters takes $p$ forward passes.
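The cost-shape contrast can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical loss $L(\theta)=\sum_i \theta_i^2$ with hand-written derivative rules; the function names `jvp` and `vjp` are chosen here for illustration and are not taken from any library.

```python
# Hypothetical loss for illustration: L(theta) = sum(theta_i ** 2).
theta = [0.5, -1.0, 2.0, 3.0]

def jvp(theta, direction):
    """Forward mode: push ONE input direction through the computation."""
    # Directional derivative of sum(theta_i ** 2) along `direction`.
    return sum(2.0 * t * d for t, d in zip(theta, direction))

def vjp(theta, bar_L=1.0):
    """Reverse mode: pull ONE output cotangent back to every input."""
    return [bar_L * 2.0 * t for t in theta]

# Forward mode needs one pass per basis direction to assemble the gradient.
forward_passes = 0
grad_forward = []
for i in range(len(theta)):
    e_i = [1.0 if j == i else 0.0 for j in range(len(theta))]
    grad_forward.append(jvp(theta, e_i))
    forward_passes += 1

# Reverse mode gets all partial derivatives from a single backward sweep.
grad_reverse = vjp(theta)

print("forward-mode passes:", forward_passes)
print("gradient (forward):", grad_forward)
print("gradient (reverse):", grad_reverse)
```

Both routes produce the same gradient; the difference is that the forward-mode loop grows with the number of parameters, while the reverse sweep runs once for the single scalar output.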

The useful mental model is a tape plus a register file. The tape remembers what primitive operations ran. The registers store cotangents such as $\bar a$ and $\bar x$. The model breaks if you imagine symbolic simplification: reverse mode does not expand the formula by hand; it accumulates local contributions on the graph that actually ran.
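One way to make the tape-plus-registers picture concrete is the sketch below. The tuple layout of the tape, the `record` helper, and the small example graph ($a = xy$, $b=\sin(a)$, $L=a+b$) are illustrative assumptions, not the format used by any particular framework.

```python
import math

# Illustrative tape format (an assumption, not a framework API):
# each entry is (op_name, output_name, input_names).
values = {"x": 2.0, "y": 3.0}
tape = []

def record(op, out, inputs, result):
    values[out] = result            # saved forward value
    tape.append((op, out, inputs))  # what ran, in execution order

# Forward pass: run the primitives and record them.
record("mul", "a", ("x", "y"), values["x"] * values["y"])
record("sin", "b", ("a",), math.sin(values["a"]))
record("add", "L", ("a", "b"), values["a"] + values["b"])

# Register file: one cotangent slot per node, all zero except the seed.
bar = {name: 0.0 for name in values}
bar["L"] = 1.0

# Reverse pass: read the tape backward and apply each local backward rule.
for op, out, inputs in reversed(tape):
    if op == "add":
        u, w = inputs
        bar[u] += bar[out]
        bar[w] += bar[out]
    elif op == "sin":
        (u,) = inputs
        bar[u] += bar[out] * math.cos(values[u])
    elif op == "mul":
        u, w = inputs
        bar[u] += bar[out] * values[w]
        bar[w] += bar[out] * values[u]

print([entry[0] for entry in tape])
print({name: round(g, 4) for name, g in bar.items()})
```

Nothing in the loop inspects the formula as a whole: each step only knows one primitive, its saved inputs, and the cotangent already sitting in its output register.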

Math

Translate the story into symbols, assumptions, and a derivation you can inspect.


Let a differentiable computation graph produce a scalar output $L$ from intermediate variables $v_1,\dots,v_n$. Reverse mode stores, for each node, an adjoint or cotangent

$$\bar v_i = \frac{\partial L}{\partial v_i}.$$

Before the reverse sweep, initialize every non-output cotangent register to zero. Then seed the scalar output with

$$\bar L = \frac{\partial L}{\partial L} = 1.$$

For scalar nodes and a local operation $v_j = f(v_i)$, the chain rule sends sensitivity backward. When the operation producing $v_j$ is processed, $\bar v_j$ already contains all downstream contributions:

$$\bar v_i \mathrel{+}= \bar v_j \frac{\partial v_j}{\partial v_i}.$$

For an operation with multiple inputs, such as $v_j=f(u,w)$, each input receives its own local derivative:

$$\bar u \mathrel{+}= \bar v_j \frac{\partial v_j}{\partial u},\qquad \bar w \mathrel{+}= \bar v_j \frac{\partial v_j}{\partial w}.$$

The plus-equals matters. If a value is reused by several later nodes, all downstream paths contribute to its total sensitivity. Reverse mode is therefore not symbolic simplification; it is graph-local accumulation of vector-Jacobian products.
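A small sketch of why accumulation matters, assuming the reused-value graph $L = a + \sin(a)$ and an arbitrary illustrative input $a = 1.2$. The `accumulate` flag is a hypothetical switch that contrasts `+=` with plain overwriting.

```python
import math

a = 1.2  # a value reused by two later nodes: L = a + sin(a)

def reverse_sweep(accumulate):
    """Return bar_a, either accumulating contributions or overwriting them."""
    bar_a, bar_b = 0.0, 0.0
    bar_L = 1.0
    # L = a + b: both inputs receive bar_L.
    bar_a = (bar_a + bar_L) if accumulate else bar_L
    bar_b = (bar_b + bar_L) if accumulate else bar_L
    # b = sin(a): a second path back into a.
    contribution = bar_b * math.cos(a)
    bar_a = (bar_a + contribution) if accumulate else contribution
    return bar_a

correct = reverse_sweep(accumulate=True)       # 1 + cos(a)
overwritten = reverse_sweep(accumulate=False)  # cos(a) only: direct path lost
analytic = 1.0 + math.cos(a)

print("accumulated:", round(correct, 6))
print("overwritten:", round(overwritten, 6))
print("analytic:   ", round(analytic, 6))
```

With overwriting, the direct contribution through $L = a + b$ is discarded when the $\sin$ rule writes into the same register, so the result is off by exactly that missing path.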

For vector nodes, choose column-vector cotangents. If $v_i\in\mathbb{R}^n$, $v_j=f(v_i)\in\mathbb{R}^m$, $\bar v_i\in\mathbb R^n$, $\bar v_j\in\mathbb R^m$, and

$$J_{ji}=\frac{\partial v_j}{\partial v_i}\in\mathbb{R}^{m\times n},$$

then the reverse update is

$$\bar v_i \mathrel{+}= J_{ji}^{\mathsf T}\bar v_j.$$

This is the direction contrast:

$$\text{JVP: }\dot v_j=J_{ji}\dot v_i,\qquad \text{VJP: }\bar v_i=J_{ji}^{\mathsf T}\bar v_j.$$
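The shapes in the JVP/VJP contrast can be checked numerically. The sketch below assumes a hypothetical map $f:\mathbb R^2\to\mathbb R^3$, $f(v) = (v_0 v_1, \sin v_0, v_0 + v_1)$, with its Jacobian written out by hand; the input point and the test vectors are arbitrary illustrative values.

```python
import math

# Hypothetical map f: R^2 -> R^3, f(v) = (v0 * v1, sin(v0), v0 + v1).
v = [0.7, -1.3]

# Jacobian J in R^{3x2}: rows index outputs, columns index inputs.
J = [
    [v[1], v[0]],
    [math.cos(v[0]), 0.0],
    [1.0, 1.0],
]

def jvp(J, v_dot):
    """Forward: tangent in R^n -> tangent in R^m (J @ v_dot)."""
    return [sum(row[k] * v_dot[k] for k in range(len(v_dot))) for row in J]

def vjp(J, v_bar):
    """Reverse: cotangent in R^m -> cotangent in R^n (J^T @ v_bar)."""
    n = len(J[0])
    return [sum(J[r][k] * v_bar[r] for r in range(len(J))) for k in range(n)]

v_dot = [1.0, 0.5]        # input direction
v_bar = [0.2, -1.0, 3.0]  # output sensitivity

out_tangent = jvp(J, v_dot)
in_cotangent = vjp(J, v_bar)

# Adjoint identity: <v_bar, J v_dot> == <J^T v_bar, v_dot>.
lhs = sum(b * t for b, t in zip(v_bar, out_tangent))
rhs = sum(c * d for c, d in zip(in_cotangent, v_dot))

print("JVP output length:", len(out_tangent))
print("VJP output length:", len(in_cotangent))
print("adjoint identity holds:", abs(lhs - rhs) < 1e-12)
```

The JVP result lives in the output space and the VJP result lives in the input space, which is why a single VJP seeded at a scalar output yields one number per input.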

For a scalar loss $L:\mathbb R^p\to\mathbb R$, one forward evaluation records the needed values, and one reverse sweep gives the full gradient

$$\nabla_\theta L = \left(\frac{\partial L}{\partial \theta_1},\dots,\frac{\partial L}{\partial \theta_p}\right)^{\mathsf T}$$

assuming primitive backward rules are available. The cost is memory for saved forward values on a tape, or extra compute if some values are recomputed. An autodiff engine automates this bookkeeping by recording primitive operations during the forward pass and executing their local backward rules in reverse topological order.
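A toy engine along these lines might look like the following. This is a minimal sketch, not the design of any real framework: the `Var` class, the `sin` and `backward` helpers, and the three-parameter loss are all hypothetical names and values chosen for illustration, and only addition, multiplication, and sine are supported.

```python
import math

class Var:
    """Scalar node: forward value, cotangent register, and local derivatives."""
    def __init__(self, value, parents=()):
        self.value = value
        self.bar = 0.0
        self.parents = parents  # tuple of (input Var, local partial derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def sin(v):
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))

def backward(output):
    # Depth-first post-order lists inputs before consumers (a topological
    # order); reversing it processes every consumer before its inputs.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)
    output.bar = 1.0  # seed the scalar output
    for node in reversed(order):
        for parent, local in node.parents:
            parent.bar += node.bar * local

# Hypothetical three-parameter loss: L = t0 * t1 + sin(t1 * t2).
t = [Var(0.5), Var(-1.0), Var(2.0)]
L = t[0] * t[1] + sin(t[1] * t[2])
backward(L)
grad = [p.bar for p in t]
print([round(g, 4) for g in grad])
```

One call to `backward` fills the cotangent register of every parameter. The local derivatives stored on each node play the role of the saved forward values, which is where the memory cost comes from.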

These equations assume the recorded primitives are differentiable at the saved forward values. For nonsmooth primitives, an implementation must choose a convention, use a subgradient, or report that the derivative is undefined. If the program has control flow, reverse mode differentiates the branch that actually ran.
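Both caveats can be illustrated with hand-written rules. In this sketch the piecewise function in `f_and_grad` and the `at_zero` parameter of `relu_grad` are hypothetical examples; real frameworks make their own choices for the derivative at a kink.

```python
def f_and_grad(x):
    """Differentiate only the branch that executes for this x."""
    if x > 0.0:
        y = x * x        # branch A is what the tape would record
        dy_dx = 2.0 * x
    else:
        y = -3.0 * x     # branch B is what the tape would record
        dy_dx = -3.0
    return y, dy_dx

def relu_grad(x, at_zero=0.0):
    """ReLU backward rule; the value returned at x == 0 is a chosen convention."""
    if x > 0.0:
        return 1.0
    if x < 0.0:
        return 0.0
    return at_zero  # any value in [0, 1] is a valid subgradient here

print(f_and_grad(2.0))    # branch A ran
print(f_and_grad(-1.0))   # branch B ran
print(relu_grad(0.0), relu_grad(0.0, at_zero=0.5))
```

The derivative reported for `f_and_grad` says nothing about the branch that did not run, and the two `relu_grad` calls at zero show that the answer at a nonsmooth point depends on the convention chosen.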

Code

Keep the implementation aligned with the notation so the algorithm is legible.


import math

x, y = 2.0, 3.0

# Forward graph:
# a = x * y
# b = sin(a)
# L = a + b
a = x * y
b = math.sin(a)
L = a + b

# Reverse-mode table: each bar_* stores dL/d(node).
# Start with empty cotangent registers, then seed the output.
bar_L = 0.0
bar_a = 0.0
bar_b = 0.0
bar_x = 0.0
bar_y = 0.0

bar_L = 1.0

# Read the tape backward.
# L = a + b sends one unit of sensitivity to both inputs.
bar_a += bar_L * 1.0
bar_b += bar_L * 1.0

# b = sin(a) contributes another path back into a.
bar_a += bar_b * math.cos(a)

# a = x * y sends sensitivity to both inputs.
bar_x += bar_a * y
bar_y += bar_a * x

print("L:", round(L, 4))
print("dL/dx:", round(bar_x, 4))
print("dL/dy:", round(bar_y, 4))

The code initializes every cotangent register and then uses += for local contributions. The reused node $a$ receives two contributions: directly through $L=a+b$, and indirectly through $b=\sin(a)$. With $x=2$ and $y=3$, the output is approximately $L=5.7206$, $\partial L/\partial x=5.8805$, and $\partial L/\partial y=3.9203$.
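The reported derivatives can be cross-checked against an independent estimate. The sketch below uses central finite differences with a step size `h = 1e-6`, which is an assumption chosen for illustration; the `loss` helper simply re-evaluates the same forward graph.

```python
import math

def loss(x, y):
    a = x * y
    return a + math.sin(a)

x, y, h = 2.0, 3.0, 1e-6

# Central differences: a numerical estimate of each partial derivative.
fd_x = (loss(x + h, y) - loss(x - h, y)) / (2.0 * h)
fd_y = (loss(x, y + h) - loss(x, y - h)) / (2.0 * h)

# Closed form for this graph: dL/da = 1 + cos(a),
# then dL/dx = y * dL/da and dL/dy = x * dL/da.
bar_a = 1.0 + math.cos(x * y)

print("dL/dx:", round(fd_x, 4), "vs", round(y * bar_a, 4))
print("dL/dy:", round(fd_y, 4), "vs", round(x * bar_a, 4))
```

Finite differences need two loss evaluations per input and carry truncation and rounding error, so they are useful as a test of a reverse sweep rather than as a replacement for it.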

Interactive Demo

Use direct manipulation to connect the explanation to a moving system.


Use the sliders to change $x$ and $y$, then compare the three phases.

Forward tape mode records the primitive operations and the saved values that local backward rules will need. Reverse sweep mode reads the same tape backward, starts from $\bar L=1$, and fills the cotangent registers. Cost shape mode highlights the main reason reverse mode matters for deep learning: when many inputs feed one scalar loss, one reverse sweep gives the whole gradient vector.

Try the second preset after making a prediction. It changes the product regime so the same tape can expose a different cotangent-accumulation pattern.


Source Grounding

Canonical references for the mechanism on this page.

Baydin et al. (2018). Automatic differentiation in machine learning: a survey. Paper.

Grounds reverse mode as the efficient way to compute gradients of scalar losses with many parameters.


Claim Review

Status: 1 substantive review recorded

Claims without a substantive review badge still need exact source-support review.

Sources: 1 reference (baydin-2018-ad-survey)

Witnesses: 4 local objects

Use equation, code, and demo objects to check whether the source support is operational.

Substantively reviewed: Reverse-mode AD records a forward evaluation trace of primitive operations and saved values, seeds a scalar output cotangent with 1, then sweeps backward applying local VJPs and += accumulation so one reverse pass computes the full gradient of one scalar loss with respect to many inputs.

Claim metadata: source checked

Baydin et al. describe reverse mode as running code forward to populate intermediate variables and record graph dependencies, then propagating adjoints backward. Their example shows incremental adjoint accumulation and output adjoint 1; they state that for $f:\mathbb R^n\to\mathbb R$ one reverse-mode application computes the full gradient, matching scalar-valued ML objectives with many parameters.

Sources: Automatic differentiation in machine learning: a survey

Checks reverse-mode bookkeeping for one executed differentiable computation: forward tape/saved values, then reverse cotangent sweep with primitive backward rules. Not covered: checkpointing, recomputation, nonsmooth primitives, control flow, framework edge cases, or higher derivatives.

A bounded review summary is present; still check caveats and exact source scope.

Checked Baydin et al., Section 3.2: reverse mode runs code forward to populate intermediates and record dependencies, then propagates adjoints backward. The example starts from output adjoint 1, accumulates reused-variable cotangents from downstream paths, and gets both input derivatives in one reverse pass. Baydin et al. say that for $f:\mathbb R^n\to\mathbb R$ one reverse application computes the full gradient. Local witnesses match tape values, $\bar L=1$, += pullbacks, VJP/$J^{\mathsf T}$ notation, and scalar-loss shape.

Reviewer: codex+oracle; reviewed 2026-05-07

source-span-baydin-2018-ad-survey math-object-1 math-object-2 code-witness-1 interactive-demo



Domain

Calculus

calculus · autodiff · gradients

Prerequisites

Computation Graphs

Leads To

Backpropagation

Chain Rule · Gradient Descent

Within this domain