Bring the mental model from Linear Regression & Least Squares; this page will reuse it instead of restarting from zero.
Logistic Regression
Logistic regression turns a linear score into a probability, then learns a classification boundary by Bernoulli likelihood.
01
Intuition
Build the mental picture first so the rest of the page has something to attach to.
You are here because many AI systems make categorical decisions from scores: a classifier scores labels, a language model scores tokens, and DPO scores preference pairs with a logistic loss. Logistic regression is the smallest honest version of that pattern.
Before this, know linear regression, maximum likelihood, and cross-entropy. By the end, you should be able to explain what a logit is, why the sigmoid creates a probability, and why the loss gradient with respect to the logit for one example is $q - y$, where $q$ is the predicted probability and $y$ is the label.
Linear regression predicts a number on the whole real line. Classification needs a probability between $0$ and $1$. Logistic regression keeps the linear score, then bends it through the sigmoid:
$$q = \sigma(z) = \frac{1}{1 + e^{-z}}$$
The score $z$ is the logit. It can be any real number. The probability $q$ is constrained to the interval $(0, 1)$.
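A minimal sketch of the logit-to-probability map, assuming NumPy. The logit values are illustrative and not taken from the page:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued logit to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative logits: very negative, mildly negative, zero, mildly positive, very positive.
logits = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(logits)

for z, q in zip(logits, probs):
    print(f"logit {z:+.1f} -> probability {q:.3f}")
```

A logit of zero lands exactly on probability 0.5, and the map is symmetric: sigmoid(-z) equals 1 - sigmoid(z).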
The decision boundary is where the model is exactly unsure:
$$q = \tfrac{1}{2} \iff z = 0$$
For a one-dimensional model $z = wx + b$, this happens when $x = -b/w$. Changing $w$ changes the steepness and orientation. Changing $b$ shifts the boundary.
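The boundary formula for the one-dimensional case can be checked directly. This is a small sketch; the helper name `boundary` and the parameter values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boundary(w, b):
    """Input x where w * x + b = 0, which is where the probability equals 0.5."""
    return -b / w

w, b = 2.0, -3.0                      # illustrative slope and bias
x_star = boundary(w, b)
q_at_boundary = sigmoid(w * x_star + b)
print("boundary:", x_star, "probability there:", q_at_boundary)

# Changing the bias moves the boundary; the slope is unchanged.
x_shifted = boundary(w, -1.0)
print("boundary after changing b to -1:", x_shifted)

# Negating both w and b keeps the boundary but flips which side is "positive".
q_right = sigmoid(w * (x_star + 1.0) + b)
q_right_flipped = sigmoid(-w * (x_star + 1.0) - b)
print("right of boundary:", q_right, "after flipping orientation:", q_right_flipped)
```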
The most common misconception is that logistic regression is "linear regression with a squashed output." The output is squashed, but the training objective is different: it is Bernoulli likelihood, not squared error.
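One way to see that the objective matters is to compare the logit gradients of the two losses on a confidently wrong prediction. The sketch below assumes a squared-error loss of the form 0.5 * (q - y)^2 applied after the sigmoid, purely for contrast; the function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_logit_grad(z, y):
    """d/dz of the Bernoulli negative log-likelihood."""
    return sigmoid(z) - y

def mse_logit_grad(z, y):
    """d/dz of 0.5 * (sigmoid(z) - y)**2, by the chain rule."""
    q = sigmoid(z)
    return (q - y) * q * (1.0 - q)

# A confidently wrong prediction: the label is 1 but the logit is very negative.
z, y = -6.0, 1.0
print("Bernoulli likelihood gradient:", bce_logit_grad(z, y))
print("squared-error gradient:      ", mse_logit_grad(z, y))
```

Under these assumptions the likelihood gradient stays close to -1, while the squared-error gradient is scaled down by the factor q(1 - q), which is tiny when the sigmoid is saturated.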
Another subtle caveat: the number $q$ is a model probability, not a promise that the fitted classifier is calibrated on future data. And with perfectly separable training data, unregularized logistic regression may keep increasing the weight norm instead of reaching a finite maximum-likelihood solution.
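The separable-data behavior can be seen in a toy run. This is a sketch under stated assumptions: a four-point dataset invented for illustration, a slope-only model with no bias, and plain gradient descent with a learning rate of 1.0:

```python
import numpy as np

# Perfectly separable toy data: every negative x has label 0, every positive x has label 1.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 0.0        # slope only; the data is symmetric, so no bias is needed here
lr = 1.0
slopes = []    # slope recorded at a few checkpoints
for step in range(1, 1001):
    q = sigmoid(w * x)
    grad = np.mean((q - y) * x)   # average logit error times the feature
    w -= lr * grad
    if step in (10, 100, 1000):
        slopes.append(w)

print("slope after 10, 100, 1000 steps:", np.round(slopes, 3))
```

The slope keeps growing as training continues, slowly but without settling, because a steeper sigmoid always fits separable labels slightly better.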
02
Math
Translate the story into symbols, assumptions, and a derivation you can inspect.
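For one example with features $x \in \mathbb{R}^d$ and binary label $y \in \{0, 1\}$, define the linear score
$$z = w^\top x + b$$
where $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$. The predicted probability of label $1$ is
$$q = \sigma(z) = \frac{1}{1 + e^{-z}}$$
The Bernoulli likelihood for this example is
$$p(y \mid x) = q^{y} (1 - q)^{1 - y}$$
The negative log-likelihood is
$$\ell = -\left[\, y \log q + (1 - y) \log (1 - q) \,\right]$$
This is binary cross-entropy, also called Bernoulli negative log-likelihood. Differentiating this loss through the sigmoid gives the compact logit gradient:
$$\frac{\partial \ell}{\partial z} = q - y$$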
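The differentiation can be written out with the chain rule. This is a sketch that uses the standard sigmoid identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, with $\ell$ the one-example negative log-likelihood, $q = \sigma(z)$, and $y \in \{0, 1\}$:

```latex
\begin{aligned}
\frac{\partial \ell}{\partial q}
  &= -\frac{y}{q} + \frac{1 - y}{1 - q}
   = \frac{q - y}{q(1 - q)}, \\
\frac{\partial q}{\partial z}
  &= q(1 - q), \\
\frac{\partial \ell}{\partial z}
  &= \frac{\partial \ell}{\partial q} \cdot \frac{\partial q}{\partial z}
   = q - y.
\end{aligned}
```

The factor $q(1 - q)$ from the sigmoid cancels the denominator from the log terms, which is why the result is so compact.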
That single expression is the training signal. If $y = 1$ and $q$ is too small, then $q - y < 0$, so gradient descent increases the logit. If $y = 0$ and $q$ is too large, then $q - y > 0$, so gradient descent lowers the logit.
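A finite-difference check is a quick way to confirm the sign and size of this signal. The sketch below assumes NumPy; the function names and the test points are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(z, y):
    """One-example Bernoulli negative log-likelihood as a function of the logit."""
    q = sigmoid(z)
    return -(y * np.log(q) + (1 - y) * np.log(1 - q))

def logit_grad(z, y):
    """Analytic gradient of nll with respect to the logit: q - y."""
    return sigmoid(z) - y

h = 1e-6
cases = [(-2.0, 1.0), (0.5, 1.0), (3.0, 0.0), (-1.0, 0.0)]
for z, y in cases:
    numeric = (nll(z + h, y) - nll(z - h, y)) / (2 * h)   # central difference
    analytic = logit_grad(z, y)
    print(f"z={z:+.1f} y={y:.0f} numeric={numeric:+.6f} analytic={analytic:+.6f}")
```

For the positive examples the gradient is negative, so a descent step raises the logit; for the negative examples it is positive, so a descent step lowers it.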
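For a dataset with rows $x_i$, labels $y_i$, and probabilities $q_i = \sigma(w^\top x_i + b)$, the average loss is
$$L(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[\, y_i \log q_i + (1 - y_i) \log (1 - q_i) \,\right]$$
The gradients are
$$\nabla_w L = \frac{1}{n} X^\top (q - y)$$
and
$$\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (q_i - y_i)$$
Here $X \in \mathbb{R}^{n \times d}$, $y \in \{0, 1\}^n$, and $q \in (0, 1)^n$. The shape mirrors linear regression, but the residual-like term is probability error $q - y$, not numeric target error $\hat{y} - y$.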
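The vectorized gradient is the per-example logit error, weighted by each row's features and averaged. The sketch below checks that against an explicit loop; the random data, seed, and sizes are arbitrary assumptions made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 3
X = rng.normal(size=(n, d))                     # feature matrix, shape (n, d)
y = rng.integers(0, 2, size=n).astype(float)    # binary labels, shape (n,)
w = rng.normal(size=d)                          # weights, shape (d,)
b = 0.1                                         # scalar bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

q = sigmoid(X @ w + b)                          # probabilities, shape (n,)

# Vectorized gradients of the average loss.
grad_w = X.T @ (q - y) / n
grad_b = np.mean(q - y)

# Same gradient for w, accumulated one example at a time.
loop_w = np.zeros(d)
for i in range(n):
    loop_w += (q[i] - y[i]) * X[i]
loop_w /= n

print("max difference:", np.max(np.abs(grad_w - loop_w)))
```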
03
Code
Keep the implementation aligned with the notation so the algorithm is legible.
import numpy as np

# Binary logistic regression in one feature.
# Shapes: X is (n, 2), y is (n,), w is (2,).
x = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])
w = np.array([-0.2, 0.7])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_and_grad(w):
    z = X @ w
    q = sigmoid(z)
    eps = 1e-12
    loss = -np.mean(y * np.log(q + eps) + (1 - y) * np.log(1 - q + eps))
    grad = X.T @ (q - y) / y.size
    return loss, grad, q

for step in range(60):
    loss, grad, q = loss_and_grad(w)
    w = w - 0.5 * grad

loss, grad, q = loss_and_grad(w)
print("bias, slope:", np.round(w, 3))
print("probabilities:", np.round(q, 3))
print("loss:", round(loss, 4))
print("gradient:", np.round(grad, 6))
print("decision boundary x:", round(-w[0] / w[1], 3))
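The code mirrors the math: z = X @ w creates logits, q = sigmoid(z) creates probabilities, and X.T @ (q - y) / y.size is the gradient averaged over the n rows. The bias is folded into w through the column of ones in X, so w[0] is the bias and w[1] is the slope.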
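The eps guard inside the logarithms is one way to avoid log(0). An alternative, sketched here and not the page's own method, computes the same loss directly from logits using the identity log(1 + exp(z)) - y * z, with NumPy's `np.logaddexp` handling large magnitudes. The function names are illustrative:

```python
import numpy as np

def bce_from_logits(z, y):
    """Per-example binary cross-entropy from logits.

    Uses log(1 + exp(z)) - y * z, which equals
    -[y * log(q) + (1 - y) * log(1 - q)] for q = sigmoid(z).
    """
    return np.logaddexp(0.0, z) - y * z

def bce_from_probs(q, y, eps=1e-12):
    """Per-example loss in the eps-guarded, probability-based form."""
    return -(y * np.log(q + eps) + (1 - y) * np.log(1 - q + eps))

# At moderate logits the two forms agree.
z = np.array([-3.0, -0.5, 0.0, 2.0])
y = np.array([0.0, 1.0, 1.0, 0.0])
q = 1.0 / (1.0 + np.exp(-z))
agree = np.allclose(bce_from_logits(z, y), bce_from_probs(q, y))
print("forms agree:", agree)

# At an extreme logit the logit-based form stays finite and exact.
extreme = bce_from_logits(np.array([-800.0]), np.array([1.0]))
print("loss at z=-800, y=1:", extreme)
```

The trade-off is that the logit-based form needs access to z rather than q, so it fits best when the loss is computed in the same place as the linear score.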
04
Interactive Demo
Use direct manipulation to connect the explanation to a moving system.
Choose a point and inspect its current probability before revealing the logit-gradient direction. Predict whether training wants to raise that point's logit, lower it, or leave it mostly unchanged.
The curve shows $q = \sigma(wx + b)$ over one feature. Points above the midline are positive examples; points below are negative examples. The reveal marks the selected point's contribution and reports the average loss. Move the slope and bias to see how the same data can be underconfident, overconfident, or misclassified.
Source Grounding
Canonical references for the mechanism on this page.
Curriculum source for logistic regression as a Bernoulli GLM trained by likelihood.
Curriculum source for logits, sigmoid/softmax classifiers, and cross-entropy training.
Claim Review
Logistic regression turns a linear score into a probability, then learns a classification boundary by Bernoulli likelihood.
Source ids: cs229-logistic-regression, goodfellow-2016-deep-learning
CS229 supports the sigmoid hypothesis, Bernoulli likelihood, and log-likelihood gradient; Goodfellow supports sigmoid as a Bernoulli-parameter map and logistic regression as likelihood-trained classification. Together they support the page's NLL/BCE and q-y gradient convention.
Sources: CS229 Lecture Notes: Logistic Regression; Deep Learning.
This source check covers binary logistic regression, not multinomial softmax regression, empirical calibration guarantees, cost-sensitive thresholding, or regularized solutions under perfect separation.
Checked CS229 section 2.1 and Goodfellow chapters 3/5. CS229 gives sigmoid, Bernoulli likelihood, log-likelihood, and gradient ascent (y-h)x; negating gives the page's BCE/NLL q-y descent signal. Goodfellow supports sigmoid as a Bernoulli parameter and logistic regression as likelihood-trained classification.
Reviewer: codex+gpt-pro-prior; reviewed 2026-06-28