Bring the mental model from Train/Dev/Test Splits, Cross-Validation, and Leakage; this page will reuse it instead of restarting from zero.
Model Selection and Hyperparameter Search
Model selection turns many candidate settings into one chosen procedure; dev/CV may choose, while test stays untouched for final evidence.
The next edge should feel earned: use the demo prediction here before following Evaluation Pipelines.
01
Intuition
Build the mental picture first so the rest of the page has something to attach to.
You are here because training one model is rarely the real experiment. In practice, you try many settings: polynomial degree, regularization strength, feature count, threshold, architecture size, prompt template, learning rate, batch size, or decoding rule. Model selection is the discipline that keeps that search from quietly becoming test-set memorization.
Before this, know train/dev/test splits, regularization, and why metrics can hide different costs. By the end, you should be able to run a small grid search, explain what the development set is allowed to choose, and say why the test set must stay untouched until the selected procedure is frozen.
Think of a hyperparameter grid as a menu of possible procedures:
- degree 2, ridge
- degree 5, ridge
- degree 8, ridge
- many more
Each candidate gets fit on training data. Then a development set or cross-validation estimate chooses which candidate to keep. That chosen candidate is not "the model that happened to have the lowest visible number." It is the result of a search procedure.
The dangerous move is test peeking:
- fit many candidates on train
- look at the test scores
- choose the candidate with the best test score
- report that same best test score
That score is no longer a final estimate. It is the score you optimized over. With enough candidates, one can look unusually good on the test set by luck. The model may not be better; you may simply have selected a favorable noise draw.
The repair is simple and strict:
- train data fits parameters
- dev or cross-validation chooses hyperparameters
- test data estimates the already chosen procedure once
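The three roles above can be sketched as a small grid search. This is a minimal illustration, not a reference implementation: the noisy sine target, the split sizes, the grid values, and the helper names (`make_data`, `features`, `fit_ridge`, `mse`) are all assumptions made for the example, and the ridge penalty is applied to the intercept for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical toy target: a noisy sine curve on [-1, 1].
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

def features(x, degree):
    # Polynomial features 1, x, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(x, y, degree, lam):
    # Closed-form ridge: solve (X^T X + lam * I) w = X^T y.
    X = features(x, degree)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

def mse(w, x, y, degree):
    return float(np.mean((features(x, degree) @ w - y) ** 2))

x_train, y_train = make_data(60)
x_dev, y_dev = make_data(40)
x_test, y_test = make_data(40)

grid = [(d, lam) for d in (2, 5, 8) for lam in (0.01, 0.1, 1.0)]

# Train fits parameters; dev scores every candidate.
dev_scores = {}
for degree, lam in grid:
    w = fit_ridge(x_train, y_train, degree, lam)
    dev_scores[(degree, lam)] = mse(w, x_dev, y_dev, degree)

# Dev chooses the hyperparameters.
best_degree, best_lam = min(dev_scores, key=dev_scores.get)

# Test is read exactly once, for the already chosen candidate.
w_best = fit_ridge(x_train, y_train, best_degree, best_lam)
test_mse = mse(w_best, x_test, y_test, best_degree)
print("chosen:", best_degree, best_lam, "test MSE:", round(test_mse, 3))
```

Note that the test arrays appear in exactly one line, after the candidate is frozen. Any loop that reads `x_test` inside the grid iteration would turn this into the test-peeking protocol.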
If you need to tune and estimate performance on scarce data, use an outer evaluation loop such as nested cross-validation or keep a separate audit set. The invariant is not "never search." The invariant is "never report the data you searched over as if it were untouched evidence."
02
Math
Translate the story into symbols, assumptions, and a derivation you can inspect.
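Let $\mathcal{H}$ be a finite set of hyperparameter settings. A setting $h \in \mathcal{H}$ might encode degree, penalty strength, feature count, threshold, or an architecture choice.
For each candidate, the training algorithm $\mathcal{A}_h$ fits parameters using only training data:
$$\hat{\theta}_h = \mathcal{A}_h(D_{\text{train}})$$
The development risk estimate, for a loss $\ell$ and predictor $f_{\theta}$, is
$$\widehat{R}_{\text{dev}}(h) = \frac{1}{|D_{\text{dev}}|} \sum_{(x, y) \in D_{\text{dev}}} \ell\big(f_{\hat{\theta}_h}(x), y\big)$$
Model selection chooses the setting with the lowest development estimate:
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \widehat{R}_{\text{dev}}(h)$$
Only after $\hat{h}$ is fixed do we evaluate once on test:
$$\widehat{R}_{\text{test}}(\hat{h}) = \frac{1}{|D_{\text{test}}|} \sum_{(x, y) \in D_{\text{test}}} \ell\big(f_{\hat{\theta}_{\hat{h}}}(x), y\big)$$
This estimates the selected procedure, not every candidate separately. The test set did not choose $\hat{h}$; it only measured the procedure after selection.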
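Cross-validation replaces one development split with several rotating validation folds. For $K$ folds $V_1, \dots, V_K$ inside the training budget, let $\hat{\theta}_h^{(-k)}$ be the parameters fit with setting $h$ on all folds except $V_k$. The cross-validation estimate is
$$\widehat{R}_{\text{CV}}(h) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|V_k|} \sum_{(x, y) \in V_k} \ell\big(f_{\hat{\theta}_h^{(-k)}}(x), y\big)$$
Then choose
$$\hat{h}_{\text{CV}} = \arg\min_{h \in \mathcal{H}} \widehat{R}_{\text{CV}}(h)$$
where $\mathcal{H}$ is the finite set of candidate settings.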
The crucial detail is fold locality. Preprocessing, feature selection, early stopping, calibration, and threshold choice must be learned inside each training fold when they are part of the candidate procedure.
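Fold locality is easiest to see with a preprocessing step. The sketch below is an assumption-laden toy: the data, the feature scales, and the helpers `fit_ridge` and `cv_risk` are invented for illustration. The point it shows is that the standardization statistics are recomputed from the training folds in every rotation and only applied to the held-out fold.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy regression data with features on very different scales.
n, d = 120, 5
X = rng.normal(0.0, 1.0, size=(n, d)) * np.array([1.0, 10.0, 0.1, 5.0, 2.0])
y = X @ np.array([0.5, 0.05, 3.0, 0.0, -0.2]) + rng.normal(0.0, 0.5, size=n)

def fit_ridge(X, y, lam):
    # Closed-form ridge on already centered and scaled inputs.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_risk(X, y, lam, k=5):
    # Rows are i.i.d. draws, so a round-robin fold assignment is enough here.
    fold_of = np.arange(len(y)) % k
    risks = []
    for fold in range(k):
        tr, va = fold_of != fold, fold_of == fold
        # Fold locality: scaling statistics come from the training folds only.
        mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
        y_mu = y[tr].mean()
        w = fit_ridge((X[tr] - mu) / sd, y[tr] - y_mu, lam)
        # The held-out fold is transformed with the training-fold statistics.
        pred = ((X[va] - mu) / sd) @ w + y_mu
        risks.append(np.mean((pred - y[va]) ** 2))
    return float(np.mean(risks))

lambdas = [0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: cv_risk(X, y, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)
print({lam: round(score, 3) for lam, score in cv_scores.items()}, "chosen:", best_lam)
```

Computing `mu` and `sd` once on all of `X` before the loop would let every validation fold influence its own preprocessing, which is the leakage this rule is meant to prevent.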
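Test peeking changes the selection rule to
$$\hat{h}_{\text{peek}} = \arg\min_{h \in \mathcal{H}} \widehat{R}_{\text{test}}(h)$$
where $\widehat{R}_{\text{test}}(h)$ is the test-set risk estimate for candidate setting $h$ and $\mathcal{H}$ is the candidate set. The reported value
$$\widehat{R}_{\text{test}}(\hat{h}_{\text{peek}}) = \min_{h \in \mathcal{H}} \widehat{R}_{\text{test}}(h)$$
is optimistically biased as an estimate of future performance because the minimum selected a favorable noise realization. More candidates usually create more chances to find a lucky low estimate. That does not mean large grids are forbidden. It means the winning candidate must be selected using development/CV evidence and evaluated with untouched data.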
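A short inequality makes the direction of the bias explicit. As an assumption for this sketch, write $R(h)$ for the true risk of candidate $h$ in the candidate set $\mathcal{H}$, and suppose each test estimate is unbiased on its own, $\mathbb{E}[\widehat{R}_{\text{test}}(h)] = R(h)$. Because the expectation of a minimum never exceeds the minimum of the expectations,

$$\mathbb{E}\Big[\min_{h \in \mathcal{H}} \widehat{R}_{\text{test}}(h)\Big] \;\le\; \min_{h \in \mathcal{H}} \mathbb{E}\big[\widehat{R}_{\text{test}}(h)\big] \;=\; \min_{h \in \mathcal{H}} R(h) \;\le\; \mathbb{E}\big[R(\hat{h}_{\text{peek}})\big]$$

where $\hat{h}_{\text{peek}}$ is the candidate with the lowest test estimate. So the number reported after peeking is, on average, no larger than the true risk of the candidate it was reported for. The gap is typically strict when the estimates are noisy and several candidates are competitive. The inequality gives only the direction of the bias, not its size.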
Nested cross-validation separates the two jobs when no external test set is available. The inner loop chooses the hyperparameter setting $\hat{h}$; the outer loop estimates the performance of the whole selection procedure.
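A minimal NumPy sketch of the two loops follows. The noisy cubic data, the grid, the fold counts, and the helper names (`fit`, `mse`, `folds`, `cv_risk`, `select`) are assumptions for illustration only. In practice a library routine would usually replace the hand-written loops.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data: a noisy cubic.
x = rng.uniform(-1.0, 1.0, size=90)
y = x ** 3 - 0.5 * x + rng.normal(0.0, 0.1, size=90)
grid = [(d, lam) for d in (1, 3, 6) for lam in (0.001, 0.1, 10.0)]

def fit(x, y, degree, lam):
    # Closed-form polynomial ridge regression.
    X = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

def mse(w, x, y):
    X = np.vander(x, len(w), increasing=True)
    return float(np.mean((X @ w - y) ** 2))

def folds(n, k):
    # k disjoint index arrays covering range(n).
    return np.array_split(np.arange(n), k)

def cv_risk(x, y, setting, k):
    risks = []
    for val in folds(len(x), k):
        train = np.setdiff1d(np.arange(len(x)), val)
        risks.append(mse(fit(x[train], y[train], *setting), x[val], y[val]))
    return float(np.mean(risks))

def select(x, y, k=3):
    # The whole selection procedure: inner CV over the grid.
    return min(grid, key=lambda s: cv_risk(x, y, s, k))

outer_risks = []
for held_out in folds(len(x), 5):
    rest = np.setdiff1d(np.arange(len(x)), held_out)
    chosen = select(x[rest], y[rest])   # inner loop never sees held_out
    w = fit(x[rest], y[rest], *chosen)  # refit the chosen setting on the outer training part
    outer_risks.append(mse(w, x[held_out], y[held_out]))

print("nested-CV estimate:", round(float(np.mean(outer_risks)), 4))
```

The outer folds may pick different settings. That is expected: the outer average scores the procedure "run `select`, then refit", not any single hyperparameter value.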
03
Code
Keep the implementation aligned with the notation so the algorithm is legible.
```python
import numpy as np

rng = np.random.default_rng(4)
degrees = range(1, 9)
lambdas = [0.1, 1.0, 10.0, 100.0]


def true_risk(degree, lam):
    # Toy risk surface: lowest near degree 4 and log10(lambda) = 0.3.
    log_lambda = np.log10(lam)
    return 0.12 + 0.015 * (degree - 4) ** 2 + 0.012 * (log_lambda - 0.3) ** 2


rows = []
for degree in degrees:
    for lam in lambdas:
        risk = true_risk(degree, lam)
        dev = risk + rng.normal(0, 0.035)    # noisy development estimate
        test = risk + rng.normal(0, 0.035)   # noisy test estimate
        audit = risk + rng.normal(0, 0.008)  # fresh data, lower noise
        rows.append((dev, test, audit, degree, lam))

by_dev = min(rows, key=lambda row: row[0])   # clean protocol: dev chooses
by_test = min(rows, key=lambda row: row[1])  # invalid protocol: test chooses

for name, row in [("dev-selected", by_dev), ("test-peeked", by_test)]:
    print(name, {
        "degree": row[3],
        "lambda": row[4],
        "dev": round(float(row[0]), 3),
        "test": round(float(row[1]), 3),
        "audit": round(float(row[2]), 3),
    })

# Audit minus test for the peeked pick; a positive gap means the reported
# test score looked better than fresh data supports.
print("test-peek optimism:", round(float(by_test[2] - by_test[1]), 3))
```
The code simulates noisy estimates for a degree/lambda grid. The development-selected candidate is the clean protocol. The test-peeked candidate is the invalid protocol: it chooses the lowest test number and then tries to report that same number as final evidence. The audit column stands in for fresh data that did not participate in the search.
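A single seed shows one draw of the noise. To see the average effect, the same toy setup can be repeated many times. The sketch below reuses the risk surface and noise level from the simulation above. The trial count and the choice to compare against the known simulated risk (rather than a noisy audit column) are assumptions made for illustration.

```python
import numpy as np

degrees = range(1, 9)
lambdas = [0.1, 1.0, 10.0, 100.0]

# Same toy risk surface as the simulation above, flattened over the grid.
risk = np.array([
    0.12 + 0.015 * (d - 4) ** 2 + 0.012 * (np.log10(lam) - 0.3) ** 2
    for d in degrees for lam in lambdas
])

rng = np.random.default_rng(0)
trials, noise = 2000, 0.035
dev = risk + rng.normal(0.0, noise, size=(trials, risk.size))
test = risk + rng.normal(0.0, noise, size=(trials, risk.size))

rows = np.arange(trials)
dev_pick = dev.argmin(axis=1)    # clean protocol: dev chooses
test_pick = test.argmin(axis=1)  # invalid protocol: test chooses

# Gap between the true risk of the pick and the test number that would be reported.
dev_gap = risk[dev_pick] - test[rows, dev_pick]
peek_gap = risk[test_pick] - test[rows, test_pick]

print("dev-selected mean gap:", round(float(dev_gap.mean()), 4))
print("test-peeked mean gap:", round(float(peek_gap.mean()), 4))
```

Under the clean protocol the test noise is independent of the choice, so the mean gap should sit near zero. Under test peeking the gap should be clearly positive. The size of that gap depends on this toy grid and noise level and should not be read as a general bias magnitude.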
04
Interactive Demo
Use direct manipulation to connect the explanation to a moving system.
Choose the search width and estimate noise. Before revealing the grid, predict which protocol will survive the fresh audit: development selection, test peeking, or a small-grid tie.
The grid shows candidate degree/lambda settings while the scores stay hidden. After reveal, the heatmap appears: one marker shows the candidate chosen by development score and another shows the candidate chosen by repeatedly reading test scores. The point is not that test peeking always chooses a more complex model. The point is sharper: once the test set chooses, its score is no longer untouched evidence.
Source Grounding
Canonical references for the mechanism on this page.
- CS229 Notes: Regularization and Model Selection (course notes, 2019). Source for hold-out cross-validation, k-fold cross-validation, and choosing among model classes or hyperparameters using held-out empirical error.
- An Introduction to Statistical Learning (book, 2023). Source for validation-set and cross-validation approaches to estimating test error for model assessment and model selection.
- scikit-learn User Guide: Tuning the hyper-parameters of an estimator (documentation, 2026). Source for grid search, randomized search, cross-validated estimators, and the idea that hyperparameters are set before fitting model parameters.
- scikit-learn Example: Nested versus non-nested cross-validation (documentation, 2026). Source for the warning that using the same data to tune and evaluate can produce optimistically biased scores.
- On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation (Cawley and Talbot). Source for model-selection overfitting and biased performance evaluation when selection variance is ignored.
Claim Review
Source ids: cs229-regularization-model-selection, islr-resampling-model-selection, sklearn-grid-search, sklearn-nested-cross-validation, cawley-talbot-selection-bias
CS229 and ISLR support held-out/CV model selection; scikit-learn supports grid/randomized search and warns through nested-CV examples that tuning and evaluating on the same data biases estimates; Cawley and Talbot support selection overfitting and subsequent selection bias.
The code and demo use finite toy loss estimates for a degree/lambda grid; they do not claim a universal bias magnitude, an optimal search algorithm, or that nested CV is always required for every operational setting.
Substantive local review after three GPT Pro/Oracle stalls. CS229, ISLR, scikit-learn, and Cawley-Talbot support the model-selection/test-peeking contract. Claude alternate review found no math/source/code/demo blockers after documented desktop/mobile QA; the toy witness caveat remains. Evidence: responses/model-selection-browser-source-qa-20260628.md.
Reviewer: codex-source-audit+claude-alt-review; reviewed 2026-06-28