diff --git a/CLAUDE.md b/CLAUDE.md index e31e8456..766c2d80 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -22,6 +22,7 @@ When writing documentation with code examples: 4. For examples that reference user-created code (like `my_custom_metric.py`), use existing implementations instead (e.g., `MAEMetric` from `chap_core.assessment.metrics.mae`). 5. Only use `console` blocks as a last resort for pseudo-code, CLI commands, or incomplete code signatures that cannot be made executable. 6. When showing class/function signatures, prefer a complete minimal example over an incomplete signature snippet. +7. To render code output in the built docs, use `exec="on" session="" source="above"` on Python code blocks. Add `result="text"` for plain-text output, or omit it when the block prints markdown (e.g. `to_markdown()` tables). Blocks sharing a `session` share state like mktestdocs `memory=True`. ## Domain Knowledge - To learn about domain-specific terms used in the codebase, refer to @docs/contributor/vocabulary.md. diff --git a/docs/contributor/evaluation_walkthrough.md b/docs/contributor/evaluation_walkthrough.md index f0f60bb8..aed761fa 100644 --- a/docs/contributor/evaluation_walkthrough.md +++ b/docs/contributor/evaluation_walkthrough.md @@ -14,7 +14,7 @@ For the conceptual overview and architecture diagrams, see A `DataSet` is the central data structure in CHAP. It maps location names to typed time-series arrays. Load one from CSV: -```python +```python exec="on" session="eval-walkthrough" source="above" from chap_core.spatio_temporal_data.temporal_dataclass import DataSet dataset = DataSet.from_csv("example_data/laos_subset.csv") @@ -22,7 +22,7 @@ dataset = DataSet.from_csv("example_data/laos_subset.csv") Inspect locations, time range, and available fields: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" import dataclasses print(list(dataset.keys())) @@ -43,7 +43,7 @@ The `train_test_generator` function implements expanding-window cross-validation It returns a training set and an iterator of `(historic, masked_future, future_truth)` tuples. 
-```python +```python exec="on" session="eval-walkthrough" source="above" from chap_core.assessment.dataset_splitting import train_test_generator train_set, splits = train_test_generator( @@ -54,7 +54,7 @@ splits = list(splits) The training set covers the earliest portion of the data: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" print(train_set.period_range) print(len(train_set.period_range)) ``` @@ -65,7 +65,7 @@ Each split provides three datasets per location: - **masked_future_data** -- future covariates *without* `disease_cases` - **future_data** -- full future data including `disease_cases` (ground truth) -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" for i, (historic, masked_future, future_truth) in enumerate(splits): print( f"Split {i}: historic periods={len(historic.period_range)}, " @@ -78,7 +78,7 @@ for i, (historic, masked_future, future_truth) in enumerate(splits): The historic window expands by `stride` periods with each successive split, while the future window slides forward: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" for i, (historic, masked_future, future_truth) in enumerate(splits): print( f"Split {i}: historic={len(historic.period_range)} periods, " @@ -89,7 +89,7 @@ for i, (historic, masked_future, future_truth) in enumerate(splits): The masked future data has climate features but no `disease_cases`, which is exactly what a model receives at prediction time: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" location = list(splits[0][1].keys())[0] masked_fields = [f.name for f in dataclasses.fields(splits[0][1][location])] print(masked_fields) @@ -100,7 +100,7 @@ print(masked_fields) Train the `NaiveEstimator` (which predicts Poisson samples around each location's historical mean) and predict on one split: -```python +```python exec="on" session="eval-walkthrough" source="above" from chap_core.predictor.naive_estimator import NaiveEstimator estimator = NaiveEstimator() @@ -113,7 +113,7 @@ predictions = predictor.predict(historic, masked_future) The result is a `DataSet[Samples]` -- each location holds a 2D array of shape `(n_periods, n_samples)`: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" location = list(predictions.keys())[0] print(predictions[location].samples.shape) ``` @@ -122,7 +122,7 @@ print(predictions[location].samples.shape) Merge predictions with ground truth using `DataSet.merge`: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" from chap_core.datatypes import SamplesWithTruth import numpy as np @@ -141,7 +141,7 @@ predicted `samples` array, enabling metric computation. The `backtest` function ties sections 2-5 together: it splits the data, trains the model once, predicts for each split, and merges with ground truth. 
-```python +```python exec="on" session="eval-walkthrough" source="above" result="text" from chap_core.assessment.prediction_evaluator import backtest results = list(backtest(estimator, dataset, prediction_length=3, n_test_sets=4, stride=1)) @@ -164,7 +164,7 @@ attributes with the model metadata needed by the evaluation: Run the evaluation: -```python +```python exec="on" session="eval-walkthrough" source="above" from chap_core.api_types import BackTestParams from chap_core.assessment.evaluation import Evaluation @@ -174,23 +174,20 @@ evaluation = Evaluation.create(estimator.configured_model_db, estimator, dataset Export to flat DataFrames for inspection: -```python +```python exec="on" session="eval-walkthrough" source="above" import pandas as pd flat = evaluation.to_flat() forecasts_df = pd.DataFrame(flat.forecasts) -print(forecasts_df.columns.tolist()) -print(forecasts_df.shape) - observations_df = pd.DataFrame(flat.observations) -print(observations_df.columns.tolist()) -print(observations_df.shape) + +print(forecasts_df.head().to_markdown()) ``` Export to a NetCDF file for sharing or later analysis: -```python +```python exec="on" session="eval-walkthrough" source="above" result="text" import tempfile with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as f: diff --git a/docs/contributor/writing_building_documentation.md b/docs/contributor/writing_building_documentation.md index 35b1091d..9cffe465 100644 --- a/docs/contributor/writing_building_documentation.md +++ b/docs/contributor/writing_building_documentation.md @@ -87,6 +87,8 @@ make test-docs-all 4. **Avoid inline test data**: Use existing fixtures from `conftest.py` files when possible rather than creating new test data inline. +5. **Render code output with markdown-exec**: To show code output in the built docs, add `exec="on" session="" source="above"` to a Python code block. Blocks sharing the same `session` share state (imports, variables), similar to mktestdocs `memory=True`. Use `result="text"` for plain-text output (wraps in a code block), or omit it when the block prints markdown (e.g. `to_markdown()` tables) so it renders natively. Note: mktestdocs skips `exec="on"` blocks since the language tag is no longer plain `python`. + ### Skipping files from testing Some documentation files cannot be tested (e.g., they require Docker, external services, or would run destructive commands). 
To skip a file, add it to `SKIP_FILES` in `tests/test_documentation.py` with a comment explaining why: diff --git a/mkdocs.yml b/mkdocs.yml index a0e5fc66..2d7cf707 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,6 +32,7 @@ theme: plugins: - search + - markdown-exec - mkdocstrings: handlers: python: diff --git a/pyproject.toml b/pyproject.toml index 10411dee..8b9d27ef 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -73,7 +73,9 @@ dev = [ "pytest-asyncio>=0.24.0", "pytest-cov>=7.0.0", "pytest-mock>=3.15.1", + "markdown-exec[ansi]>=1.7", "mktestdocs>=0.2.2", + "tabulate>=0.9", "wheel>=0.45.1", "ipython>=9.6.0", "mypy>=1.19.1", diff --git a/uv.lock b/uv.lock index c984f3ea..917289bc 100644 --- a/uv.lock +++ b/uv.lock @@ -379,6 +379,7 @@ dev = [ { name = "coverage" }, { name = "fakeredis" }, { name = "ipython" }, + { name = "markdown-exec", extra = ["ansi"] }, { name = "mkdocs" }, { name = "mkdocs-material" }, { name = "mkdocstrings", extra = ["python"] }, @@ -392,6 +393,7 @@ dev = [ { name = "pytest-cov" }, { name = "pytest-mock" }, { name = "ruff" }, + { name = "tabulate" }, { name = "types-geopandas" }, { name = "types-jsonschema" }, { name = "types-psycopg2" }, @@ -460,6 +462,7 @@ dev = [ { name = "coverage", specifier = ">=7.10.7" }, { name = "fakeredis", specifier = ">=2.26.0" }, { name = "ipython", specifier = ">=9.6.0" }, + { name = "markdown-exec", extras = ["ansi"], specifier = ">=1.7" }, { name = "mkdocs", specifier = ">=1.6" }, { name = "mkdocs-material", specifier = ">=9.5" }, { name = "mkdocstrings", extras = ["python"], specifier = ">=0.24" }, @@ -473,6 +476,7 @@ dev = [ { name = "pytest-cov", specifier = ">=7.0.0" }, { name = "pytest-mock", specifier = ">=3.15.1" }, { name = "ruff", specifier = ">=0.13.3" }, + { name = "tabulate", specifier = ">=0.9" }, { name = "types-geopandas", specifier = ">=1.1.2.20260120" }, { name = "types-jsonschema", specifier = ">=4.26.0.20260202" }, { name = "types-psycopg2", specifier = ">=2.9.21.20251012" }, @@ -1454,6 +1458,23 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/59/1b/6ef961f543593969d25b2afe57a3564200280528caa9bd1082eecdd7b3bc/markdown-3.10.1-py3-none-any.whl", hash = "sha256:867d788939fe33e4b736426f5b9f651ad0c0ae0ecf89df0ca5d1176c70812fe3", size = 107684, upload-time = "2026-01-21T18:09:27.203Z" }, ] +[[package]] +name = "markdown-exec" +version = "1.12.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pymdown-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/96/73/1f20927d075c83c0e2bc814d3b8f9bd254d919069f78c5423224b4407944/markdown_exec-1.12.1.tar.gz", hash = "sha256:eee8ba0df99a5400092eeda80212ba3968f3cbbf3a33f86f1cd25161538e6534", size = 78105, upload-time = "2025-11-11T19:25:05.44Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ea/22/7b684ddb01b423b79eaba9726954bbe559540d510abc7a72a84d8eee1b26/markdown_exec-1.12.1-py3-none-any.whl", hash = "sha256:a645dce411fee297f5b4a4169c245ec51e20061d5b71e225bef006e87f3e465f", size = 38046, upload-time = "2025-11-11T19:25:03.878Z" }, +] + +[package.optional-dependencies] +ansi = [ + { name = "pygments-ansi-color" }, +] + [[package]] name = "markdown-it-py" version = "4.0.0" @@ -2431,6 +2452,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" }, ] 
+[[package]] +name = "pygments-ansi-color" +version = "0.3.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pygments" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/50/f9/7f417aaee98a74b4f757f2b72971245181fcf25d824d2e7a190345669eaf/pygments-ansi-color-0.3.0.tar.gz", hash = "sha256:7018954cf5b11d1e734383a1bafab5af613213f246109417fee3f76da26d5431", size = 7317, upload-time = "2023-05-18T22:44:35.792Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e6/17/8306a0bcd8c88d7761c2e73e831b0be026cd6873ce1f12beb3b4c9a03ffa/pygments_ansi_color-0.3.0-py3-none-any.whl", hash = "sha256:7eb063feaecadad9d4d1fd3474cbfeadf3486b64f760a8f2a00fc25392180aba", size = 10242, upload-time = "2023-05-18T22:44:34.287Z" }, +] + [[package]] name = "pymdown-extensions" version = "10.20.1" @@ -3183,6 +3216,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a8/45/a132b9074aa18e799b891b91ad72133c98d8042c70f6240e4c5f9dabee2f/structlog-25.5.0-py3-none-any.whl", hash = "sha256:a8453e9b9e636ec59bd9e79bbd4a72f025981b3ba0f5837aebf48f02f37a7f9f", size = 72510, upload-time = "2025-10-27T08:28:21.535Z" }, ] +[[package]] +name = "tabulate" +version = "0.9.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ec/fe/802052aecb21e3797b8f7902564ab6ea0d60ff8ca23952079064155d1ae1/tabulate-0.9.0.tar.gz", hash = "sha256:0095b12bf5966de529c0feb1fa08671671b3368eec77d7ef7ab114be2c068b3c", size = 81090, upload-time = "2022-10-06T17:21:48.54Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f", size = 35252, upload-time = "2022-10-06T17:21:44.262Z" }, +] + [[package]] name = "tenacity" version = "9.1.2"
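A minimal sketch of the annotation pattern these changes document (the `demo` session name and the pandas frame are illustrative only, not taken from the walkthrough): blocks tagged with the same `session` run in shared interpreter state, `result="text"` wraps printed output in a plain code block, and omitting `result` lets printed markdown such as a `to_markdown()` table render natively.

````markdown
```python exec="on" session="demo" source="above"
import pandas as pd

# State defined here is visible to every later block in the "demo" session.
df = pd.DataFrame({"cases": [3, 5, 8]}, index=["A", "B", "C"])
```

```python exec="on" session="demo" source="above" result="text"
# Same session, so `df` from the block above is still in scope;
# result="text" renders this printed output as preformatted text.
print(df.shape)
```

```python exec="on" session="demo" source="above"
# No result option: the printed markdown table renders natively
# (`to_markdown()` relies on the tabulate dependency added above).
print(df.to_markdown())
```
````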