fix(mcp): forward maxResultSizeChars via _meta to bypass Zod annotation stripping#756
Conversation
…tion fix
Two-layer spill in the bundled CLI (mcpValidation.ts + toolResultStorage.ts)
causes MCP tool results >50K chars to be saved to a temp file with only a
2KB preview sent to Claude, silently degrading quality for data-agent use
cases (DB schemas, DDLs, data profiles).
Root cause (confirmed via claude-cli-internal, 2026-03-27):
Layer 1 — MAX_MCP_OUTPUT_TOKENS controls mcpValidation.ts (~25K token
default). Raising this env var works and the SDK passes it through
correctly.
Layer 2 — toolResultStorage.ts maybePersistLargeToolResult has a
hardcoded 50K char clamp (toolLimits.ts DEFAULT_MAX_RESULT_SIZE_CHARS).
No env var exists to raise it; MCPTool declares 100K but
Math.min(100K, 50K) = 50K makes the declaration moot.
The chosen fix (Option 3) requires a companion change in claude-cli-internal:
MCPTool.ts reads annotations.maxResultSizeChars from the tools/list MCP
response and passes it as declaredMaxResultSizeChars, bypassing the
Math.min clamp when explicitly set. No SDK code changes are needed —
ToolAnnotations already has extra="allow" and the annotation pipeline
already serialises arbitrary extra fields into the JSONRPC payload.
This commit adds:
- tests/test_mcp_large_output.py: full regression documentation covering
both spill layers, env-var passthrough correctness, and message-parser
behaviour for inline vs <persisted-output> content
- test_sdk_mcp_integration.py: test_max_result_size_chars_annotation_flows_to_cli
confirms maxResultSizeChars set via ToolAnnotations(maxResultSizeChars=N)
appears in the JSONRPC tools/list payload the CLI receives — locking in
the SDK half of the Option 3 fix
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #756 +/- ##
=======================================
Coverage ? 84.17%
=======================================
Files ? 14
Lines ? 2470
Branches ? 0
=======================================
Hits ? 2079
Misses ? 391
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…on stripping The MCP TypeScript SDK's ToolAnnotations Zod schema uses z.core.$strip, which silently discards unknown fields (including maxResultSizeChars) at parse time. The companion CLI change (claude-cli-internal#24940) reads the value from _meta['anthropic/maxResultSizeChars'] instead, following the same pattern as anthropic/searchHint and anthropic/alwaysLoad. Changes: - create_sdk_mcp_server: extract maxResultSizeChars from ToolAnnotations and populate Tool._meta['anthropic/maxResultSizeChars'] when present - query.py _handle_sdk_mcp_request tools/list: include _meta in the JSONRPC response payload so the CLI subprocess receives it Users set the threshold via ToolAnnotations (no API change): @tool("get_schema", "Returns DB schema", {}, annotations=ToolAnnotations(maxResultSizeChars=500_000)) async def get_schema(args): ... Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The MCP SDK's Zod schema strips unknown annotation fields, so the value is forwarded via _meta['anthropic/maxResultSizeChars'] instead. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@claude review |
There was a problem hiding this comment.
LGTM — small, well-scoped fix forwarding maxResultSizeChars via _meta with good test coverage.
Extended reasoning...
Overview
This PR makes two small production changes: (1) a _build_meta() helper in __init__.py that extracts maxResultSizeChars from ToolAnnotations and places it under _meta["anthropic/maxResultSizeChars"], and (2) two lines in query.py to include _meta in the JSONRPC tools/list response. Two test files are added/modified to cover the new behavior.
Security risks
None. The change forwards an existing annotation value through an established metadata channel (_meta with namespaced keys). No auth, crypto, or permission logic is affected.
Level of scrutiny
Low — the production diff is under 20 lines across two files, follows the existing _meta pattern used by anthropic/searchHint and anthropic/alwaysLoad, and is well-covered by tests. The switch from Tool(...) to Tool.model_validate({...}) is mechanical and necessary to pass _meta through Pydantic.
Other factors
Codecov reports all modified lines are covered. The two bug reports are both nits about the same dead-code variable in a single test (test_persisted_output_is_not_full_content) — low impact, test-only. No outstanding reviewer comments to address.
| msg = parse_message(_user_message_with_tool_result(PERSISTED_CONTENT)) | ||
| assert isinstance(msg, UserMessage) | ||
| blocks = [b for b in msg.content if isinstance(b, ToolResultBlock)] |
There was a problem hiding this comment.
🟡 In test_persisted_output_is_not_full_content, the large_content variable is dead code — it is defined but never used as input to any function. The assertion blocks[0].content != large_content is trivially true because it compares two completely unrelated hardcoded strings, so the test does not actually verify what its docstring claims ("Claude receives only the 2 KB preview, not the original large content").
Extended reasoning...
What the bug is
In test_persisted_output_is_not_full_content (line 288), the variable large_content is assigned "THE ANSWER IS: 42\n" + "padding\n" * 5000 but is never used as input to any operation. The test then parses PERSISTED_CONTENT (a module-level constant simulating CLI spill output) and asserts blocks[0].content != large_content.
Why the assertion is trivially true
PERSISTED_CONTENT is a hardcoded string starting with <persisted-output>\n... containing a 2KB preview of "x" * 2000. large_content is "THE ANSWER IS: 42\n" + "padding\n" * 5000. These two strings share no common prefix and will never be equal regardless of any code behavior. The != assertion will always pass.
Step-by-step proof
large_content = "THE ANSWER IS: 42\n" + "padding\n" * 5000— a ~40KB string starting with "THE ANSWER IS: 42"msg = parse_message(_user_message_with_tool_result(PERSISTED_CONTENT))— parses the hardcodedPERSISTED_CONTENT, notlarge_contentblocks[0].contentwill equalPERSISTED_CONTENT, which starts with<persisted-output>blocks[0].content != large_contentcomparesPERSISTED_CONTENTtolarge_content— triviallyTrue- The test passes but proves nothing about the system under test
What the test should verify
The docstring says "Claude receives only the 2 KB preview, not the original large content." To meaningfully test this, large_content should be used as the input to _user_message_with_tool_result(), and then the assertion should verify the output differs from the input (or check that the output is the truncated preview). As written, large_content is dead code — a linter like ruff would flag it as F841.
Impact
This is a test-only issue with no production impact. The test provides a false sense of coverage for the persisted-output behavior. If the message parser were changed to, say, strip <persisted-output> tags or transform content in an unexpected way, this test would not catch the regression it claims to guard against.
| msg = parse_message(_user_message_with_tool_result(PERSISTED_CONTENT)) | ||
| assert isinstance(msg, UserMessage) | ||
| blocks = [b for b in msg.content if isinstance(b, ToolResultBlock)] |
There was a problem hiding this comment.
🟡 In test_persisted_output_is_not_full_content, large_content is assigned but never used as input to the test. The assertion blocks[0].content != large_content is trivially true since it compares two completely unrelated hardcoded strings (PERSISTED_CONTENT vs large_content). Remove the dead variable or feed large_content through the test fixture so the assertion is meaningful.
Extended reasoning...
The test test_persisted_output_is_not_full_content (line 288) defines large_content = "THE ANSWER IS: 42\n" + "padding\n" * 5000 but then parses PERSISTED_CONTENT (a module-level constant) via _user_message_with_tool_result(PERSISTED_CONTENT). The assertion on line 294 checks blocks[0].content != large_content, which compares the parsed PERSISTED_CONTENT string against the locally-defined large_content string.
These two strings are completely unrelated — PERSISTED_CONTENT is a <persisted-output> XML wrapper with a 2KB preview, while large_content is "THE ANSWER IS: 42\n" followed by 5000 lines of "padding\n". The != assertion will always pass regardless of any code changes, making this a no-op test.
The test docstring claims to verify "Claude receives only the 2 KB preview, not the original large content," but the test never connects large_content to the simulated tool result. For the test to be meaningful, large_content would need to be the original content that gets spilled, and the test would need to verify that the parsed result differs from that original. As written, a linter (ruff F841) would flag large_content as an unused local variable.
The impact is low since this is a test-only issue with no production consequences, but the test provides false confidence — it appears to verify spill behavior but actually verifies nothing.
To fix: either remove the dead large_content variable and rewrite the assertion to check something meaningful (e.g., that the content length is approximately 2KB, matching the preview size), or restructure the test so large_content is actually used as the pre-spill input.
…ssertion The test_persisted_output_is_not_full_content test defined large_content but never used it as input — the != assertion was trivially true. Now verifies the parsed preview is under the layer-2 threshold.
Summary
_meta["anthropic/maxResultSizeChars"]fromToolAnnotations(maxResultSizeChars=N)increate_sdk_mcp_server_metain the JSONRPCtools/listresponse so the CLI subprocess receives it_metapipelineBackground
Large MCP tool results (>50K chars — DB schemas, DDLs, data profiles) are silently saved to a temp file with only a 2KB preview sent to Claude. This blocks external customers from upgrading past SDK v0.1.17 (Dec 2025).
Root cause — two independent spill layers in the bundled CLI:
mcpValidation.tsMAX_MCP_OUTPUT_TOKENSenv var ✓toolResultStorage.tsMath.min(MCPTool=100K, DEFAULT=50K)The MCP TypeScript SDK's
ToolAnnotationsZod schema usesz.core.$strip, which silently discards unknown fields at parse time. SomaxResultSizeCharscannot be passed viaannotations. Instead, we use_metawith a namespaced key — the same pattern already used byanthropic/searchHintandanthropic/alwaysLoadin the CLI.Changes
src/claude_agent_sdk/__init__.py—create_sdk_mcp_serverextractsmaxResultSizeCharsfromToolAnnotationsand populatesTool._meta["anthropic/maxResultSizeChars"]src/claude_agent_sdk/_internal/query.py—_handle_sdk_mcp_requestincludes_metain the JSONRPCtools/listresponse payloadtests/test_mcp_large_output.py— 18 tests documenting both spill layers, env-var passthrough, layer-2 boundary, and<persisted-output>detectiontests/test_sdk_mcp_integration.py—test_max_result_size_chars_annotation_flows_to_cliconfirms_meta["anthropic/maxResultSizeChars"]appears in thetools/listresponseUsage (no API change)
Companion CLI PR
https://github.com/anthropics/claude-cli-internal/pull/24940 — reads
_meta["anthropic/maxResultSizeChars"]inclient.tsand removes theMath.minclamp ingetPersistenceThresholdTest plan
tests/test_mcp_large_output.py— 18 tests passingtest_max_result_size_chars_annotation_flows_to_cli— confirms_metapipeline end-to-endtest_tool_annotations_in_jsonrpcstill passes (standard annotations unaffected)🤖 Generated with Claude Code