Fix connect() crash on malformed CLAUDE_CODE_STREAM_CLOSE_TIMEOUT by shuofengzhang · Pull Request #763 · anthropics/claude-agent-sdk-python

shuofengzhang · 2026-03-27T22:42:06Z

What changed

Added _parse_timeout_ms_from_env() in src/claude_agent_sdk/client.py to defensively parse timeout env values.
Updated ClaudeSDKClient.connect() to use the helper when reading CLAUDE_CODE_STREAM_CLOSE_TIMEOUT.
Added a regression test:
- tests/test_streaming_client.py::TestClaudeSDKClientEdgeCases::test_connect_with_invalid_timeout_env_falls_back_to_default

Why

ClaudeSDKClient.connect() previously used int(os.environ[...]) for timeout parsing.
If the env var is malformed (for example "60s"), connection setup raises ValueError before the SDK can initialize.
This change keeps behavior identical for valid numeric values, while safely falling back to defaults for malformed/non-finite input.

Insight / Why this matters

Root cause: strict integer parsing on an externally provided environment variable.
Why easy to miss: most local/dev setups leave this variable unset or numeric, so malformed inherited env values are rarely tested.
Tradeoff: we now tolerate malformed values instead of hard-failing early. This slightly reduces strictness but improves operational robustness in CI/shell environments.
Impact: avoids startup failures in real automation workflows where environment variables can be inherited, templated, or accidentally misformatted.

Practical gain / Why this matters

Users embedding ClaudeSDKClient in scripts/services get more reliable startup behavior.
A single malformed env value no longer crashes initialization.
The change is narrow and rollback-safe (parsing path only, no protocol/transport changes).

Testing

✅ .venv/bin/pytest -q tests/test_streaming_client.py::TestClaudeSDKClientEdgeCases::test_connect_with_invalid_timeout_env_falls_back_to_default
✅ .venv/bin/pytest -q

Validation evidence:

New regression test sets CLAUDE_CODE_STREAM_CLOSE_TIMEOUT=60s and verifies connect() succeeds and fallback timeout is applied.

Risk analysis:

Low risk. Only timeout env parsing is touched.
No behavior change for valid numeric values.
If maintainers prefer strict failure semantics, this can be reverted by removing the helper and test in a single commit.

km-anthropic · 2026-03-31T19:35:51Z

@claude review

claude · 2026-03-31T19:39:02Z

src/claude_agent_sdk/client.py

+def _parse_timeout_ms_from_env(
+    env_var: str, default_timeout_ms: float = 60000.0
+) -> float:
+    """Parse timeout milliseconds from environment, falling back safely.
+
+    Accepts integer or float-like strings. Invalid or non-finite values return
+    the provided default.
+    """
+    raw = os.environ.get(env_var)
+    if raw is None:
+        return default_timeout_ms
+
+    try:
+        parsed = float(raw)
+    except (TypeError, ValueError):
+        return default_timeout_ms
+
+    if not math.isfinite(parsed):
+        return default_timeout_ms
+
+    return parsed
+
+


🔴 The _parse_timeout_ms_from_env() helper is only applied in ClaudeSDKClient.connect() but the identical vulnerable int() pattern remains in src/claude_agent_sdk/_internal/client.py:103-105 (the query() one-shot path). Setting CLAUDE_CODE_STREAM_CLOSE_TIMEOUT=60s will still crash query() with a ValueError. The fix should use _parse_timeout_ms_from_env in both locations.

Extended reasoning...

What the bug is

This PR introduces a _parse_timeout_ms_from_env() helper in src/claude_agent_sdk/client.py that safely parses the CLAUDE_CODE_STREAM_CLOSE_TIMEOUT environment variable, falling back to a default when the value is malformed (e.g., "60s", "inf", or other non-numeric strings). The helper is correctly applied in ClaudeSDKClient.connect() at line 178. However, the identical vulnerable pattern was not updated in the separate query() one-shot code path at src/claude_agent_sdk/_internal/client.py:103-105.

The specific code path

In src/claude_agent_sdk/_internal/client.py, lines 103-105:

initialize_timeout_ms = int( os.environ.get("CLAUDE_CODE_STREAM_CLOSE_TIMEOUT", "60000") )

This is the code path used when calling the top-level query() function (the one-shot, non-streaming API). The comment on line 102 even says # Match ClaudeSDKClient.connect(), confirming the intent is for these two locations to stay in sync.

Step-by-step proof

A user sets CLAUDE_CODE_STREAM_CLOSE_TIMEOUT=60s in their environment (a plausible mistake, mixing up seconds vs milliseconds notation).

They call query(prompt="Hello") which enters the one-shot code path in _internal/client.py.

At line 103, os.environ.get("CLAUDE_CODE_STREAM_CLOSE_TIMEOUT", "60000") returns "60s".

int("60s") raises ValueError: invalid literal for int() with base 10: 60s.

The SDK crashes before it can even start the query, with no fallback.

Meanwhile, the same environment variable is now handled gracefully in ClaudeSDKClient.connect() thanks to this PR — creating an inconsistency where the streaming path works but the one-shot path crashes.

Impact

This is the exact same class of bug the PR aims to fix. Users who use the query() function (which is the simpler, recommended API for one-off prompts) remain vulnerable to the crash, while only users of the streaming ClaudeSDKClient are protected. The PR description states the goal is to avoid "startup failures in real automation workflows where environment variables can be inherited, templated, or accidentally misformatted" — but that goal is only half-achieved.

How to fix

Import and use the _parse_timeout_ms_from_env helper from client.py in _internal/client.py, replacing the int(os.environ.get(...)) call on lines 103-105 with:

from claude_agent_sdk.client import _parse_timeout_ms_from_env initialize_timeout_ms = _parse_timeout_ms_from_env( "CLAUDE_CODE_STREAM_CLOSE_TIMEOUT" )

Alternatively, move the helper to a shared utility module to avoid circular imports, or duplicate it in _internal/client.py.

claude · 2026-03-31T19:39:02Z

tests/test_streaming_client.py


+    def test_connect_with_invalid_timeout_env_falls_back_to_default(self):
+        """Malformed timeout env values should not break connect()."""
+
+        async def _test():
+            with (
+                patch.dict(
+                    os.environ,
+                    {"CLAUDE_CODE_STREAM_CLOSE_TIMEOUT": "60s"},
+                    clear=False,
+                ),
+                patch(
+                    "claude_agent_sdk._internal.transport.subprocess_cli.SubprocessCLITransport"
+                ) as mock_transport_class,
+            ):
+                mock_transport = create_mock_transport()
+                mock_transport_class.return_value = mock_transport
+
+                client = ClaudeSDKClient()
+                await client.connect()
+
+                assert client._query is not None
+                # Default is 60000ms, and initialize timeout enforces a 60s minimum.
+                assert client._query._initialize_timeout == 60.0
+
+                await client.disconnect()
+
+        anyio.run(_test)


🟡 Nit: The assertion client._query._initialize_timeout == 60.0 does not actually verify that the fallback value is correct. Because the downstream code applies max(..., 60.0), any parsed value <= 60000 (including 0 or -1 from a broken fallback) would produce the same 60.0 result. Consider adding a companion test with a valid high value like "120000" to confirm parsing works, then testing "60s" to show it falls back.

Extended reasoning...

What the issue is

The test test_connect_with_invalid_timeout_env_falls_back_to_default sets CLAUDE_CODE_STREAM_CLOSE_TIMEOUT="60s" and asserts client._query._initialize_timeout == 60.0. The intent is to verify that malformed env values fall back to the default of 60000ms (which becomes 60.0s). However, the assertion cannot distinguish a correct fallback from a broken one.

Why the assertion is vacuous for fallback verification

The production code path is:

initialize_timeout_ms = _parse_timeout_ms_from_env("CLAUDE_CODE_STREAM_CLOSE_TIMEOUT") # default: 60000.0 initialize_timeout = max(initialize_timeout_ms / 1000.0, 60.0)

Walk through the concrete scenarios:

Correct fallback (returns 60000.0): max(60000.0 / 1000.0, 60.0) = max(60.0, 60.0) = 60.0 ✓

Broken fallback returning 0: max(0.0 / 1000.0, 60.0) = max(0.0, 60.0) = 60.0 ✓

Broken fallback returning -1: max(-0.001, 60.0) = 60.0 ✓

Broken fallback returning 30000: max(30.0, 60.0) = 60.0 ✓

All of these pass the assertion. The max(..., 60.0) clamp masks any incorrect fallback value that happens to be <= 60000.

What the test does verify correctly

The test does successfully verify its primary stated goal: "Malformed timeout env values should not break connect()." The fact that connect() completes without raising ValueError is the real regression test, and that part works.

Impact

This is a test quality issue, not a production bug. If someone accidentally changed the default in _parse_timeout_ms_from_env from 60000.0 to 0, the production code would still behave correctly (the max() clamp ensures at least 60s), but the test would give false confidence that the fallback value itself is correct.

Suggested improvement

Add a test with a valid high value to confirm parsing works end-to-end:

Set CLAUDE_CODE_STREAM_CLOSE_TIMEOUT="120000" → assert _initialize_timeout == 120.0 (proves parsing works and the value exceeds the 60s floor)

Then set CLAUDE_CODE_STREAM_CLOSE_TIMEOUT="60s" → assert _initialize_timeout == 60.0 (now this assertion is meaningful by contrast)

Alternatively, test _parse_timeout_ms_from_env directly to verify it returns 60000.0 for invalid input.

shuofengzhang added 2 commits March 27, 2026 22:41

Handle malformed timeout env values in ClaudeSDKClient

faec794

Merge branch 'main' into fix-invalid-timeout-env-parsing

e1678be

claude bot reviewed Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix connect() crash on malformed CLAUDE_CODE_STREAM_CLOSE_TIMEOUT#763

Fix connect() crash on malformed CLAUDE_CODE_STREAM_CLOSE_TIMEOUT#763
shuofengzhang wants to merge 2 commits intoanthropics:mainfrom
shuofengzhang:fix-invalid-timeout-env-parsing

shuofengzhang commented Mar 27, 2026

Uh oh!

km-anthropic commented Mar 31, 2026

Uh oh!

claude bot Mar 31, 2026

Uh oh!

claude bot Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shuofengzhang commented Mar 27, 2026

What changed

Why

Insight / Why this matters

Practical gain / Why this matters

Testing

Uh oh!

km-anthropic commented Mar 31, 2026

Uh oh!

claude bot Mar 31, 2026

Choose a reason for hiding this comment

What the bug is

The specific code path

Step-by-step proof

Impact

How to fix

Uh oh!

claude bot Mar 31, 2026

Choose a reason for hiding this comment

What the issue is

Why the assertion is vacuous for fallback verification

What the test does verify correctly

Impact

Suggested improvement

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants