Skip to content

fix: capture stderr in ProcessError for better debugging (#641)#658

Open
MaxwellCalkin wants to merge 1 commit intoanthropics:mainfrom
MaxwellCalkin:fix/capture-stderr-for-process-error
Open

fix: capture stderr in ProcessError for better debugging (#641)#658
MaxwellCalkin wants to merge 1 commit intoanthropics:mainfrom
MaxwellCalkin:fix/capture-stderr-for-process-error

Conversation

@MaxwellCalkin
Copy link
Copy Markdown

Summary

Fixes #641ProcessError now includes the actual CLI stderr output instead of a generic placeholder message.

Problem: When the CLI subprocess exits with a non-zero exit code, ProcessError was raised with stderr="Check stderr output for details" — a hardcoded string that provides no useful debugging information. Worse, stderr was only piped at all when the user explicitly provided a stderr callback or enabled debug mode, so in the common case the real error output was lost entirely.

Root cause in subprocess_cli.py:

  1. stderr was conditionally piped (only with callback or debug mode)
  2. When piped, stderr was consumed by _handle_stderr() for callbacks but never retained
  3. ProcessError always used a hardcoded placeholder string

Fix:

  • Always pipe stderr from the subprocess so error output is available regardless of user configuration
  • Buffer all stderr lines in self._stderr_buffer (in addition to invoking existing user callbacks / debug output)
  • Use the captured buffer when constructing ProcessError, so callers see the real CLI error
  • Reset the buffer in close() to prevent memory leaks

Before

ProcessError: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details

After

ProcessError: Command failed with exit code 1 (exit code: 1)
Error output: Error: No conversation found with session ID ab2c985b

Test plan

  • ruff check passes — no lint issues
  • mypy passes — no type errors
  • Existing test suite passes (no behavioral changes to happy path)
  • Manual test: trigger a CLI error (e.g., resume non-existent session) and verify ProcessError.stderr contains the real error message

🤖 Generated with Claude Code

)

Previously, stderr was only piped when the user provided a callback or
enabled debug mode. When the CLI process exited with an error,
ProcessError was raised with the generic message "Check stderr output
for details" instead of the actual error text.

This change:
- Always pipes stderr from the subprocess
- Buffers all stderr lines in _stderr_buffer alongside existing callbacks
- Uses the captured stderr content in ProcessError when the process fails
- Resets the buffer on close() to prevent memory leaks

Fixes anthropics#641

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@c3-rambandi
Copy link
Copy Markdown

amazing, let's merge this asap

@b657bm7n98-hub
Copy link
Copy Markdown

Is this the same as #529?

@jmehnle
Copy link
Copy Markdown

jmehnle commented Mar 30, 2026

#641 was (inappropriately) closed, but this PR hasn't even been merged, let alone released. What's blocking this from being merged?

@km-anthropic
Copy link
Copy Markdown
Collaborator

@claude review

Comment on lines 586 to +593

# Use exit code for error detection
if returncode is not None and returncode != 0:
captured_stderr = "\n".join(self._stderr_buffer) if self._stderr_buffer else None
self._exit_error = ProcessError(
f"Command failed with exit code {returncode}",
exit_code=returncode,
stderr="Check stderr output for details",
stderr=captured_stderr,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Race condition: after await self._process.wait() returns (line 583), self._stderr_buffer is read immediately (line 589) without waiting for _handle_stderr to finish consuming remaining pipe data. The stderr task group is only awaited in close(), which runs after the ProcessError is already raised with potentially incomplete stderr. Consider awaiting/cancelling the stderr task group between process.wait() and reading the buffer.

Extended reasoning...

Bug Analysis

_handle_stderr runs as a separate task inside self._stderr_task_group, started at line 393 during connect(). In _read_messages_impl, after the stdout stream is exhausted, the code awaits self._process.wait() at line 583, then immediately reads self._stderr_buffer at line 589 to construct a ProcessError. There is no synchronization point between these two operations to ensure _handle_stderr has finished.

Code Path

When the subprocess exits, the write end of the stderr pipe is closed. However, there may still be unread data in the kernel pipe buffer. _handle_stderr needs at least one more scheduling turn (an await yield point) to read these final bytes via async for line in self._stderr_stream and append them to self._stderr_buffer. But after process.wait() returns, the code proceeds synchronously — if returncode is not None and returncode != 0: then "\n".join(self._stderr_buffer) — with no yield point, giving the stderr task no opportunity to run.

The _stderr_task_group is cancelled and awaited in close() (lines 460-464), but close() runs after the ProcessError has already been raised with the potentially incomplete buffer contents.

Step-by-step proof

  1. CLI process writes an error message to stderr (e.g., "Error: No conversation found with session ID ab2c985b") and exits with code 1.
  2. The stdout async iterator in _read_messages_impl finishes (no more stdout data).
  3. await self._process.wait() returns with returncode=1.
  4. The event loop does NOT schedule _handle_stderr between wait() returning and the next line executing.
  5. self._stderr_buffer is read — it may be missing the final stderr lines that are still sitting in the pipe buffer.
  6. ProcessError is raised with incomplete stderr.
  7. Only later, in close(), is the stderr task group properly awaited.

Impact

In practice, the window for this race is small: _handle_stderr runs concurrently during the entire stdout-reading phase with many yield points, so for typical short error messages it will have already consumed them. The most likely scenario for data loss is when the process writes to stderr and exits immediately with little or no stdout output, creating a tight race. Since the entire purpose of this PR is to capture stderr for better error reporting, it would be worth closing this gap.

Suggested Fix

After await self._process.wait() and before reading self._stderr_buffer, cancel and await the stderr task group:

if self._stderr_task_group:
    self._stderr_task_group.cancel_scope.cancel()
    await self._stderr_task_group.__aexit__(None, None, None)
    self._stderr_task_group = None

This ensures all buffered pipe data has been consumed before constructing the ProcessError.

@@ -429,6 +425,9 @@ async def _handle_stderr(self) -> None:
if not line_str:
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: _stderr_buffer grows without bound since every stderr line is appended (line 427) but no size cap is applied, unlike the stdout JSON buffer which is guarded by _max_buffer_size (line 555). Since stderr is now always piped, consider capping the buffer (e.g., keep only the last N lines or apply _max_buffer_size) as a defensive measure.

Extended reasoning...

What the bug is

The new self._stderr_buffer list (initialized at line 59) appends every non-empty stderr line unconditionally at line 427, but there is no size limit enforced on this buffer. The existing _max_buffer_size field is only checked against the stdout JSON buffer (line 555), not the stderr buffer. The buffer is only cleared when close() is called (line 487).

Why the surface area expanded

Before this PR, stderr was only piped when a user-provided callback existed or debug mode was enabled (should_pipe_stderr = self._options.stderr is not None or "debug-to-stderr" in self._options.extra_args). Now stderr is always piped (stderr_dest = PIPE), meaning _handle_stderr() always runs and always appends to the buffer regardless of user configuration.

Step-by-step proof

  1. User creates a SubprocessCLITransport and calls connect().
  2. Since stderr_dest = PIPE (always), _handle_stderr() is started as an async task.
  3. For each non-empty line read from stderr, self._stderr_buffer.append(line_str) executes at line 427.
  4. There is no check like if len(self._stderr_buffer) > some_limit — every line is retained.
  5. The buffer is only reset in close() at line 487 (self._stderr_buffer = []).
  6. If the subprocess produces a large volume of stderr output before close() is called, the list grows without bound.

Practical impact

The practical risk is low. The subprocess is the Claude CLI, which produces limited stderr output proportional to conversation activity. Each transport instance is scoped to a single process_query() call, and close() is called when done. Even in debug mode, output is bounded by conversation length. This makes memory exhaustion unlikely in practice.

Recommended fix

Apply the same _max_buffer_size concept to the stderr buffer, or keep only the last N lines (since only the tail of stderr is typically useful for error diagnostics). For example, track the total byte length of the buffer and drop older entries when the limit is exceeded. This would be consistent with how the stdout JSON buffer is already protected and would add a simple defensive guard against any edge cases.

Comment on lines 369 to +372

# Pipe stderr if we have a callback OR debug mode is enabled
should_pipe_stderr = (
self._options.stderr is not None
or "debug-to-stderr" in self._options.extra_args
)

# For backward compat: use debug_stderr file object if no callback and debug is on
stderr_dest = PIPE if should_pipe_stderr else None
# Always pipe stderr so we can capture it for error reporting.
# User callbacks and debug mode are still honored in _handle_stderr().
stderr_dest = PIPE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Behavioral regression: stderr output during successful CLI operations is now silently swallowed. Previously, when no stderr callback or debug mode was configured, stderr_dest was None so the subprocess inherited the parent stderr FD and CLI warnings/deprecation notices appeared in the terminal. Now stderr is always piped but _handle_stderr has no fallback — lines are buffered but never displayed, and the buffer is cleared in close(). Consider adding an else: sys.stderr.write(line_str + "\n") fallback in _handle_stderr when no callback/debug is configured.

Extended reasoning...

What the bug is

Before this PR, stderr_dest was conditionally set: it was PIPE only when a stderr callback was provided or debug mode was enabled, and None otherwise. When None, the subprocess inherited the parent process's stderr file descriptor, meaning any CLI warnings, deprecation notices, or diagnostic messages written to stderr would appear directly in the user's terminal.

After this PR, stderr_dest is unconditionally set to PIPE (line 372), and all stderr output flows through _handle_stderr(). However, _handle_stderr only has two branches for output: (1) invoke the user's stderr callback, or (2) write to debug_stderr when debug mode is active. There is no fallback else branch for the case where neither is configured.

Code path that triggers the issue

Consider a user who creates a ClaudeAgent with default options — no stderr callback, no debug-to-stderr in extra_args. The subprocess starts with stderr=PIPE. The _handle_stderr coroutine reads each line from stderr, appends it to self._stderr_buffer (line 429), then checks:

  1. if self._options.stderr: — False, no callback set
  2. elif "debug-to-stderr" in self._options.extra_args and self._options.debug_stderr: — False, no debug mode

No output is produced. The line sits in _stderr_buffer and is never displayed.

Step-by-step proof with a concrete example

  1. User creates a ClaudeAgent with default options (no stderr callback, no debug mode)
  2. The CLI subprocess is started with stderr=PIPE
  3. During execution, the CLI writes a deprecation warning to stderr, e.g., "Warning: --verbose flag is deprecated, use --debug instead"
  4. _handle_stderr reads the line, appends it to _stderr_buffer
  5. Neither the callback branch nor the debug branch executes — the warning is silently buffered
  6. The CLI completes successfully (exit code 0)
  7. _read_messages_impl finishes without raising ProcessError (since exit code is 0)
  8. close() is called, which resets self._stderr_buffer = [] (line 487)
  9. The deprecation warning is permanently lost — it was never displayed and the buffer is now empty

Impact

Users who previously relied on seeing CLI stderr output in their terminal during normal (successful) operations will no longer see it. This includes warnings, deprecation notices, and any other diagnostic output the CLI emits to stderr. The stderr buffer is only used when constructing a ProcessError on non-zero exit, so all stderr from successful runs is silently discarded.

How to fix

Add a fallback else branch in _handle_stderr that writes to sys.stderr when no callback or debug mode is configured:

if self._options.stderr:
    self._options.stderr(line_str)
elif "debug-to-stderr" in self._options.extra_args and self._options.debug_stderr:
    self._options.debug_stderr.write(line_str + "\n")
    if hasattr(self._options.debug_stderr, "flush"):
        self._options.debug_stderr.flush()
else:
    sys.stderr.write(line_str + "\n")
    sys.stderr.flush()

This preserves the new stderr-capture behavior for error reporting while maintaining backward compatibility for terminal visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ProcessError loses real CLI error message during initialize()

5 participants