query: raise ClaudeSDKError instead of bare Exception#771
query: raise ClaudeSDKError instead of bare Exception#771eeeasley wants to merge 1 commit intoanthropics:mainfrom
Conversation
All raise Exception(...) calls in _internal/query.py now use ClaudeSDKError so callers can catch SDK errors with a single except clause instead of except Exception. The receive_messages() error path (line 699) is the critical one: when the CLI subprocess exits non-zero, callers expecting to catch ClaudeSDKError missed it entirely.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #771 +/- ##
=======================================
Coverage ? 83.41%
=======================================
Files ? 14
Lines ? 2369
Branches ? 0
=======================================
Hits ? 1976
Misses ? 393
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I've been bitten by this, too. This is an improvement, but there's more to the problem; |
|
@claude review |
| ListToolsRequest, | ||
| ) | ||
|
|
||
| from .._errors import ClaudeSDKError | ||
| from ..types import ( | ||
| PermissionMode, | ||
| PermissionResultAllow, |
There was a problem hiding this comment.
🔴 Line 195 stores a bare Exception(response.get("error", "Unknown error")) in self.pending_control_results[request_id] when a control_response has subtype "error". This exception is later re-raised at line 396 via raise result in _send_control_request(), so callers catching except ClaudeSDKError: will miss it — the exact bug class this PR aims to fix. Since the import is already present, the fix is just Exception( → ClaudeSDKError( on line 195.
Extended reasoning...
The missed conversion
This PR converts 7 raise Exception(...) sites to raise ClaudeSDKError(...), but misses an 8th site at line 195 inside _read_messages(). When a control_response message arrives with subtype == "error", the code stores a bare Exception object:
self.pending_control_results[request_id] = Exception(
response.get("error", "Unknown error")
)How the stored exception is re-raised
In _send_control_request() (around line 396), after the response event fires, the code retrieves the stored result and checks if it is an exception:
result = self.pending_control_results.pop(request_id)
self.pending_control_responses.pop(request_id, None)
if isinstance(result, Exception):
raise resultSince the stored object is a bare Exception (not a ClaudeSDKError), this raise result throws a plain Exception. Any caller catching except ClaudeSDKError: will not catch it — the exception escapes up the stack.
Step-by-step proof
- The SDK sends a control request (e.g.,
initialize,mcp_status,interrupt,set_permission_mode, etc.) via_send_control_request(). - The CLI responds with a
control_responsemessage wheresubtype == "error". _read_messages()routes this to line 195, storingException("some error message")inself.pending_control_results[request_id].- The event is set, unblocking
_send_control_request(). _send_control_request()pops the result, seesisinstance(result, Exception)isTrue, and executesraise result.- A bare
Exceptionpropagates up. Callers usingexcept ClaudeSDKError:(as described in the PR description’s production incident) do not catch it. - The exception crashes the calling code.
Impact
This is arguably the most impactful missed conversion because it is the error path for all control request failures. Every public method that calls _send_control_request() is affected: initialize(), get_mcp_status(), interrupt(), set_permission_mode(), set_model(), rewind_files(), reconnect_mcp_server(), toggle_mcp_server(), and stop_task().
Fix
The ClaudeSDKError import is already present (line 18, added by this PR). The fix is a one-word change on line 195:
self.pending_control_results[request_id] = ClaudeSDKError(
response.get("error", "Unknown error")
)
All
raise Exception(...)in_internal/query.pynow useClaudeSDKErrorso callers can catch SDK errors with a single except clause.The
receive_messages()error path (line 699) is the critical one — when the CLI subprocess exits non-zero and emits anerrormessage, callers expectingexcept ClaudeSDKError:missed it entirely. We hit this in production: our wrapper caught(TooSlowError, ClaudeSDKError)but the bare Exception escaped and crashed the calling nursery.7 sites converted; all were already protocol/process errors that fit the
ClaudeSDKErrorhierarchy.