fix(checks): tighten checkToolErrorSpan to require non-ok span status #141
Open
constantinius wants to merge 1 commit into main from
Simplify the tool error span check to focus on what matters: the span status field must be set and must not be "ok". Previously, the check also accepted any of a grab-bag of error indicators (data.error, data.exception, gen_ai.tool.error, tags.error) in addition to the status. This was too lenient: a span could pass by carrying any error-like data key even when its status was wrong.

The new check requires that span.status is present and is not "ok" (e.g. "internal_error" or "error"). This catches frameworks such as LangGraph (Python), where the tool span status is left unset even though the tool raised an exception.
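The difference between the old and new predicates can be sketched as follows. This is a minimal TypeScript illustration, not the repository's checkToolErrorSpan: the ToolSpan shape, the helper names passesOldCheck/passesNewCheck, the example spans, and the assumption that gen_ai.tool.error lives under data are all hypothetical.

```typescript
// Hypothetical span shape, loosely modeled on a Sentry span payload.
// Only the fields read by the two predicates below are included.
interface ToolSpan {
  status?: string;
  data?: Record<string, unknown>;
  tags?: Record<string, unknown>;
}

// Sketch of the OLD, lenient predicate: a non-"ok" status OR any of several
// error-like keys was enough. (Placing gen_ai.tool.error under `data` is an assumption.)
function passesOldCheck(span: ToolSpan): boolean {
  const statusIsError = span.status !== undefined && span.status !== "ok";
  const hasErrorKey =
    span.data?.["error"] !== undefined ||
    span.data?.["exception"] !== undefined ||
    span.data?.["gen_ai.tool.error"] !== undefined ||
    span.tags?.["error"] !== undefined;
  return statusIsError || hasErrorKey;
}

// Sketch of the NEW predicate: the status must be a non-empty string
// and must not be "ok". Error-like data keys and tags are ignored entirely.
function passesNewCheck(span: ToolSpan): boolean {
  return (
    typeof span.status === "string" &&
    span.status.length > 0 &&
    span.status !== "ok"
  );
}

// Hypothetical example spans.
// The tool raised and an error key was recorded, but the status was never set:
const unsetStatus: ToolSpan = { data: { error: "tool raised ValueError" } };
// The status is explicitly marked as an error:
const internalError: ToolSpan = { status: "internal_error" };
// The status was left as "ok" despite an error tag:
const okWithErrorTag: ToolSpan = { status: "ok", tags: { error: "true" } };

console.log(passesOldCheck(unsetStatus), passesNewCheck(unsetStatus)); // true false
console.log(passesOldCheck(internalError), passesNewCheck(internalError)); // true true
console.log(passesOldCheck(okWithErrorTag), passesNewCheck(okWithErrorTag)); // true false
```

Under this sketch, a span whose tool raised but whose status was never set passes the old predicate through its data.error key, yet fails the new one. That is the LangGraph (Python) case this change is meant to surface.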
🔴 AI SDK Integration Test Results
Status: 3 regressions detected

🔴 Regressions
These tests were passing on main but are now failing:
- browser/langchain :: Multi-Turn LLM Test (blocking). Error: Browser test timed out (60s)
- browser/openai :: Multi-Turn LLM Test (blocking). Error: Browser test timed out (60s)
- python/google-genai :: Conversation ID LLM Test (async, streaming). Error: Test execution timed out (60s)

✅ Fixed
These tests were failing on main but are now passing:
Test Matrix
Agent Tests
Embedding Tests
LLM Tests
MCP Tests
Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed
str=streaming, blk=blocking, a=async, s=sync, io=stdio, sse=sse, hi=highlevel, lo=lowlevel
Generated by AI SDK Integration Tests
Closes https://linear.app/getsentry/issue/TET-2091/test-mismatch-between-span-status-and-tool-call-results