fix(checks): tighten checkToolErrorSpan to require non-ok span status #141

Open
constantinius wants to merge 1 commit into main from improve-tool-error-status-check


@constantinius
Collaborator

Simplify the tool error span check to focus on what matters: the span status field must be set and must not be "ok". Previously the check accepted a grab-bag of error indicators (data.error, data.exception, gen_ai.tool.error, tags.error) as alternatives to the status, which was overly lenient: a span could pass just by carrying an error-like data key, even if its status was wrong.
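To make the leniency concrete, here is a rough TypeScript sketch of the OR-style logic described above. The `Span` shape and the `legacyHasToolError` name are hypothetical, and the exact fields and status handling in the previous `checkToolErrorSpan` may differ; the sketch only shows how a span whose status is "ok" could still pass.

```typescript
// Hypothetical, simplified span shape; the real harness types may differ.
interface Span {
  status?: string;
  data?: Record<string, unknown>;
  tags?: Record<string, unknown>;
}

// Sketch of the previous, lenient logic (assumed): any single indicator passes.
function legacyHasToolError(span: Span): boolean {
  const statusIsError = span.status !== undefined && span.status !== "ok";
  const data = span.data ?? {};
  const tags = span.tags ?? {};
  return (
    statusIsError ||
    data["error"] !== undefined ||
    data["exception"] !== undefined ||
    data["gen_ai.tool.error"] !== undefined ||
    tags["error"] !== undefined
  );
}

// The status says "ok", yet the span passes because it carries data.error.
const misleading: Span = { status: "ok", data: { error: "boom" } };
console.log(legacyHasToolError(misleading)); // true
```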

The new check requires that span.status is present and is not "ok" (for example "internal_error" or "error"). This catches frameworks such as Python LangGraph, where the tool span status is left unset even though the tool raised an exception.
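A minimal sketch of the tightened rule, again assuming a simplified, hypothetical `Span` shape and helper name (`hasToolErrorStatus`). The real `checkToolErrorSpan` may report failures differently rather than returning a boolean, and treating an empty status string as unset is an assumption of this sketch.

```typescript
// Hypothetical, simplified span shape; the real harness types may differ.
interface Span {
  status?: string;
  data?: Record<string, unknown>;
}

// Tightened rule: status must be present and must not be "ok".
// Treating "" as unset is an assumption made for this sketch.
function hasToolErrorStatus(span: Span): boolean {
  return typeof span.status === "string" && span.status !== "" && span.status !== "ok";
}

const cases: Array<[string, Span]> = [
  ["internal_error status", { status: "internal_error" }],
  ["error status", { status: "error" }],
  ["ok status with error data", { status: "ok", data: { error: "boom" } }],
  ["unset status, exception in data", { data: { exception: "ValueError" } }],
];

for (const [label, span] of cases) {
  // Prints "pass" only when the status itself signals the error.
  console.log(`${label}: ${hasToolErrorStatus(span) ? "pass" : "fail"}`);
}
```

The last two cases carry error-like data but no error status, so the stricter rule rejects them; the unset-status case mirrors the Python LangGraph behavior described above.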

Closes https://linear.app/getsentry/issue/TET-2091/test-mismatch-between-span-status-and-tool-call-results

@linear-code

linear-code bot commented Apr 13, 2026

@github-actions

🔴 AI SDK Integration Test Results

Status: 3 regressions detected

Summary

| Metric | main | PR | Change |
|---|---|---|---|
| Total Tests | 667 | 667 | |
| Passed | 465 | 466 | +1 ✅ |
| Failed | 192 | 195 | +3 ⚠️ |

🔴 Regressions

These tests were passing on main but are now failing:

  • browser/langchain :: Multi-Turn LLM Test (blocking). Error: Browser test timed out (60s)
  • browser/openai :: Multi-Turn LLM Test (blocking). Error: Browser test timed out (60s)
  • python/google-genai :: Conversation ID LLM Test (async, streaming). Error: Test execution timed out (60s)

✅ Fixed

These tests were failing on main but are now passing:

  • cloudflare/anthropic :: Vision LLM Test (blocking)
  • cloudflare/anthropic :: Long Input LLM Test (blocking)
  • cloudflare/google-genai :: Basic Error LLM Test (blocking)
  • cloudflare/langchain :: Vision LLM Test (blocking)

Test Matrix

Agent Tests

| SDK | Variants run for each agent test (Basic, Conversation ID, Long Input, Tool Call, Tool Error, Vision) |
|---|---|
| browser/langgraph | {blk, str} × {combined, compiled, custom-state, graph, langchain} |
| cloudflare/langgraph | |
| cloudflare/vercel | {blk, str} × {class, function} × {anthropic, openai} |
| nextjs/mastra | |
| nextjs/vercel | {blk, str} × {class, function} × {anthropic, openai} |
| node/langgraph | |
| node/manual | |
| node/mastra | |
| node/vercel | {blk, str} × {class, function} × {anthropic, openai} |
| python/langgraph | a; s |
| python/manual | a; s |
| python/openai-agents | |
| python/pydantic-ai | a, fallback; a, single |

Embedding Tests

| SDK | Basic Embeddings Test |
|---|---|
| browser/google-genai | |
| browser/langchain | |
| browser/openai | |
| cloudflare/google-genai | |
| cloudflare/langchain | |
| cloudflare/openai | |
| cloudflare/vercel | |
| nextjs/google-genai | |
| nextjs/langchain | |
| nextjs/openai | |
| nextjs/vercel | |
| node/google-genai | |
| node/langchain | |
| node/openai | |
| node/vercel | |
| python/google-genai | a, blk; s, blk |
| python/langchain | a, blk; s, blk |
| python/litellm | a, blk; s, blk |
| python/manual | a, blk; s, blk |
| python/openai | a, blk; s, blk |

LLM Tests

| SDK | Variants run for each LLM test (Basic Error, Basic, Conversation ID, Long Input, Multi-Turn, Vision) | Changes vs main |
|---|---|---|
| browser/anthropic | blk; str | |
| browser/google-genai | blk; str | |
| browser/langchain | blk; str | ❌📉 Multi-Turn LLM Test (blk) |
| browser/openai | blk; str | ❌📉 Multi-Turn LLM Test (blk) |
| cloudflare/anthropic | blk; str | ✅🔧 Long Input LLM Test (blk), ✅🔧 Vision LLM Test (blk) |
| cloudflare/google-genai | blk; str | ✅🔧 Basic Error LLM Test (blk) |
| cloudflare/langchain | blk; str | ✅🔧 Vision LLM Test (blk) |
| cloudflare/openai | blk; str | |
| nextjs/anthropic | blk; str | |
| nextjs/google-genai | blk; str | |
| nextjs/langchain | blk; str | |
| nextjs/openai | blk; str | |
| node/anthropic | blk; str | |
| node/google-genai | blk; str | |
| node/langchain | blk; str | |
| node/manual | | |
| node/openai | blk; str | |
| python/anthropic | {a, s} × {blk, str} | |
| python/google-genai | {a, s} × {blk, str} | ❌📉 Conversation ID LLM Test (a, str) |
| python/langchain | {a, s} × {blk, str} | |
| python/litellm | {a, s} × {blk, str} | |
| python/manual | a, blk; s, blk (five of the six tests) | |
| python/openai | {a, s} × {blk, str} | |

MCP Tests

| SDK | Variants run for each MCP test (Basic MCP Tool Call, MCP Multiple Tool Calls, MCP Prompt Get, MCP Resource Read, MCP Tool Error) |
|---|---|
| node/mcp | sse; io |
| python/fastmcp | a, blk, sse; a, blk, io |
| python/mcp | a, blk, sse, hi; a, blk, sse, lo; a, blk, io, hi; a, blk, io, lo |

Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed | str=streaming blk=blocking a=async s=sync io=stdio sse=sse hi=highlevel lo=lowlevel


Generated by AI SDK Integration Tests
