@nickmisasi
Collaborator

nickmisasi and others added 4 commits November 28, 2025 13:30
Resolves conflicts and updates to new naming conventions:
- Changed EnableThinking to ReasoningDisabled convention
- Updated WithoutThinking to WithReasoningDisabled
- Fixed WithLLMContextDefaultTools signature change
- Added response.incomplete handling for OpenAI Responses API
- Fixed linter issues in webapp components

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
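
The commit notes mention added `response.incomplete` handling for the OpenAI Responses API. The PR's actual code is not shown here, so the following is only a minimal sketch of what such handling might look like over a stream of already-decoded event dictionaries. The names `collect_text` and `IncompleteResponseError` are hypothetical, and the event shapes (`response.output_text.delta` carrying a `delta` string, `response.incomplete` carrying `response.incomplete_details.reason`) are assumptions based on the public Responses API streaming events, not taken from this PR.

```python
from typing import Iterable


class IncompleteResponseError(Exception):
    """Hypothetical error raised when the stream ends with response.incomplete."""

    def __init__(self, reason: str) -> None:
        super().__init__(f"response incomplete: {reason}")
        self.reason = reason


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate text deltas, failing loudly on a response.incomplete event.

    `events` is assumed to be an iterable of decoded streaming events,
    each a dict with at least a "type" key.
    """
    parts: list[str] = []
    for event in events:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            parts.append(event.get("delta", ""))
        elif kind == "response.incomplete":
            # Surface the reason (for example a token limit) instead of
            # silently returning a truncated answer.
            response = event.get("response") or {}
            details = response.get("incomplete_details") or {}
            raise IncompleteResponseError(details.get("reason", "unknown"))
        elif kind == "response.completed":
            break
    return "".join(parts)
```

Raising on `response.incomplete` is one possible design choice; a caller could equally return the partial text together with the reason.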
@github-actions
github-actions bot commented Dec 9, 2025

🤖 LLM Evaluation Results

OpenAI

⚠️ Overall: 17/18 tests passed (94.4%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ⚠️ OPENAI | 18 | 17 | 1 | 94.4% |

❌ Failed Evaluations

OPENAI

1. TestChannelSummarization/[openai]_channel_summarization_developers_webapp_channel

  • Score: 0.00
  • Rubric: mentions claudio and harrison discussing exactly what should be tracked for code coverage
  • Reason: While it mentions Claudio working on coverage and that he and Harrison discussed snapshot test effects and investigating E2E coverage, it does not state that they discussed exactly what should be tracked for code coverage.

Anthropic

⚠️ Overall: 15/16 tests passed (93.8%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ⚠️ ANTHROPIC | 16 | 15 | 1 | 93.8% |

❌ Failed Evaluations

ANTHROPIC

1. TestThreadsSummarizeFromExportedData/[anthropic]_thread_summarization_from_eval_timed_dnd.json

  • Score: 0.00
  • Rubric: contains the usernames involved as @mentions if referenced
  • Reason: Most users are @mentioned, but "Yasser" is referenced without an @, so not all referenced usernames are @mentions.

Azure OpenAI

Overall: 22/22 tests passed (100.0%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ✅ AZURE | 22 | 22 | 0 | 100.0% |

Mistral

Overall: 18/18 tests passed (100.0%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ✅ MISTRAL | 18 | 18 | 0 | 100.0% |

AWS Bedrock

Overall: 17/17 tests passed (100.0%)

| Provider | Total | Passed | Failed | Pass Rate |
|---|---|---|---|---|
| ✅ BEDROCK | 17 | 17 | 0 | 100.0% |

This comment was automatically generated by the eval CI pipeline.
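
The pass rates reported above follow directly from the passed and total counts. As a quick check, here is a small sketch that reproduces the formatting; the function name `pass_rate` is hypothetical and is not taken from the eval pipeline itself.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format passed/total as a percentage with one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * passed / total:.1f}%"


# Counts copied from the report above.
for provider, passed, total in [
    ("OPENAI", 17, 18),
    ("ANTHROPIC", 15, 16),
    ("AZURE", 22, 22),
    ("MISTRAL", 18, 18),
    ("BEDROCK", 17, 17),
]:
    print(f"{provider}: {passed}/{total} = {pass_rate(passed, total)}")
```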
