Mm 66591 #460

nickmisasi · 2025-12-09T14:37:55Z

Summary

Ticket Link

Screenshots

Release Note

Resolves conflicts and updates to new naming conventions:

- Changed EnableThinking to ReasoningDisabled convention
- Updated WithoutThinking to WithReasoningDisabled
- Fixed WithLLMContextDefaultTools signature change
- Added response.incomplete handling for OpenAI Responses API
- Fixed linter issues in webapp components

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
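For reviewers, the rename from EnableThinking to ReasoningDisabled inverts the flag's polarity, so the zero value now means reasoning stays on. Here is a minimal sketch of how such an option might look, assuming a functional-options pattern. `CompletionConfig`, `Option`, and `NewConfig` are hypothetical names for illustration and are not taken from this PR; only `WithReasoningDisabled` and `ReasoningDisabled` come from the description.

```go
package main

import "fmt"

// CompletionConfig is a hypothetical stand-in for the request configuration.
type CompletionConfig struct {
	// ReasoningDisabled defaults to false, so the zero value keeps reasoning enabled.
	ReasoningDisabled bool
}

// Option mutates a CompletionConfig (functional-options pattern, assumed here).
type Option func(*CompletionConfig)

// WithReasoningDisabled mirrors the renamed option mentioned in the description
// (formerly WithoutThinking); its real signature may differ.
func WithReasoningDisabled() Option {
	return func(c *CompletionConfig) { c.ReasoningDisabled = true }
}

// NewConfig applies the given options on top of the zero-value config.
func NewConfig(opts ...Option) CompletionConfig {
	var cfg CompletionConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Println(NewConfig().ReasoningDisabled)
	fmt.Println(NewConfig(WithReasoningDisabled()).ReasoningDisabled)
}
```

With a negative-polarity flag, callers that pass no option get reasoning by default, and only the opt-out path has to be spelled out.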

github-actions · 2025-12-09T14:58:06Z

🤖 LLM Evaluation Results

OpenAI

⚠️ Overall: 17/18 tests passed (94.4%)

Provider	Total	Passed	Failed	Pass Rate
⚠️ OPENAI	18	17	1	94.4%

❌ Failed Evaluations

OPENAI

1. TestChannelSummarization/[openai]_channel_summarization_developers_webapp_channel

Score: 0.00
Rubric: mentions claudio and harrison discussing exactly what should be tracked for code coverage
Reason: While it mentions Claudio working on coverage and that he and Harrison discussed snapshot test effects and investigating E2E coverage, it does not state that they discussed exactly what should be tracked for code coverage.

Anthropic

⚠️ Overall: 15/16 tests passed (93.8%)

Provider	Total	Passed	Failed	Pass Rate
⚠️ ANTHROPIC	16	15	1	93.8%

❌ Failed Evaluations

ANTHROPIC

1. TestThreadsSummarizeFromExportedData/[anthropic]_thread_summarization_from_eval_timed_dnd.json

Score: 0.00
Rubric: contains the usernames involved as @mentions if referenced
Reason: Most users are @mentioned, but "Yasser" is referenced without an @, so not all referenced usernames are @mentions.

Azure OpenAI

✅ Overall: 22/22 tests passed (100.0%)

Provider	Total	Passed	Failed	Pass Rate
✅ AZURE	22	22	0	100.0%

Mistral

✅ Overall: 18/18 tests passed (100.0%)

Provider	Total	Passed	Failed	Pass Rate
✅ MISTRAL	18	18	0	100.0%

AWS Bedrock

✅ Overall: 17/17 tests passed (100.0%)

Provider	Total	Passed	Failed	Pass Rate
✅ BEDROCK	17	17	0	100.0%

This comment was automatically generated by the eval CI pipeline.

nickmisasi and others added 4 commits November 28, 2025 13:30

Basic functionality for summaries working

e8d819f

Fix unread summary to capture lastviewedat before it gets cleared

1565f93

Working, with citations based on core product

a600f89

Fix summarization of DMs

c8b9e29
