fix: handle parallel tool calls with same index in streaming mode #1958
Draft
wingding12 wants to merge 1 commit into huggingface:main from
…ggingface#1569)

When streaming tool calls, some providers (e.g., Ollama via LiteLLM) return multiple parallel tool calls with the same index. Previously, these would be incorrectly merged into a single tool call with concatenated arguments, causing tool execution failures.

This fix modifies agglomerate_stream_deltas() to detect when a stream delta represents a new tool call vs a continuation of an existing one:
- If IDs differ between deltas, create a new tool call
- If function names differ, create a new tool call
- If current arguments form a complete JSON and new arguments arrive, create a new tool call (indicates previous call finished)

Additionally, tool calls without IDs now get assigned unique UUIDs.

Changes:
- models.py: Rewrite agglomerate_stream_deltas() with helper functions _is_complete_json() and _should_create_new_tool_call() to properly separate parallel tool calls with the same index
- test_models.py: Add comprehensive unit tests for the new behavior
- test_agents.py: Add integration test verifying fix works through ToolCallingAgent streaming pipeline
- test_gradio_ui.py: Add integration test verifying fix works through Gradio UI streaming pipeline

Fixes huggingface#1569
Summary
Fixes #1569 - Parallel function calling does not work in streaming mode
When streaming tool calls, some providers (e.g., Ollama via LiteLLM) return multiple parallel tool calls with the same index. Previously, these were incorrectly merged into a single tool call with concatenated arguments such as {"query": "Emma Bull"}{"query": "Virginia Woolf"}, causing tool execution failures.
Root Cause
The agglomerate_stream_deltas() function used the index field as a unique key for tool calls. When multiple parallel tool calls had the same index (as returned by Ollama), they were merged together instead of being treated as separate calls.
Solution
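Modified agglomerate_stream_deltas() to detect when a stream delta represents a new tool call vs a continuation:
- If IDs differ between deltas, create a new tool call
- If function names differ, create a new tool call
- If current arguments form a complete JSON and new arguments arrive, create a new tool call (indicates previous call finished)
Added helper functions:
- _is_complete_json() - Check if a string is a valid, complete JSON object
- _should_create_new_tool_call() - Determine if a delta starts a new tool call
Tool calls without IDs now get assigned unique UUIDs to ensure proper execution.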
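The detection rules can be illustrated with a small standalone sketch. This is not the code from models.py: the ToolCallDelta dataclass, merge_deltas(), the un-prefixed helper names, and the web_search tool name are hypothetical stand-ins chosen for illustration, and the real agglomerate_stream_deltas() operates on smolagents' own delta types.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToolCallDelta:
    """Hypothetical stand-in for one streamed tool-call fragment."""
    index: int
    id: Optional[str] = None
    name: Optional[str] = None
    arguments: str = ""


def is_complete_json(text: str) -> bool:
    """Return True if text parses as a complete JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except (json.JSONDecodeError, TypeError):
        return False


def should_create_new_tool_call(current: ToolCallDelta, delta: ToolCallDelta) -> bool:
    """Apply the three rules listed in the description, in order."""
    if delta.id and current.id and delta.id != current.id:
        return True  # IDs differ
    if delta.name and current.name and delta.name != current.name:
        return True  # function names differ
    # Previous arguments already form a complete object and more arrive.
    return bool(delta.arguments) and is_complete_json(current.arguments)


def merge_deltas(deltas: List[ToolCallDelta]) -> List[ToolCallDelta]:
    """Assemble streamed fragments into separate tool calls."""
    calls: List[ToolCallDelta] = []
    latest: Dict[int, ToolCallDelta] = {}  # index -> call still being assembled
    for delta in deltas:
        current = latest.get(delta.index)
        if current is None or should_create_new_tool_call(current, delta):
            current = ToolCallDelta(index=delta.index, id=delta.id, name=delta.name)
            latest[delta.index] = current
            calls.append(current)
        else:
            # Continuation: fill in any fields that arrived late.
            current.id = current.id or delta.id
            current.name = current.name or delta.name
        current.arguments += delta.arguments
    for call in calls:
        if call.id is None:
            call.id = str(uuid.uuid4())  # calls without an ID get a unique one
    return calls


# Two parallel calls that share index 0 and carry no ID.
parallel = merge_deltas([
    ToolCallDelta(index=0, name="web_search", arguments='{"query": "Emma Bull"}'),
    ToolCallDelta(index=0, name="web_search", arguments='{"query": "Virginia Woolf"}'),
])
print([json.loads(call.arguments)["query"] for call in parallel])
# prints ['Emma Bull', 'Virginia Woolf']
```

In this sketch a fragment whose predecessor is still incomplete JSON (for example '{"query": ' followed by '"Emma Bull"}') is appended to the existing call, so ordinary chunked streaming keeps working.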
Files Changed
- src/smolagents/models.py: agglomerate_stream_deltas()
- tests/test_models.py
- tests/test_agents.py
- tests/test_gradio_ui.py
Test plan
agglomerate_stream_deltas(): test_agglomerate_stream_deltas still passes