DRAFT: fix: separate static system prompt from dynamic context for cross-conversation caching #1890
Conversation
…versation caching

This change enables cross-conversation prompt caching by separating the static system prompt from per-conversation dynamic context (hosts, repos, users, etc.).

Changes:
- Add static_system_message and dynamic_context properties to AgentBase
- Add dynamic_context field to SystemPromptEvent
- Add to_llm_messages() method that returns separate system and user messages
- Update events_to_messages() to use to_llm_messages() for SystemPromptEvent
- Add regression test to ensure static system message stays constant

With this change, the static system prompt can be cached and reused across conversations, while dynamic context is sent as a separate user message.

Fixes #1808

Co-authored-by: openhands <[email protected]>
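For readers skimming the thread, a minimal standalone sketch of the split this commit describes — stand-in types only, not the SDK's actual `Message`/`SystemPromptEvent` classes — showing how the static prompt becomes a cacheable system message while per-conversation context travels separately:

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str
    content: str


def split_system_prompt(static_prompt: str, dynamic_context: str | None) -> list[Msg]:
    # The static prompt is identical across conversations, so providers can cache it.
    messages = [Msg(role="system", content=static_prompt)]
    if dynamic_context:
        # Per-conversation details (hosts, repos, secrets, ...) are kept out of it.
        messages.append(Msg(role="user", content=dynamic_context))
    return messages


print([m.role for m in split_system_prompt("You are a coding agent.", "<RUNTIME_INFORMATION/>")])
# -> ['system', 'user']
```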
…sage

Mark system_message as deprecated in 1.11.0 with removal in 1.13.0. Update all usages in examples and tests to use static_system_message instead.

Co-authored-by: openhands <[email protected]>
Coverage Report
…ve users

When SystemPromptEvent has dynamic_context, instead of emitting it as a separate user message (which creates consecutive user messages when the actual user sends 'Hi'), we now merge the dynamic context into the first subsequent user message.

This fixes the issue where entering a message in the UI would result in nothing happening, because the LLM received [system, user, user], which some providers handle poorly.

The message sequence is now:
- [system] static prompt (cacheable)
- [user] dynamic_context + user's actual message

This preserves prompt caching while maintaining proper message alternation.

Co-authored-by: openhands <[email protected]>
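A toy sketch of the merge strategy this commit describes, again with stand-in types rather than the SDK's event classes: the dynamic context is held back until the first user message and prepended to it, so the role sequence stays system → user instead of system → user → user:

```python
from dataclasses import dataclass


@dataclass
class Event:
    role: str                           # "system", "user", or "assistant"
    text: str
    dynamic_context: str | None = None  # only set on the system-prompt event


def events_to_messages(events: list[Event]) -> list[tuple[str, str]]:
    messages: list[tuple[str, str]] = []
    pending: str | None = None
    for ev in events:
        if ev.role == "system":
            messages.append(("system", ev.text))   # static prompt, cacheable
            pending = ev.dynamic_context           # defer the dynamic context
        elif ev.role == "user" and pending is not None:
            messages.append(("user", pending + "\n\n" + ev.text))  # merge once
            pending = None
        else:
            messages.append((ev.role, ev.text))
    return messages


events = [Event("system", "static prompt", "<runtime info>"), Event("user", "Hi")]
print([role for role, _ in events_to_messages(events)])  # -> ['system', 'user']
```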
enyst left a comment
Great catch! Yes... compared to v0, we now include the suffix into the usual system prompt. 🤔 That works fine with prompt caching within the same conversation, but not cross-conversation.
Thank you for taking this up!
@OpenHands fix the failing CI, and then run the first example from the example directory and see if it completes as expected.
I'm on it! neubig can track my progress at all-hands.dev
Summary

I completed both tasks requested:

1. Fixed the Failing CI (Pre-commit checks)

The pre-commit checks were failing due to line length violations (E501) in … The fix has been committed and pushed to the … branch.

2. Ran the First Example

I ran … and cleaned up the generated … afterwards.

PR Status: The changes have been pushed to the branch and should now pass the pre-commit CI checks.
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
all-hands-bot left a comment
Review Summary
The core approach of separating static system prompt from dynamic context is solid and will enable cross-conversation caching. However, there are some inconsistencies between the PR description and implementation, plus API design issues that should be addressed.
Key Concerns:
- The `to_llm_messages()` method is defined but never called, creating confusion
- The PR description claims `events_to_messages()` uses `to_llm_messages()`, but it actually uses `to_llm_message()` (singular)
- Some edge cases could use better test coverage
See inline comments for specific issues and suggestions.
| """ | ||
| messages = [Message(role="system", content=[self.system_prompt])] | ||
| if self.dynamic_context: | ||
| messages.append(Message(role="user", content=[self.dynamic_context])) |
🟠 Important: This method is defined but never called in the codebase. The PR description says "Update events_to_messages() to use to_llm_messages()", but looking at event/base.py:133, it actually calls to_llm_message() (singular) instead.
This creates API confusion:
- Is `to_llm_messages()` intended for direct API usage?
- Should `events_to_messages()` be updated to use this method?
- If this method returns `[system, user]` messages when `dynamic_context` is present, wouldn't that create consecutive user messages (the exact problem you're solving)?
Recommendation: Either:
- Remove this method if it's not needed
- Update `events_to_messages()` to actually use it (if possible without breaking the merge logic)
- Add a clear docstring explaining why this exists but isn't used by `events_to_messages()`
```python
            # Create combined message for the response
            messages.append(_combine_action_events(batch_events))
            i = j
        elif isinstance(event, SystemPromptEvent):
```
🟡 Suggestion: This calls event.to_llm_message() (singular), but SystemPromptEvent now has a to_llm_messages() (plural) method. The PR description says you "Update events_to_messages() to use to_llm_messages()", but that's not what's happening here.
The current implementation is actually correct (it manually handles the dynamic context merging), but the mismatch with the PR description is confusing. Consider clarifying the intent.
```python
    pending_dynamic_context: TextContent | None = None
    i = 0

    while i < len(events):
        event = events[i]
```
🟢 Nit: The pending_dynamic_context variable and its deferral logic could benefit from a comment block explaining the strategy:
Suggested change:

```python
    messages: list[Message] = []
    # Track dynamic context from SystemPromptEvent to merge with the next user message.
    # This prevents consecutive user messages while enabling cross-conversation caching.
    pending_dynamic_context: TextContent | None = None
```
This helps future maintainers understand the non-obvious logic.
```python
            i += 1

    # If there's pending dynamic context but no user message followed,
    # add it as a separate user message (edge case)
```
🟡 Suggestion: This edge case handling is good, but could lead to unusual message sequences if the first event after SystemPromptEvent is an assistant message (you'd get [system, assistant, user]).
While this is probably rare in practice, consider either:
- Adding a test case for this scenario
- Adding a comment explaining the expected conversation flow (SystemPromptEvent should always be followed by a user message in normal usage)
```python
    # Dynamic context should be present somewhere in the messages
    assert "Working directory" in full_text, "Dynamic context should be preserved"
    assert "Date: 2024-01-15" in full_text, "Dynamic context should be preserved"
```
🟡 Suggestion: The test coverage is good for the happy path, but consider adding tests for edge cases:
- What happens if `SystemPromptEvent` appears in the middle of a conversation (after other events)?
- What if there are multiple `SystemPromptEvent`s in the event stream?
- What if `SystemPromptEvent` has dynamic context but is followed by an assistant message (not a user message)?
These edge cases would help verify the robustness of the pending_dynamic_context logic.
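If these cases get covered, the tests could look roughly like the sketch below. It exercises a self-contained stand-in for the deferral logic (the real tests would build `SystemPromptEvent`/message events and call `LLMConvertibleEvent.events_to_messages()`); the helper and test names are illustrative only:

```python
def merge_roles(events: list[tuple[str, str | None]]) -> list[str]:
    """Stand-in: events are (role, dynamic_context) pairs; returns resulting roles."""
    roles: list[str] = []
    pending: str | None = None
    for role, ctx in events:
        if role == "system":
            roles.append("system")
            pending = ctx                  # defer dynamic context
        elif role == "user" and pending is not None:
            roles.append("user")           # dynamic context merged into this message
            pending = None
        else:
            roles.append(role)
    if pending is not None:
        roles.append("user")               # fallback: emit context as its own message
    return roles


def test_system_prompt_followed_by_assistant_only():
    # Dynamic context never finds a user message to merge into.
    roles = merge_roles([("system", "<ctx>"), ("assistant", None)])
    assert roles == ["system", "assistant", "user"]  # the [system, assistant, user] case


def test_multiple_system_prompt_events():
    roles = merge_roles(
        [("system", "<ctx-1>"), ("user", None), ("system", "<ctx-2>"), ("user", None)]
    )
    assert roles == ["system", "user", "system", "user"]  # each context merges once
```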
```python
        return render_template(
            prompt_dir=self.prompt_dir,
            template_name=self.system_prompt_filename,
            **template_kwargs,
        )

        if self.agent_context:
            _system_message_suffix = self.agent_context.get_system_message_suffix(
                llm_model=self.llm.model,
                llm_model_canonical=self.llm.model_canonical_name,
            )
            if _system_message_suffix:
                system_message += "\n\n" + _system_message_suffix

    @property
    def dynamic_context(self) -> str | None:
        """Get the dynamic per-conversation context.

        This returns the context that varies between conversations, such as:
        - Repository information and skills
        - Runtime information (hosts, working directory)
        - User-specific secrets and settings
        - Conversation instructions

        This content should NOT be included in the cached system prompt to enable
```
🟢 Nit: The deprecation timeline (removal in 1.13.0, just 2 versions away) is aggressive for a property that many users likely depend on. Consider:
- Extending the deprecation period to 2.0.0 or later
- Adding a migration guide in the docs showing how to update code
- Providing a clear example of the new pattern in the deprecation warning
The current implementation maintains backward compatibility well, so there's less urgency to force migration.
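On the last point, one possible shape for the warning — a hedged sketch with placeholder values, not the SDK's actual implementation or deprecation helper:

```python
import warnings


class AgentBase:
    # Placeholder values so the sketch runs standalone; the real properties render
    # the prompt template and AgentContext data.
    static_system_message = "You are a helpful coding agent."
    dynamic_context: str | None = "<RUNTIME_INFORMATION>...</RUNTIME_INFORMATION>"

    @property
    def system_message(self) -> str:
        """Deprecated: prefer static_system_message plus dynamic_context."""
        warnings.warn(
            "AgentBase.system_message is deprecated since 1.11.0; use "
            "static_system_message (cacheable) together with dynamic_context "
            "(per-conversation) instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if self.dynamic_context:
            return self.static_system_message + "\n\n" + self.dynamic_context
        return self.static_system_message
```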
```python
    def to_llm_messages(self) -> list[Message]:
        """Convert to LLM message format, potentially returning multiple messages.

        When dynamic_context is provided, returns two messages:
        1. System message with static prompt (cacheable)
        2. User message with dynamic context (not cached)

        This structure enables cross-conversation prompt caching by keeping the
        static system prompt separate from per-conversation dynamic content.
```
🟡 Suggestion: This docstring says "returns two messages: System message with static prompt (cacheable) [and] User message with dynamic context", but that would create consecutive user messages when the actual user message arrives.
The docstring should clarify:
- This method provides a naive conversion
- Use `LLMConvertibleEvent.events_to_messages()` for proper event stream handling that avoids consecutive user messages
- Or explain when it's appropriate to use this vs `events_to_messages()`
Suggested change:

```python
    def to_llm_messages(self) -> list[Message]:
        """Convert to LLM message format, potentially returning multiple messages.

        When dynamic_context is provided, this naive conversion returns two messages:
        1. System message with static prompt (cacheable)
        2. User message with dynamic context (not cached)

        WARNING: Using this directly may create consecutive user messages. For proper
        event stream conversion that merges dynamic context with the first user message,
        use `LLMConvertibleEvent.events_to_messages()` instead.

        Returns:
            List of Message objects. Contains 1 message if no dynamic_context,
            or 2 messages if dynamic_context is provided.
        """
```
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
🧪 Condenser Tests Results

Overall Success Rate: 95.6%

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat
Skipped Tests:
litellm_proxy_claude_sonnet_4_5_20250929
litellm_proxy_vertex_ai_gemini_3_pro_preview
litellm_proxy_moonshot_kimi_k2_thinking
Skipped Tests:
litellm_proxy_mistral_devstral_2512
Skipped Tests:
Failed Tests:
litellm_proxy_gpt_5.1_codex_max
Failed Tests:
Summary
This PR enables cross-conversation prompt caching by separating the static system prompt from per-conversation dynamic context (hosts, repos, users, skills, secrets, etc.).
Fixes #1808
Problem
Currently, the system prompt includes dynamic per-conversation content (sandbox URLs, working directories, repo info, etc.). This means every conversation has a unique system prompt, preventing Anthropic's prompt caching from working across conversations.
Evidence from OpenHands Cloud shows two conversations with identical "Hi" messages both showing:
This means each conversation pays the full cost of processing the system prompt instead of benefiting from cache hits.
Root Cause
The dynamic content comes from `AgentContext.get_system_message_suffix()`, which includes per-conversation details such as the runtime host (e.g. work-1-jqfdlohypqaduxwq.prod-runtime.all-hands.dev), repository information, and user-specific settings. This content was being appended directly to the system prompt, making the entire prompt unique per conversation.
Solution
Separate the static and dynamic portions:

- `static_system_message` - Returns only the base system prompt template (cacheable across all conversations)
- `dynamic_context` - Returns the per-conversation context (hosts, repos, etc.)

The `SystemPromptEvent` now has an optional `dynamic_context` field. When present, `to_llm_messages()` returns two messages: a system message containing the static prompt (cacheable) and a user message containing the dynamic context.
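To make the caching claim concrete, a toy illustration (providers key their prompt caches on token prefixes, not literal hashes; the hash here just stands in for "identical prefix"):

```python
import hashlib


def cache_key(system_prompt: str) -> str:
    return hashlib.sha256(system_prompt.encode()).hexdigest()


STATIC = "You are a helpful coding agent."
contexts = ["Working directory: /workspace/project-a", "Working directory: /workspace/project-b"]

# Before: dynamic context baked into the system prompt -> a unique prompt per conversation.
before = {cache_key(STATIC + "\n\n" + ctx) for ctx in contexts}
# After: every conversation shares the same static system prompt.
after = {cache_key(STATIC) for _ in contexts}

assert len(before) == 2  # no cross-conversation reuse possible
assert len(after) == 1   # identical prefix -> cacheable across conversations
```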
Changes

- Add `static_system_message` and `dynamic_context` properties to `AgentBase`
- Add `dynamic_context` field to `SystemPromptEvent`
- Add `to_llm_messages()` method that returns separate system and user messages
- Update `events_to_messages()` to use `to_llm_messages()` for `SystemPromptEvent`

Expected Impact
With this change, the static system prompt is identical across conversations and can be served from the provider's prompt cache, while the dynamic context is delivered alongside the first user message. This should significantly reduce costs and latency for subsequent conversations.
Testing
Added regression test `test_static_system_message_is_constant_across_different_contexts` that verifies the static system message is identical regardless of the `AgentContext` provided.
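As a rough illustration of what this test guards against — a standalone sketch with a toy `split_prompt` helper, not the actual Agent/AgentContext fixtures the real test uses:

```python
def split_prompt(template: str, context: dict | None) -> tuple[str, str | None]:
    """Toy stand-in: static part comes from the template only, dynamic part from context."""
    static = template.strip()
    dynamic = None
    if context:
        dynamic = "\n".join(f"{key}: {value}" for key, value in sorted(context.items()))
    return static, dynamic


def test_static_system_message_is_constant_across_different_contexts():
    template = "You are a helpful coding agent."
    contexts = [None, {"host": "work-1"}, {"repo": "OpenHands/agent-sdk"}]
    statics = {split_prompt(template, ctx)[0] for ctx in contexts}
    assert len(statics) == 1  # the cacheable part never varies with conversation context
```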
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
```bash
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:70697a6-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The main tag (e.g. 70697a6-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 70697a6-python-amd64) are also available if needed