
DRAFT: fix: separate static system prompt from dynamic context for cross-conversation caching #1890

Open

neubig wants to merge 7 commits into main from fix/prompt-caching-cross-conversation

Conversation

neubig (Contributor) commented Feb 3, 2026

Summary

This PR enables cross-conversation prompt caching by separating the static system prompt from per-conversation dynamic context (hosts, repos, users, skills, secrets, etc.).

Fixes #1808

Problem

Currently, the system prompt includes dynamic per-conversation content (sandbox URLs, working directories, repo info, etc.). This means every conversation has a unique system prompt, preventing Anthropic's prompt caching from working across conversations.

Evidence from OpenHands Cloud shows two conversations with identical "Hi" messages, both reporting:

  • Cache Hit: 0
  • Cache Write: ~27k tokens

This means each conversation pays the full cost of processing the system prompt instead of benefiting from cache hits.

Root Cause

The dynamic content comes from AgentContext.get_system_message_suffix() which includes:

  • Sandbox URLs with unique runtime IDs (e.g., work-1-jqfdlohypqaduxwq.prod-runtime.all-hands.dev)
  • Working directory paths
  • Repository information
  • Skills and secrets

This content was being appended directly to the system prompt, making the entire prompt unique per conversation.

Solution

Separate the static and dynamic portions:

  1. static_system_message - Returns only the base system prompt template (cacheable across all conversations)
  2. dynamic_context - Returns the per-conversation context (hosts, repos, etc.)

The SystemPromptEvent now has an optional dynamic_context field. When it is present, to_llm_messages() returns two messages (see the sketch after this list):

  1. System message with static prompt (cacheable)
  2. User message with dynamic context (not cached, but much smaller)
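A minimal sketch of that conversion (not the SDK source: the import path is assumed, and the real SystemPromptEvent and events_to_messages additionally handle tools, event batching, and the user-message merge discussed in the review below):

```python
from openhands.sdk.llm import Message, TextContent  # assumed import path


def system_prompt_event_to_messages(
    system_prompt: TextContent,
    dynamic_context: TextContent | None,
) -> list[Message]:
    # The static prompt is byte-identical across conversations, so providers
    # such as Anthropic can serve it from the prompt cache.
    messages = [Message(role="system", content=[system_prompt])]
    if dynamic_context is not None:
        # Per-conversation details (hosts, repos, skills, secrets, ...) travel
        # as a separate, much smaller message that is not expected to cache.
        messages.append(Message(role="user", content=[dynamic_context]))
    return messages
```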

Changes

  • Add static_system_message and dynamic_context properties to AgentBase
  • Add dynamic_context field to SystemPromptEvent
  • Add to_llm_messages() method that returns separate system and user messages
  • Update events_to_messages() to use to_llm_messages() for SystemPromptEvent
  • Add regression test to ensure static system message stays constant

Expected Impact

With this change:

  • Conversation 1: Cache writes the static system prompt (~25k tokens)
  • Conversation 2+: Cache HITS on the static system prompt, only processes dynamic context (~2k tokens)

This should significantly reduce costs and latency for subsequent conversations.
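One way to spot-check the effect outside the SDK is to call the Anthropic Messages API directly with the same split. This is a sketch under a few assumptions: ANTHROPIC_API_KEY is set, the model name is still current, and the static block is long enough to meet the provider's minimum cacheable size (the short placeholder strings below would not actually be cached).

```python
import anthropic

STATIC_SYSTEM_PROMPT = "...the ~25k-token static system prompt..."  # placeholder
DYNAMIC_CONTEXT = "<WORKING_DIR>/workspace</WORKING_DIR> ..."       # placeholder

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=64,
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,            # identical across conversations
            "cache_control": {"type": "ephemeral"},  # cache breakpoint ends here
        }
    ],
    messages=[{"role": "user", "content": DYNAMIC_CONTEXT + "\n\nHi"}],
)
# Conversation 1 should report cache_creation_input_tokens roughly the size of the
# static prompt; conversations 2+ should report cache_read_input_tokens > 0 instead.
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)
```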

Testing

Added regression test test_static_system_message_is_constant_across_different_contexts that verifies the static system message is identical regardless of the AgentContext provided.
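The property that test pins down is essentially the following (a standalone sketch, not the committed test: build_static_system_message is a hypothetical stand-in for Agent.static_system_message, and the context dicts are made up):

```python
# Hypothetical stand-in for Agent.static_system_message: render only the base
# template and ignore per-conversation context entirely.
def build_static_system_message(agent_context: dict) -> str:
    return "You are OpenHands, an autonomous software engineering agent."


def test_static_system_message_is_constant_across_different_contexts():
    ctx_a = {"repo": "OpenHands/agent-sdk", "working_dir": "/workspace"}
    ctx_b = {"repo": "acme/other", "working_dir": "/tmp/project", "secrets": ["API_TOKEN"]}
    # The cacheable portion must not depend on anything per-conversation.
    assert build_static_system_message(ctx_a) == build_static_system_message(ctx_b)
```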



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image |
|---------|---------------|------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.13-nodejs22 |
| golang  | amd64, arm64  | golang:1.21-bookworm |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:70697a6-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-70697a6-python \
  ghcr.io/openhands/agent-server:70697a6-python

All tags pushed for this build

ghcr.io/openhands/agent-server:70697a6-golang-amd64
ghcr.io/openhands/agent-server:70697a6-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:70697a6-golang-arm64
ghcr.io/openhands/agent-server:70697a6-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:70697a6-java-amd64
ghcr.io/openhands/agent-server:70697a6-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:70697a6-java-arm64
ghcr.io/openhands/agent-server:70697a6-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:70697a6-python-amd64
ghcr.io/openhands/agent-server:70697a6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:70697a6-python-arm64
ghcr.io/openhands/agent-server:70697a6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:70697a6-golang
ghcr.io/openhands/agent-server:70697a6-java
ghcr.io/openhands/agent-server:70697a6-python

About Multi-Architecture Support

  • Each variant tag (e.g., 70697a6-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 70697a6-python-amd64) are also available if needed

…versation caching

This change enables cross-conversation prompt caching by separating the static
system prompt from per-conversation dynamic context (hosts, repos, users, etc.).

Changes:
- Add static_system_message and dynamic_context properties to AgentBase
- Add dynamic_context field to SystemPromptEvent
- Add to_llm_messages() method that returns separate system and user messages
- Update events_to_messages() to use to_llm_messages() for SystemPromptEvent
- Add regression test to ensure static system message stays constant

With this change, the static system prompt can be cached and reused across
conversations, while dynamic context is sent as a separate user message.

Fixes #1808

Co-authored-by: openhands <[email protected]>
…sage

Mark system_message as deprecated in 1.11.0 with removal in 1.13.0.
Update all usages in examples and tests to use static_system_message instead.

Co-authored-by: openhands <[email protected]>
github-actions bot commented Feb 3, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| openhands-sdk/openhands/sdk/agent/agent.py | 247 | 49 | 80% | 94, 98, 209–210, 214–215, 217, 223–224, 226, 228–229, 234, 237, 241–244, 280–282, 310–311, 318–319, 351, 404–405, 407, 447, 586–587, 592, 604–605, 610–611, 630–631, 633, 661–662, 668–669, 673, 681–682, 722, 729 |
| openhands-sdk/openhands/sdk/agent/base.py | 188 | 18 | 90% | 200, 282–286, 334–336, 346, 356, 364–365, 475, 512–513, 523–524 |
| openhands-sdk/openhands/sdk/event/base.py | 91 | 9 | 90% | 52, 63, 75–76, 82, 85–86, 88, 163 |
| openhands-sdk/openhands/sdk/event/llm_convertible/system.py | 53 | 18 | 66% | 50–54, 71–72, 97–100, 104–105, 110–113, 116 |
| TOTAL | 16937 | 4983 | 70% | |

…ve users

When SystemPromptEvent has dynamic_context, instead of emitting it as a
separate user message (which creates consecutive user messages when the
actual user sends 'Hi'), we now merge the dynamic context into the first
subsequent user message.

This fixes the issue where entering a message in the UI would result in
nothing happening because the LLM received [system, user, user] which
some providers handle poorly.

The message sequence is now:
- [system] static prompt (cacheable)
- [user] dynamic_context + user's actual message

This preserves prompt caching while maintaining proper message alternation.

Co-authored-by: openhands <[email protected]>
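A simplified sketch of the merge behavior described in this commit, using plain (role, text) pairs instead of SDK Message objects (the real logic lives in events_to_messages and also handles tool calls and action batching):

```python
def merge_dynamic_context(
    static_prompt: str,
    dynamic_context: str | None,
    turns: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Fold the pending dynamic context into the first user turn that follows."""
    out: list[tuple[str, str]] = [("system", static_prompt)]
    pending = dynamic_context
    for role, text in turns:
        if pending is not None and role == "user":
            # Prepend per-conversation context to the first real user message so the
            # sequence stays [system, user, assistant, ...] with no consecutive users.
            text = pending + "\n\n" + text
            pending = None
        out.append((role, text))
    if pending is not None:
        # Edge case also handled in the PR: no user turn followed, so emit the
        # context as its own user message.
        out.append(("user", pending))
    return out


# merge_dynamic_context(STATIC, CONTEXT, [("user", "Hi")])
# -> [("system", STATIC), ("user", CONTEXT + "\n\nHi")]
```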
enyst self-requested a review February 3, 2026 22:27

enyst (Collaborator) left a comment

Great catch! Yes... compared to v0, we now include the suffix into the usual system prompt. 🤔 That works fine with prompt caching within the same conversation, but not cross-conversation.

Thank you for taking this up!

neubig (Author) commented Feb 3, 2026

@OpenHands fix the failing CI, and then run the first example from the example directory and see if it completes as expected.


openhands-ai bot commented Feb 3, 2026

I'm on it! neubig can track my progress at all-hands.dev


openhands-ai bot commented Feb 3, 2026

Summary

I completed both tasks requested:

1. Fixed the Failing CI (Pre-commit checks)

The pre-commit checks were failing due to line length violations (E501) in tests/sdk/event/test_dynamic_context_message_sequence.py. I fixed:

  • Line 21: Shortened the docstring from 89 to 88 characters
  • Line 64: Split a long error message string across two lines
  • Applied ruff formatting fixes (multi-line formatting for long expressions)

The fix has been committed and pushed to the fix/prompt-caching-cross-conversation branch.

2. Ran the First Example

I ran examples/01_standalone_sdk/01_hello_world.py which:

  • ✅ Successfully initialized the conversation with the LLM
  • ✅ The agent explored the repository (README.md, pyproject.toml)
  • ✅ Created FACTS.txt with 3 accurate facts about the project:
    1. It's a Python framework for building code-working agents with local and ephemeral workspace support
    2. It's a monorepo with 4 packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server)
    3. It achieved 77.6 on SWEBench and powers OpenHands CLI and Cloud
  • ✅ Completed successfully with "All done!" message

I cleaned up the generated FACTS.txt file after verification since it was just for testing purposes.

PR Status: The changes have been pushed to the branch and should now pass the pre-commit CI checks.

View full conversation

simonrosenberg added the integration-test label (Runs the integration tests and comments the results) Feb 4, 2026

github-actions bot commented Feb 4, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

simonrosenberg marked this pull request as ready for review February 4, 2026 21:44
all-hands-bot (Collaborator) left a comment

Review Summary

The core approach of separating static system prompt from dynamic context is solid and will enable cross-conversation caching. However, there are some inconsistencies between the PR description and implementation, plus API design issues that should be addressed.

Key Concerns:

  • The to_llm_messages() method is defined but never called, creating confusion
  • PR description claims events_to_messages() uses to_llm_messages(), but it actually uses to_llm_message() (singular)
  • Some edge cases could use better test coverage

See inline comments for specific issues and suggestions.

"""
messages = [Message(role="system", content=[self.system_prompt])]
if self.dynamic_context:
messages.append(Message(role="user", content=[self.dynamic_context]))

🟠 Important: This method is defined but never called in the codebase. The PR description says "Update events_to_messages() to use to_llm_messages()", but looking at event/base.py:133, it actually calls to_llm_message() (singular) instead.

This creates API confusion:

  • Is to_llm_messages() intended for direct API usage?
  • Should events_to_messages() be updated to use this method?
  • If this method returns [system, user] messages when dynamic_context is present, wouldn't that create consecutive user messages (the exact problem you're solving)?

Recommendation: Either:

  1. Remove this method if it's not needed
  2. Update events_to_messages() to actually use it (if possible without breaking the merge logic)
  3. Add a clear docstring explaining why this exists but isn't used by events_to_messages()

# Create combined message for the response
messages.append(_combine_action_events(batch_events))
i = j
elif isinstance(event, SystemPromptEvent):

🟡 Suggestion: This calls event.to_llm_message() (singular), but SystemPromptEvent now has a to_llm_messages() (plural) method. The PR description says you "Update events_to_messages() to use to_llm_messages()", but that's not what's happening here.

The current implementation is actually correct (it manually handles the dynamic context merging), but the mismatch with the PR description is confusing. Consider clarifying the intent.

Comment on lines +108 to 112
pending_dynamic_context: TextContent | None = None
i = 0

while i < len(events):
    event = events[i]

🟢 Nit: The pending_dynamic_context variable and its deferral logic could benefit from a comment block explaining the strategy:

Suggested change
pending_dynamic_context: TextContent | None = None
i = 0
while i < len(events):
event = events[i]
messages: list[Message] = []
# Track dynamic context from SystemPromptEvent to merge with the next user message.
# This prevents consecutive user messages while enabling cross-conversation caching.
pending_dynamic_context: TextContent | None = None

This helps future maintainers understand the non-obvious logic.

Comment on lines 158 to +161
i += 1

# If there's pending dynamic context but no user message followed,
# add it as a separate user message (edge case)

🟡 Suggestion: This edge case handling is good, but could lead to unusual message sequences if the first event after SystemPromptEvent is an assistant message (you'd get [system, assistant, user]).

While this is probably rare in practice, consider either:

  1. Adding a test case for this scenario
  2. Adding a comment explaining the expected conversation flow (SystemPromptEvent should always be followed by a user message in normal usage)


# Dynamic context should be present somewhere in the messages
assert "Working directory" in full_text, "Dynamic context should be preserved"
assert "Date: 2024-01-15" in full_text, "Dynamic context should be preserved"

🟡 Suggestion: The test coverage is good for the happy path, but consider adding tests for edge cases:

  • What happens if SystemPromptEvent appears in the middle of a conversation (after other events)?
  • What if there are multiple SystemPromptEvents in the event stream?
  • What if SystemPromptEvent has dynamic context but is followed by an assistant message (not a user message)?

These edge cases would help verify the robustness of the pending_dynamic_context logic.

Comment on lines +234 to +250
return render_template(
prompt_dir=self.prompt_dir,
template_name=self.system_prompt_filename,
**template_kwargs,
)
if self.agent_context:
_system_message_suffix = self.agent_context.get_system_message_suffix(
llm_model=self.llm.model,
llm_model_canonical=self.llm.model_canonical_name,
)
if _system_message_suffix:
system_message += "\n\n" + _system_message_suffix

@property
def dynamic_context(self) -> str | None:
"""Get the dynamic per-conversation context.

This returns the context that varies between conversations, such as:
- Repository information and skills
- Runtime information (hosts, working directory)
- User-specific secrets and settings
- Conversation instructions

This content should NOT be included in the cached system prompt to enable

🟢 Nit: The deprecation timeline (removal in 1.13.0, just 2 versions away) is aggressive for a property that many users likely depend on. Consider:

  1. Extending the deprecation period to 2.0.0 or later
  2. Adding a migration guide in the docs showing how to update code
  3. Providing a clear example of the new pattern in the deprecation warning

The current implementation maintains backward compatibility well, so there's less urgency to force migration.

Comment on lines 82 to +92

def to_llm_messages(self) -> list[Message]:
"""Convert to LLM message format, potentially returning multiple messages.

When dynamic_context is provided, returns two messages:
1. System message with static prompt (cacheable)
2. User message with dynamic context (not cached)

This structure enables cross-conversation prompt caching by keeping the
static system prompt separate from per-conversation dynamic content.


🟡 Suggestion: This docstring says "returns two messages: System message with static prompt (cacheable) [and] User message with dynamic context", but that would create consecutive user messages when the actual user message arrives.

The docstring should clarify:

  • This method provides a naive conversion
  • Use LLMConvertibleEvent.events_to_messages() for proper event stream handling that avoids consecutive user messages
  • Or explain when it's appropriate to use this vs events_to_messages()
Suggested change (current docstring → suggested docstring):

Current:
    def to_llm_messages(self) -> list[Message]:
        """Convert to LLM message format, potentially returning multiple messages.

        When dynamic_context is provided, returns two messages:
        1. System message with static prompt (cacheable)
        2. User message with dynamic context (not cached)

        This structure enables cross-conversation prompt caching by keeping the
        static system prompt separate from per-conversation dynamic content.

Suggested:
    def to_llm_messages(self) -> list[Message]:
        """Convert to LLM message format, potentially returning multiple messages.

        When dynamic_context is provided, this naive conversion returns two messages:
        1. System message with static prompt (cacheable)
        2. User message with dynamic context (not cached)

        WARNING: Using this directly may create consecutive user messages. For proper
        event stream conversion that merges dynamic context with the first user message,
        use `LLMConvertibleEvent.events_to_messages()` instead.

        Returns:
            List of Message objects. Contains 1 message if no dynamic_context,
            or 2 messages if dynamic_context is provided.
        """


openhands-ai bot commented Feb 4, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1890 at branch `fix/prompt-caching-cross-conversation`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings


github-actions bot commented Feb 4, 2026

🧪 Condenser Tests Results

Overall Success Rate: 95.6%
Total Cost: $1.54
Models Tested: 6
Timestamp: 2026-02-04 21:56:13 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens |
|-------|---------|--------------|---------|-------|------|--------|
| litellm_proxy_deepseek_deepseek_chat | 100.0% | 7/7 | 1 | 8 | $0.07 | 1,464,146 |
| litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 8/8 | 0 | 8 | $0.43 | 272,716 |
| litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 8/8 | 0 | 8 | $0.30 | 220,268 |
| litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 7/7 | 1 | 8 | $0.17 | 268,616 |
| litellm_proxy_mistral_devstral_2512 | 85.7% | 6/7 | 1 | 8 | $0.10 | 238,688 |
| litellm_proxy_gpt_5.1_codex_max | 87.5% | 7/8 | 0 | 8 | $0.47 | 560,805 |

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.07
  • Token Usage: prompt: 1,452,776, completion: 11,370, cache_read: 1,370,048
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_0905f5e_deepseek_run_N8_20260204_214511
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.43
  • Token Usage: prompt: 265,768, completion: 6,948, cache_read: 193,968, cache_write: 71,369, reasoning: 1,804
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_0905f5e_sonnet_run_N8_20260204_214500

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.30
  • Token Usage: prompt: 214,303, completion: 5,965, cache_read: 113,703, reasoning: 3,941
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_0905f5e_gemini_3_pro_run_N8_20260204_214505

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.17
  • Token Usage: prompt: 262,873, completion: 5,743, cache_read: 203,776
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_0905f5e_kimi_k2_run_N8_20260204_214503
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Success Rate: 85.7% (6/7)
  • Total Cost: $0.10
  • Token Usage: prompt: 235,570, completion: 3,118
  • Run Suffix: litellm_proxy_mistral_devstral_2512_0905f5e_devstral_2512_run_N8_20260204_214504
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.009)

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 87.5% (7/8)
  • Total Cost: $0.47
  • Token Usage: prompt: 549,846, completion: 10,959, cache_read: 287,616, reasoning: 5,120
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_0905f5e_gpt51_codex_run_N8_20260204_214506

Failed Tests:

  • t02_add_bash_hello: Shell script 'shell/hello.sh' not found (Cost: $0.24)


Development

Successfully merging this pull request may close these issues.

Investigate prompt caching differences between Claude Code and OpenHands
