
DRAFT: fix: separate static system prompt from dynamic context for cross-conversation caching #1890

Open

neubig wants to merge 7 commits into main from fix/prompt-caching-cross-conversation

Conversation

neubig (Contributor) commented Feb 3, 2026

Summary

This PR enables cross-conversation prompt caching by separating the static system prompt from per-conversation dynamic context (hosts, repos, users, skills, secrets, etc.).

Fixes #1808

Problem

Currently, the system prompt includes dynamic per-conversation content (sandbox URLs, working directories, repo info, etc.). This means every conversation has a unique system prompt, preventing Anthropic's prompt caching from working across conversations.

Evidence from OpenHands Cloud shows two conversations with identical "Hi" messages, both reporting:

  • Cache Hit: 0
  • Cache Write: ~27k tokens

This means each conversation pays the full cost of processing the system prompt instead of benefiting from cache hits.

Root Cause

The dynamic content comes from AgentContext.get_system_message_suffix() which includes:

  • Sandbox URLs with unique runtime IDs (e.g., work-1-jqfdlohypqaduxwq.prod-runtime.all-hands.dev)
  • Working directory paths
  • Repository information
  • Skills and secrets

This content was being appended directly to the system prompt, making the entire prompt unique per conversation.

Solution

Separate the static and dynamic portions:

  1. static_system_message - Returns only the base system prompt template (cacheable across all conversations)
  2. dynamic_context - Returns the per-conversation context (hosts, repos, etc.)

The SystemPromptEvent now has an optional dynamic_context field. When it is present, to_llm_messages() returns two messages (see the sketch after this list):

  1. System message with static prompt (cacheable)
  2. User message with dynamic context (not cached, but much smaller)
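A minimal sketch of that conversion (not the SDK source: the import path is assumed, and the real SystemPromptEvent and events_to_messages additionally handle tools, event batching, and the user-message merge discussed in the review below):

```python
from openhands.sdk.llm import Message, TextContent  # assumed import path


def system_prompt_event_to_messages(
    system_prompt: TextContent,
    dynamic_context: TextContent | None,
) -> list[Message]:
    # The static prompt is byte-identical across conversations, so providers
    # such as Anthropic can serve it from the prompt cache.
    messages = [Message(role="system", content=[system_prompt])]
    if dynamic_context is not None:
        # Per-conversation details (hosts, repos, skills, secrets, ...) travel
        # as a separate, much smaller message that is not expected to cache.
        messages.append(Message(role="user", content=[dynamic_context]))
    return messages
```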

Changes

  • Add static_system_message and dynamic_context properties to AgentBase
  • Add dynamic_context field to SystemPromptEvent
  • Add to_llm_messages() method that returns separate system and user messages
  • Update events_to_messages() to use to_llm_messages() for SystemPromptEvent
  • Add regression test to ensure static system message stays constant

Expected Impact

With this change:

  • Conversation 1: Cache writes the static system prompt (~25k tokens)
  • Conversation 2+: Cache HITS on the static system prompt, only processes dynamic context (~2k tokens)

This should significantly reduce costs and latency for subsequent conversations.
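One way to spot-check the effect outside the SDK is to call the Anthropic Messages API directly with the same split. This is a sketch under a few assumptions: ANTHROPIC_API_KEY is set, the model name is still current, and the static block is long enough to meet the provider's minimum cacheable size (the short placeholder strings below would not actually be cached).

```python
import anthropic

STATIC_SYSTEM_PROMPT = "...the ~25k-token static system prompt..."  # placeholder
DYNAMIC_CONTEXT = "<WORKING_DIR>/workspace</WORKING_DIR> ..."       # placeholder

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=64,
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,            # identical across conversations
            "cache_control": {"type": "ephemeral"},  # cache breakpoint ends here
        }
    ],
    messages=[{"role": "user", "content": DYNAMIC_CONTEXT + "\n\nHi"}],
)
# Conversation 1 should report cache_creation_input_tokens roughly the size of the
# static prompt; conversations 2+ should report cache_read_input_tokens > 0 instead.
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)
```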

Testing

Added regression test test_static_system_message_is_constant_across_different_contexts that verifies the static system message is identical regardless of the AgentContext provided.
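The property that test pins down is essentially the following (a standalone sketch, not the committed test: build_static_system_message is a hypothetical stand-in for Agent.static_system_message, and the context dicts are made up):

```python
# Hypothetical stand-in for Agent.static_system_message: render only the base
# template and ignore per-conversation context entirely.
def build_static_system_message(agent_context: dict) -> str:
    return "You are OpenHands, an autonomous software engineering agent."


def test_static_system_message_is_constant_across_different_contexts():
    ctx_a = {"repo": "OpenHands/agent-sdk", "working_dir": "/workspace"}
    ctx_b = {"repo": "acme/other", "working_dir": "/tmp/project", "secrets": ["API_TOKEN"]}
    # The cacheable portion must not depend on anything per-conversation.
    assert build_static_system_message(ctx_a) == build_static_system_message(ctx_b)
```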



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image |
|---------|---------------|------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.13-nodejs22 |
| golang  | amd64, arm64  | golang:1.21-bookworm |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:70697a6-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-70697a6-python \
  ghcr.io/openhands/agent-server:70697a6-python

All tags pushed for this build

ghcr.io/openhands/agent-server:70697a6-golang-amd64
ghcr.io/openhands/agent-server:70697a6-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:70697a6-golang-arm64
ghcr.io/openhands/agent-server:70697a6-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:70697a6-java-amd64
ghcr.io/openhands/agent-server:70697a6-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:70697a6-java-arm64
ghcr.io/openhands/agent-server:70697a6-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:70697a6-python-amd64
ghcr.io/openhands/agent-server:70697a6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:70697a6-python-arm64
ghcr.io/openhands/agent-server:70697a6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:70697a6-golang
ghcr.io/openhands/agent-server:70697a6-java
ghcr.io/openhands/agent-server:70697a6-python

About Multi-Architecture Support

  • Each variant tag (e.g., 70697a6-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 70697a6-python-amd64) are also available if needed

…versation caching

This change enables cross-conversation prompt caching by separating the static
system prompt from per-conversation dynamic context (hosts, repos, users, etc.).

Changes:
- Add static_system_message and dynamic_context properties to AgentBase
- Add dynamic_context field to SystemPromptEvent
- Add to_llm_messages() method that returns separate system and user messages
- Update events_to_messages() to use to_llm_messages() for SystemPromptEvent
- Add regression test to ensure static system message stays constant

With this change, the static system prompt can be cached and reused across
conversations, while dynamic context is sent as a separate user message.

Fixes #1808

Co-authored-by: openhands <[email protected]>
…sage

Mark system_message as deprecated in 1.11.0 with removal in 1.13.0.
Update all usages in examples and tests to use static_system_message instead.

Co-authored-by: openhands <[email protected]>
github-actions bot commented Feb 3, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|------|------:|-----:|------:|---------|
| openhands-sdk/openhands/sdk/agent/agent.py | 247 | 49 | 80% | 94, 98, 209–210, 214–215, 217, 223–224, 226, 228–229, 234, 237, 241–244, 280–282, 310–311, 318–319, 351, 404–405, 407, 447, 586–587, 592, 604–605, 610–611, 630–631, 633, 661–662, 668–669, 673, 681–682, 722, 729 |
| openhands-sdk/openhands/sdk/agent/base.py | 188 | 18 | 90% | 200, 282–286, 334–336, 346, 356, 364–365, 475, 512–513, 523–524 |
| openhands-sdk/openhands/sdk/event/base.py | 91 | 9 | 90% | 52, 63, 75–76, 82, 85–86, 88, 163 |
| openhands-sdk/openhands/sdk/event/llm_convertible/system.py | 53 | 18 | 66% | 50–54, 71–72, 97–100, 104–105, 110–113, 116 |
| TOTAL | 16937 | 4983 | 70% | |

…ve users

When SystemPromptEvent has dynamic_context, instead of emitting it as a
separate user message (which creates consecutive user messages when the
actual user sends 'Hi'), we now merge the dynamic context into the first
subsequent user message.

This fixes the issue where entering a message in the UI would result in
nothing happening because the LLM received [system, user, user] which
some providers handle poorly.

The message sequence is now:
- [system] static prompt (cacheable)
- [user] dynamic_context + user's actual message

This preserves prompt caching while maintaining proper message alternation.

Co-authored-by: openhands <[email protected]>
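A simplified sketch of the merge behavior described in this commit, using plain (role, text) pairs instead of SDK Message objects (the real logic lives in events_to_messages and also handles tool calls and action batching):

```python
def merge_dynamic_context(
    static_prompt: str,
    dynamic_context: str | None,
    turns: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Fold the pending dynamic context into the first user turn that follows."""
    out: list[tuple[str, str]] = [("system", static_prompt)]
    pending = dynamic_context
    for role, text in turns:
        if pending is not None and role == "user":
            # Prepend per-conversation context to the first real user message so the
            # sequence stays [system, user, assistant, ...] with no consecutive users.
            text = pending + "\n\n" + text
            pending = None
        out.append((role, text))
    if pending is not None:
        # Edge case also handled in the PR: no user turn followed, so emit the
        # context as its own user message.
        out.append(("user", pending))
    return out


# merge_dynamic_context(STATIC, CONTEXT, [("user", "Hi")])
# -> [("system", STATIC), ("user", CONTEXT + "\n\nHi")]
```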
enyst self-requested a review February 3, 2026 22:27

enyst (Collaborator) left a comment

Great catch! Yes... compared to v0, we now include the suffix into the usual system prompt. 🤔 That works fine with prompt caching within the same conversation, but not cross-conversation.

Thank you for taking this up!

neubig (Author) commented Feb 3, 2026

@OpenHands fix the failing CI, and then run the first example from the example directory and see if it completes as expected.


openhands-ai bot commented Feb 3, 2026

I'm on it! neubig can track my progress at all-hands.dev


openhands-ai bot commented Feb 3, 2026

Summary

I completed both tasks requested:

1. Fixed the Failing CI (Pre-commit checks)

The pre-commit checks were failing due to line length violations (E501) in tests/sdk/event/test_dynamic_context_message_sequence.py. I fixed:

  • Line 21: Shortened the docstring from 89 to 88 characters
  • Line 64: Split a long error message string across two lines
  • Applied ruff formatting fixes (multi-line formatting for long expressions)

The fix has been committed and pushed to the fix/prompt-caching-cross-conversation branch.

2. Ran the First Example

I ran examples/01_standalone_sdk/01_hello_world.py which:

  • ✅ Successfully initialized the conversation with the LLM
  • ✅ The agent explored the repository (README.md, pyproject.toml)
  • ✅ Created FACTS.txt with 3 accurate facts about the project:
    1. It's a Python framework for building code-working agents with local and ephemeral workspace support
    2. It's a monorepo with 4 packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server)
    3. It achieved 77.6 on SWEBench and powers OpenHands CLI and Cloud
  • ✅ Completed successfully with "All done!" message

I cleaned up the generated FACTS.txt file after verification since it was just for testing purposes.

PR Status: The changes have been pushed to the branch and should now pass the pre-commit CI checks.

View full conversation

simonrosenberg added the integration-test label (Runs the integration tests and comments the results) Feb 4, 2026

github-actions bot commented Feb 4, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

simonrosenberg marked this pull request as ready for review February 4, 2026 21:44
all-hands-bot (Collaborator) left a comment

Review Summary

The core approach of separating static system prompt from dynamic context is solid and will enable cross-conversation caching. However, there are some inconsistencies between the PR description and implementation, plus API design issues that should be addressed.

Key Concerns:

  • The to_llm_messages() method is defined but never called, creating confusion
  • PR description claims events_to_messages() uses to_llm_messages(), but it actually uses to_llm_message() (singular)
  • Some edge cases could use better test coverage

See inline comments for specific issues and suggestions.

"""
messages = [Message(role="system", content=[self.system_prompt])]
if self.dynamic_context:
messages.append(Message(role="user", content=[self.dynamic_context]))

🟠 Important: This method is defined but never called in the codebase. The PR description says "Update events_to_messages() to use to_llm_messages()", but looking at event/base.py:133, it actually calls to_llm_message() (singular) instead.

This creates API confusion:

  • Is to_llm_messages() intended for direct API usage?
  • Should events_to_messages() be updated to use this method?
  • If this method returns [system, user] messages when dynamic_context is present, wouldn't that create consecutive user messages (the exact problem you're solving)?

Recommendation: Either:

  1. Remove this method if it's not needed
  2. Update events_to_messages() to actually use it (if possible without breaking the merge logic)
  3. Add a clear docstring explaining why this exists but isn't used by events_to_messages()

# Create combined message for the response
messages.append(_combine_action_events(batch_events))
i = j
elif isinstance(event, SystemPromptEvent):

🟡 Suggestion: This calls event.to_llm_message() (singular), but SystemPromptEvent now has a to_llm_messages() (plural) method. The PR description says you "Update events_to_messages() to use to_llm_messages()", but that's not what's happening here.

The current implementation is actually correct (it manually handles the dynamic context merging), but the mismatch with the PR description is confusing. Consider clarifying the intent.

Comment on lines +108 to 112
pending_dynamic_context: TextContent | None = None
i = 0

while i < len(events):
    event = events[i]

🟢 Nit: The pending_dynamic_context variable and its deferral logic could benefit from a comment block explaining the strategy:

Suggested change
pending_dynamic_context: TextContent | None = None
i = 0
while i < len(events):
event = events[i]
messages: list[Message] = []
# Track dynamic context from SystemPromptEvent to merge with the next user message.
# This prevents consecutive user messages while enabling cross-conversation caching.
pending_dynamic_context: TextContent | None = None

This helps future maintainers understand the non-obvious logic.

Comment on lines 158 to +161
i += 1

# If there's pending dynamic context but no user message followed,
# add it as a separate user message (edge case)

🟡 Suggestion: This edge case handling is good, but could lead to unusual message sequences if the first event after SystemPromptEvent is an assistant message (you'd get [system, assistant, user]).

While this is probably rare in practice, consider either:

  1. Adding a test case for this scenario
  2. Adding a comment explaining the expected conversation flow (SystemPromptEvent should always be followed by a user message in normal usage)


# Dynamic context should be present somewhere in the messages
assert "Working directory" in full_text, "Dynamic context should be preserved"
assert "Date: 2024-01-15" in full_text, "Dynamic context should be preserved"

🟡 Suggestion: The test coverage is good for the happy path, but consider adding tests for edge cases:

  • What happens if SystemPromptEvent appears in the middle of a conversation (after other events)?
  • What if there are multiple SystemPromptEvents in the event stream?
  • What if SystemPromptEvent has dynamic context but is followed by an assistant message (not a user message)?

These edge cases would help verify the robustness of the pending_dynamic_context logic.

Comment on lines +234 to +250
return render_template(
prompt_dir=self.prompt_dir,
template_name=self.system_prompt_filename,
**template_kwargs,
)
if self.agent_context:
_system_message_suffix = self.agent_context.get_system_message_suffix(
llm_model=self.llm.model,
llm_model_canonical=self.llm.model_canonical_name,
)
if _system_message_suffix:
system_message += "\n\n" + _system_message_suffix

@property
def dynamic_context(self) -> str | None:
"""Get the dynamic per-conversation context.

This returns the context that varies between conversations, such as:
- Repository information and skills
- Runtime information (hosts, working directory)
- User-specific secrets and settings
- Conversation instructions

This content should NOT be included in the cached system prompt to enable

🟢 Nit: The deprecation timeline (removal in 1.13.0, just 2 versions away) is aggressive for a property that many users likely depend on. Consider:

  1. Extending the deprecation period to 2.0.0 or later
  2. Adding a migration guide in the docs showing how to update code
  3. Providing a clear example of the new pattern in the deprecation warning

The current implementation maintains backward compatibility well, so there's less urgency to force migration.

Comment on lines 82 to +92

def to_llm_messages(self) -> list[Message]:
"""Convert to LLM message format, potentially returning multiple messages.

When dynamic_context is provided, returns two messages:
1. System message with static prompt (cacheable)
2. User message with dynamic context (not cached)

This structure enables cross-conversation prompt caching by keeping the
static system prompt separate from per-conversation dynamic content.


🟡 Suggestion: This docstring says "returns two messages: System message with static prompt (cacheable) [and] User message with dynamic context", but that would create consecutive user messages when the actual user message arrives.

The docstring should clarify:

  • This method provides a naive conversion
  • Use LLMConvertibleEvent.events_to_messages() for proper event stream handling that avoids consecutive user messages
  • Or explain when it's appropriate to use this vs events_to_messages()
Suggested change (current docstring → suggested docstring):

Current:
    def to_llm_messages(self) -> list[Message]:
        """Convert to LLM message format, potentially returning multiple messages.

        When dynamic_context is provided, returns two messages:
        1. System message with static prompt (cacheable)
        2. User message with dynamic context (not cached)

        This structure enables cross-conversation prompt caching by keeping the
        static system prompt separate from per-conversation dynamic content.

Suggested:
    def to_llm_messages(self) -> list[Message]:
        """Convert to LLM message format, potentially returning multiple messages.

        When dynamic_context is provided, this naive conversion returns two messages:
        1. System message with static prompt (cacheable)
        2. User message with dynamic context (not cached)

        WARNING: Using this directly may create consecutive user messages. For proper
        event stream conversion that merges dynamic context with the first user message,
        use `LLMConvertibleEvent.events_to_messages()` instead.

        Returns:
            List of Message objects. Contains 1 message if no dynamic_context,
            or 2 messages if dynamic_context is provided.
        """


openhands-ai bot commented Feb 4, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1890 at branch `fix/prompt-caching-cross-conversation`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings


github-actions bot commented Feb 4, 2026

🧪 Condenser Tests Results

Overall Success Rate: 95.6%
Total Cost: $1.54
Models Tested: 6
Timestamp: 2026-02-04 21:56:13 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens |
|-------|---------|--------------|---------|-------|------|--------|
| litellm_proxy_deepseek_deepseek_chat | 100.0% | 7/7 | 1 | 8 | $0.07 | 1,464,146 |
| litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 8/8 | 0 | 8 | $0.43 | 272,716 |
| litellm_proxy_vertex_ai_gemini_3_pro_preview | 100.0% | 8/8 | 0 | 8 | $0.30 | 220,268 |
| litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 7/7 | 1 | 8 | $0.17 | 268,616 |
| litellm_proxy_mistral_devstral_2512 | 85.7% | 6/7 | 1 | 8 | $0.10 | 238,688 |
| litellm_proxy_gpt_5.1_codex_max | 87.5% | 7/8 | 0 | 8 | $0.47 | 560,805 |

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.07
  • Token Usage: prompt: 1,452,776, completion: 11,370, cache_read: 1,370,048
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_0905f5e_deepseek_run_N8_20260204_214511
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.43
  • Token Usage: prompt: 265,768, completion: 6,948, cache_read: 193,968, cache_write: 71,369, reasoning: 1,804
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_0905f5e_sonnet_run_N8_20260204_214500

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.30
  • Token Usage: prompt: 214,303, completion: 5,965, cache_read: 113,703, reasoning: 3,941
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_0905f5e_gemini_3_pro_run_N8_20260204_214505

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.17
  • Token Usage: prompt: 262,873, completion: 5,743, cache_read: 203,776
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_0905f5e_kimi_k2_run_N8_20260204_214503
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_mistral_devstral_2512

  • Success Rate: 85.7% (6/7)
  • Total Cost: $0.10
  • Token Usage: prompt: 235,570, completion: 3,118
  • Run Suffix: litellm_proxy_mistral_devstral_2512_0905f5e_devstral_2512_run_N8_20260204_214504
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.009)

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 87.5% (7/8)
  • Total Cost: $0.47
  • Token Usage: prompt: 549,846, completion: 10,959, cache_read: 287,616, reasoning: 5,120
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_0905f5e_gpt51_codex_run_N8_20260204_214506

Failed Tests:

  • t02_add_bash_hello: Shell script 'shell/hello.sh' not found (Cost: $0.24)


Development

Successfully merging this pull request may close these issues.

Investigate prompt caching differences between Claude Code and OpenHands
