feat: Persistent LLM profiles #1882

Open

VascoSch92 wants to merge 9 commits into OpenHands:main from VascoSch92:llm-profiles

Conversation

@VascoSch92

Summary

This PR proposes a solution for issue #781.

The class ProfileManager is introduced, whose sole job is to save, store, and update profiles on disk. Every time we work with persistent profiles, it should be done through this API to ensure a consistent and reliable way to store, load, and update profiles.

The class can then be used to implement profile selection for the Openhands-CLI and, in LLMRegistry, to save or load registries coming from different experiments or runs.

Changed Scripts

manager.py

The script contains the ProfileManager class, which has the following characteristics:

  • The class manages loading, saving, deleting, and updating persistent profiles (a short usage sketch follows this list).
  • The class uses a configuration file as a sort of index to load profiles lazily, preventing the need to load the entire profile store at once, thereby saving time and memory.
  • The solution works fine in settings where multiple ProfileManager instances are not working in the same directory (e.g., Openhands-CLI); otherwise, concurrency issues could arise. Since I assume these are mostly corner cases, I haven't implemented any countermeasures yet. Does that make sense?
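To make the intended workflow concrete, here is a minimal usage sketch. The constructor signature, the save_profile argument order, and the import paths are my assumptions, not the final API:

from pathlib import Path

from openhands.sdk import LLM  # import path assumed for illustration
# ProfileManager import omitted; it lives in the new manager.py

manager = ProfileManager(Path("~/.openhands/profiles").expanduser())  # constructor assumed

llm = LLM(model="gpt-5-mini", usage_id="agent")
manager.save_profile("gpt-5-mini", llm)        # writes the profile file and updates the config index
loaded = manager.load_profile("gpt-5-mini")    # lazy: only this profile's file is read
manager.set_default_profile("gpt-5-mini")
default = manager.get_default_profile()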

config.py

The script contains Pydantic models used to validate the configuration file associated with a profile store. These models are backward compatible.

llm_registry.py

Several modifications were made to the LLMRegistry class; however, the API of the class did not change, ensuring backward compatibility. In short:

  • Two new parameters were added: load_from_disk and profile_manager_path.

    • load_from_disk is a boolean flag that determines whether to use profiles persisted on disk.
    • If load_from_disk is True, profile_manager_path specifies the location of the profiles.

    Two parameters were used instead of one so that profile_manager_path can fall back to the profile manager's default path when load_from_disk is enabled.

  • The property usage_to_llm is deprecated. Currently, it returns the internal state _usage_to_llm rather than a copy, which is not ideal. If access is needed, a copy or proxy map should be returned. The property was used only in a few tests, and since we do not want to change the API, it was simply deprecated.

  • The get method now lazily loads profiles from the store when necessary.

  • A new method, export_profiles, was added to save the registry. This is very helpful when experimenting with or sharing LLM profiles between coworkers. A short usage sketch of these additions follows this list.
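For illustration, the new surface could be used roughly like this (exact signatures are assumptions based on the description above):

from pathlib import Path

registry = LLMRegistry(
    load_from_disk=True,                                      # opt in to persisted profiles
    profile_manager_path=Path("~/.openhands/profiles").expanduser(),
)

agent_llm = registry.get("agent")   # lazily loaded from the profile store if not already in memory
registry.export_profiles()          # persist the current registry for sharing or later runs (may take a target path)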

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?

@VascoSch92 VascoSch92 requested review from enyst and xingyaoww February 2, 2026 22:13
@VascoSch92 VascoSch92 added the enhancement New feature or request label Feb 2, 2026
@all-hands-bot (Collaborator) left a comment

Review Summary

This PR introduces a solid foundation for persistent LLM profiles. The implementation is generally well-structured with good test coverage. However, there are several issues that should be addressed:

Critical Issues:

  • Profile files committed to repository root should be relocated
  • Missing error handling in lazy loading logic could cause incorrect exceptions

Important Issues:

  • Deprecated property still exposes mutable internal state
  • Incomplete cache invalidation
  • Backward compatibility concerns with list_usage_ids()

See inline comments for details.

    removed_in="1.7.0",
    details="The method is deprecated and will be removed in next versions",
)
def usage_to_llm(self) -> dict[str, LLM]:
Collaborator

🟠 Important: Deprecated property returns mutable internal state.

The deprecation note says "a copy or proxy map should be returned" but the implementation still returns the internal dict directly. This allows callers to mutate _usage_to_llm.

Also, the deprecation message says "The method is deprecated" but this is a property, not a method.

Suggested change

-def usage_to_llm(self) -> dict[str, LLM]:
+@deprecated(
+    deprecated_in="1.6.0",
+    removed_in="1.7.0",
+    details="This property is deprecated and will be removed in future versions. Use list_usage_ids() and get() instead.",
+)
+def usage_to_llm(self) -> dict[str, LLM]:
+    """Access a copy of the internal usage-ID-to-LLM mapping."""
+    return self._usage_to_llm.copy()

Collaborator

For clarity, usage_id is an id used for the LLM when computing conversation stats, e.g. we wanted to be able to say "the agent LLM used 120k tokens total", "the condenser LLM used 45k tokens total".

I think they're not profile ids 🤔 WDYT?

Author

My point was that allowing an external caller to change an internal state breaks the API contract.

Could you point me to an example where this property is used? I searched the repo and it seems it's never referenced.

If you still want to keep this property, I suggest returning:

  • a copy of the dictionary (ok but not ideal)
  • a MappingProxyType (better solution; a minimal sketch follows)
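A minimal sketch of the MappingProxyType option, assuming the property stays on LLMRegistry:

from collections.abc import Mapping
from types import MappingProxyType

class LLMRegistry:  # excerpt for illustration only
    @property
    def usage_to_llm(self) -> Mapping[str, "LLM"]:
        """Read-only view of the internal mapping; callers cannot mutate it."""
        return MappingProxyType(self._usage_to_llm)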

Collaborator

Ah ok, I don't think we care about the property, if it's not used let's kill it!

Comment on lines 127 to 134
    llm_instance = self._usage_to_llm[usage_id]
elif self.profile_manager and usage_id in self.profile_manager.get_id_usages():
    profile_name = next(
        p.name
        for p in self.profile_manager.config.profiles
        if p.usage_id == usage_id
    )
    llm_instance = self.profile_manager.load_profile(profile_name)
Collaborator

🔴 Critical: Missing error handling for lazy loading.

If no profile matches the usage_id, this next() call will raise StopIteration instead of the intended KeyError. The in check above should prevent this, but it's better to be defensive.

Consider using:

profile_name = next(
    (p.name for p in self.profile_manager.config.profiles if p.usage_id == usage_id),
    None
)
if profile_name:
    llm_instance = self.profile_manager.load_profile(profile_name)
    self._usage_to_llm[usage_id] = llm_instance

if self.profile_manager is not None
else set()
)

Collaborator

🟠 Important: Backward compatibility concern.

This method now returns both in-memory AND on-disk usage IDs. This is a behavior change that could break existing code expecting only in-memory profiles. Consider:

  1. Documenting this change clearly in the docstring
  2. Mentioning it in the PR description as a breaking change
  3. Adding a parameter to control this behavior if needed for compatibility (a rough sketch follows)
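For illustration, option 3 could look roughly like this; the include_persisted parameter name and the exact return type are assumptions, while get_id_usages follows the method used elsewhere in this PR:

def list_usage_ids(self, include_persisted: bool = True) -> set[str]:
    """Return usage IDs known to the registry.

    When a profile manager is configured and include_persisted is True, this
    also includes usage IDs of profiles persisted on disk, not only in-memory LLMs.
    """
    ids = set(self._usage_to_llm)
    if include_persisted and self.profile_manager is not None:
        ids |= self.profile_manager.get_id_usages()
    return ids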

Author

I see that there is a corner case where this can happen. However, the user can set load_from_disk = False and the code will then return just the in-memory profiles.

Note that the default for load_from_disk is False.

Comment on lines +101 to +107
@cache
def get_default_profile(self) -> LLM | None:
    """Return the default LLM profile."""
    default_profile_name = self.config.default_profile
    if default_profile_name is None:
        return None
    return self.load_profile(default_profile_name)
Collaborator

🟡 Suggestion: Cache invalidation is incomplete.

The @cache decorator caches the default profile, but the cache is only cleared in set_default_profile(). It should also be cleared when:

  • The default profile is deleted (currently forbidden, but good for future-proofing)
  • The default profile is updated via save_profile()

Consider clearing the cache in save_profile() and delete_profile() if they affect the default profile.
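A rough sketch of the defensive clearing (method bodies elided; the save_profile signature is assumed, the other names follow the ones discussed in this PR):

from functools import cache

class ProfileManager:  # excerpt for illustration only
    @cache
    def get_default_profile(self) -> "LLM | None":
        ...

    def save_profile(self, name: str, llm: "LLM") -> None:
        ...  # write <name>.json and update the config index
        if name == self.config.default_profile:
            # drop the cached default so the next call reloads it from disk;
            # note that functools.cache is shared across instances of the class
            self.get_default_profile.cache_clear()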

Author

While delete_profile() doesn't require cache eviction, it’s best to defensively clear the cache in save_profile().


Note that the profile manager also uses a configuration file containing metadata
about the profiles. This configuration file is primarily used to lazily load
profiles from disk.
Collaborator

🟠 Important: Concurrency not addressed.

The PR description mentions: "concurrency issues could arise" when multiple ProfileManager instances work in the same directory. While you note this is a corner case, consider:

  1. Adding a clear warning in the docstring
  2. At minimum, implementing file-based locking for config writes
  3. Or documenting that this is explicitly NOT thread-safe

This is especially important for CLI usage where users might run multiple commands concurrently.

Suggested docstring addition:

**Thread Safety**: This class is NOT thread-safe. Multiple ProfileManager
instances operating on the same directory concurrently may cause race conditions
when reading/writing the config file. Ensure only one instance per directory
is active at a time.
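If we do go the locking route, a minimal sketch with the third-party filelock package (not currently a dependency; shown only as one possible option):

from pathlib import Path

from filelock import FileLock  # third-party; would be a new dependency

def write_config(config_path: Path, payload: str) -> None:
    # serialize config writes across processes sharing the same profile directory
    with FileLock(str(config_path) + ".lock"):
        config_path.write_text(payload)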

    self.config_path.write_text(json.dumps(config.model_dump(), indent=2))

def _load_config(self) -> ProfilesConfig:
    return ProfilesConfig.from_json(self.config_path)
Collaborator

I wonder if the config file is necessary? We could maybe just load the profiles from their directory.

Author

Yes, this could be an option, but I think it depends on what we want to solve and which use case this class wants to address.

Collaborator

Here's what I'm thinking:

  • we have generally avoided making config files in the SDK
  • we do save/load things (conversations, events, hooks I think), though enabled by user choice - I believe saving/loading LLM profiles is fine
  • but I think maybe a config file could be left to applications, not in the SDK.

I'd be happy to be proven wrong though. There's an argument that doing it here would assure consistency between applications (CLI and web apps), but I don't know if this is the time to change that... WDYT?

"enable_encrypted_reasoning": true,
"prompt_cache_retention": "24h",
"extended_thinking_budget": 200000,
"usage_id": "gpt_agent",
Collaborator

I think this could be the condenser or any other LLM

code_agent = Agent(llm=code_llm, tools=[Tool(name=TerminalTool.name)]) # noqa: F841

summary_llm = registry.get("gpt_agent")
summary_agent = Agent(llm=summary_llm, tools=[Tool(name=TerminalTool.name)]) # noqa: F841
Collaborator

I see, I think this is confusing in the codebase, but it seems to me that usage_ids are orthogonal to profiles. It's a different feature, and if we merge them into one I'm not sure we can achieve both

  • stats per "usage" (e.g. total stats for the agent, no matter what LLMs it used)
  • the user can choose any profile for agent

Did that make sense?

Author

Yes, I had some thoughts about that too.

The problem is that there is no 1-to-1 correspondence between usage_ids and profiles, i.e., you could have one profile with multiple usage ids.

For that reason, the usage_id in the ProfileMetadata is optional. On the other hand, the profiles config asks for a list of profile metadata entries with distinct names, as the name is used to retrieve the correct profile.
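For reference, the shape I have in mind is roughly the following (an illustration, not the exact Pydantic models; field names follow the config example shown later in this thread):

from pydantic import BaseModel, Field

class ProfileMetadata(BaseModel):
    name: str                      # unique per profile; used to retrieve the profile
    file: str                      # e.g. "gpt4.json"
    usage_id: str | None = None    # optional: several usage_ids can map to one profile

class ProfilesConfig(BaseModel):
    default_profile: str | None = None
    profiles: list[ProfileMetadata] = Field(default_factory=list)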

Collaborator

@OpenHands please do a /codereview-roasted and look up ALL the usage of usage_ids (including all LLM-specific IDs), then zoom out a bit and understand the purpose of this PR. Do you think if the current implementation in this PR is a good idea? If not, what do you think would be the best approach?

Currently it seems with this PR, we will have LLMRegistry -> LLMProfile -> LLM and each of this concept have their own notion of "IDs". Any way to simplify this hierarchy? I imagine a profile is just a thin wrapper around an LLM (with a user-specified name), and LLMRegistry should just be a collection of LLM | LLMProfile - do you think if this is a good idea?


I'm on it! xingyaoww can track my progress at all-hands.dev


Code Review Summary: PR #1882 "feat: Persistent LLM profiles"

Taste Rating: 🔴 Needs Improvement

I performed a comprehensive code review of the PR and analyzed all usages of usage_id, profile_id, and related IDs across the codebase. Here's my assessment:


The ID Hierarchy Problem

The PR introduces a confusing hierarchy with multiple overlapping ID concepts:

ID            Location      Purpose
usage_id      LLM           Role identification (agent, condenser, etc.)
profile_id    LLM (new)     Persisted configuration name
registry_id   LLMRegistry   Instance UUID for logging

The core issue: These concepts are orthogonal but the PR conflates them. Specifically, ensure_default_profile() uses usage_id as the default profile_id, creating false coupling.


Critical Issues

  1. ensure_default_profile() conflates usage_id with profile_id - Multiple usages can share one profile (agent and condenser both using gpt-5-mini), and one usage can switch between profiles.

  2. profile_ref boolean flag is a code smell - If you need a flag to track how an object was created, your serialization logic is too complex.

  3. Manual discriminated union in LLM validator - The kind: profile_ref pattern is implemented manually, creating tight coupling between LLM and LLMRegistry.

  4. ConversationState breaks encapsulation - It manually patches serialized payloads after model_dump().


Recommended Simplification

Your intuition is correct: a profile should be a thin wrapper around an LLM. I'd go further:

A profile IS an LLM configuration stored on disk. Period.

# Proposed simpler API:
class LLMProfileStore:
    """Standalone utility for persisting LLM configurations."""
    def save(self, name: str, llm: LLM, include_secrets: bool = False) -> Path: ...
    def load(self, name: str) -> LLM: ...
    def list(self) -> list[str]: ...

# Registry stays simple - just in-memory by usage_id:
registry = LLMRegistry()
registry.add(llm)  # keyed by llm.usage_id

This keeps:

  • LLM simple (no profile_ref flag, no kind discriminator)
  • LLMRegistry simple (just in-memory lookup by usage_id)
  • Profile persistence explicit and separate (hypothetical usage sketched below)
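Hypothetical call pattern for the sketch above (LLMProfileStore does not exist yet; imports and constructor are illustrative only):

from pathlib import Path

from openhands.sdk import LLM, LLMRegistry  # import path assumed

store = LLMProfileStore(Path("~/.openhands/profiles").expanduser())
store.save("gpt-5-mini", LLM(model="gpt-5-mini", usage_id="agent"))
restored = store.load("gpt-5-mini")

registry = LLMRegistry()
registry.add(restored)   # in-memory only, keyed by restored.usage_id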

Verdict

Needs rework - The fundamental design conflates orthogonal concepts. The PR should:

  1. Keep usage_id and profile_id strictly separate - they serve different purposes
  2. Make profile persistence a standalone utility - not embedded in LLMRegistry or LLM
  3. Use explicit load/save rather than magic serialization - store.load("gpt-5-mini") is clearer than kind: profile_ref discriminators

Note: This was a review-only task. No code changes were made.


Collaborator

^ I kinda like this idea - @VascoSch92 @enyst wdyt?

},
{
"name": "gpt4",
"file": "gpt4.json",
Collaborator

If we use the rule "profile_id.json is the filename", then maybe we don't need a config file?
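A minimal sketch of that convention (hypothetical helper, just to illustrate the filename rule):

from pathlib import Path

def list_profiles(profile_dir: Path) -> list[str]:
    # one file per profile, named "<profile_id>.json"; no separate config/index needed
    return sorted(p.stem for p in profile_dir.glob("*.json"))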

Author

Yes. The only open point is where we store the info about the default_profile, i.e., how do we know which one is the default profile?

@VascoSch92 (Author)

Hey @enyst, thank you very much for the feedback!

Here are a couple of thoughts to help clarify the situation. I’d love to know what you think about the following:

Goal: We want a solution that allows the user to load or switch profiles on the fly.

Context: A solution proposed by @xingyaoww [here] suggested allowing the LLMRegistry to look into persistent files. This makes sense, as the LLMRegistry is used in LocalConversation to manage different profiles.

However, the LLMRegistry uses usage_id as a primary key, while the proposed ProfileManager uses name. This could lead to inconsistencies since there isn't a one-to-one mapping between the two.

A possible solution: We could update the ProfileManager to work with two parameters: name and usage_id. This would mean:

We can save the same LLM with different usage_id values. (This should also work for saving Agents, and we could potentially remove the Config-related Pydantic models).

This would require updating the LLMRegistry API if we want the ProfileManager to support it. Specifically, the .get() method would now require the name of the profile as an additional parameter.

Makes sense?

@enyst (Collaborator)

enyst commented Feb 4, 2026

Thank you @VascoSch92 for thinking it through! To be honest, this can get complicated fast.

I think what matters is:

(start with) - agent -> gpt-5
(switch to) - agent -> sonnet-45
(at restore) - agent -> will be sonnet-45

where agent is the usage_id, and those are profile ids. This will be within a single run, and we'd be able to compute the total cost for agent, which gets displayed in conversation windows and so on.

If we can do that, we're good IMHO!

Now, the same goes for other buckets of usage.

My current conclusion playing with it, is that it's simpler to:

  • use only profiles, not LLMs in conversations code etc, so we should have a default profile no matter what the user does
  • use agent for agent, condenser for condenser, specific other components are also named
    • I no longer allow overriding these names
  • I think that implies to save profile_id in the conversation base_state.json, so that we can restore the profile. 🤔

Then, it seems to me, we have the question: how about remote?

  • currently, a client needs to send LLMs serialized (so the "contents" of the profile)
    • of course, since the remote is the agent server, it doesn't yet know about profiles
  • I think maybe we could send the profile id from the client; it seems necessary? e.g., the user selects a profile -> start conversation / switch to that profile

)


def main() -> None:
Collaborator

Let's break up the main function and run its code directly in the script, saving a level of indentation.

@OpenHands please implement this change and /remember this preference in AGENTS.md


I'm on it! xingyaoww can track my progress at all-hands.dev


Summary

I've implemented the requested changes and pushed them to the agent-sdk-18-profile-manager branch:

Checklist

  • Removed if __name__ == "__main__": guard from examples/01_standalone_sdk/36_llm_profiles.py - The code now runs directly at module level, saving one level of indentation (consistent with other examples like 01_hello_world.py)
  • Documented the preference in AGENTS.md - Added a new "Example Scripts" section under <CODE> that states: "Example scripts in examples/ should run code directly at module level without wrapping in if __name__ == "__main__": guards. This saves a level of indentation and keeps examples concise."
  • Pre-commit checks passed - All linting, formatting, and type checks passed
  • Changes pushed to the PR branch

The changes are concise and focused only on what was requested. The PR has been updated with commit 572e7196.

