Conversation
all-hands-bot left a comment
Review Summary
This PR introduces a solid foundation for persistent LLM profiles. The implementation is generally well-structured with good test coverage. However, there are several issues that should be addressed:
Critical Issues:
- Profile files committed to repository root should be relocated
- Missing error handling in lazy loading logic could cause incorrect exceptions
Important Issues:
- Deprecated property still exposes mutable internal state
- Incomplete cache invalidation
- Backward compatibility concerns with list_usage_ids()
See inline comments for details.
```python
    removed_in="1.7.0",
    details="The method is deprecated and will be removed in next versions",
)
def usage_to_llm(self) -> dict[str, LLM]:
```
🟠 Important: Deprecated property returns mutable internal state.
The deprecation note says "a copy or proxy map should be returned" but the implementation still returns the internal dict directly. This allows callers to mutate _usage_to_llm.
Also, the deprecation message says "The method is deprecated" but this is a property, not a method.
Suggested change:
```python
@deprecated(
    deprecated_in="1.6.0",
    removed_in="1.7.0",
    details="This property is deprecated and will be removed in future versions. Use list_usage_ids() and get() instead.",
)
def usage_to_llm(self) -> dict[str, LLM]:
    """Access a copy of the internal usage-ID-to-LLM mapping."""
    return self._usage_to_llm.copy()
```
For clarity, usage_id is an id used for the LLM when computing conversation stats, e.g. we wanted to be able to say "the agent LLM used 120k tokens total", "the condenser LLM used 45k tokens total".
I think they're not profile ids 🤔 WDYT?
My point was that allowing an external caller to change an internal state breaks the API contract.
Could you point me to an example where this property is used? I searched the repo and it seems it's never referenced.
If you still want to keep this property, I suggest returning:
- a copy of the dictionary (ok but not ideal)
- a `MappingProxyType` (better solution; see the sketch below)
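A minimal sketch of the `MappingProxyType` option, assuming the property lives on `LLMRegistry` and wraps `_usage_to_llm` as discussed above:

```python
from collections.abc import Mapping
from types import MappingProxyType

@property
def usage_to_llm(self) -> Mapping[str, LLM]:
    """Read-only, live view of the internal usage-ID-to-LLM mapping."""
    # Callers can read but not mutate; later changes to _usage_to_llm still show through.
    return MappingProxyType(self._usage_to_llm)
```

Unlike `.copy()`, the proxy stays in sync with later registry changes while still blocking external mutation.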
Ah ok, I don't think we care about the property, if it's not used let's kill it!
```python
    llm_instance = self._usage_to_llm[usage_id]
elif self.profile_manager and usage_id in self.profile_manager.get_id_usages():
    profile_name = next(
        p.name
        for p in self.profile_manager.config.profiles
        if p.usage_id == usage_id
    )
    llm_instance = self.profile_manager.load_profile(profile_name)
```
🔴 Critical: Missing error handling for lazy loading.
If no profile matches the usage_id, this next() call will raise StopIteration instead of the intended KeyError. The in check above should prevent this, but it's better to be defensive.
Consider using:
```python
profile_name = next(
    (p.name for p in self.profile_manager.config.profiles if p.usage_id == usage_id),
    None
)
if profile_name:
    llm_instance = self.profile_manager.load_profile(profile_name)
    self._usage_to_llm[usage_id] = llm_instance
```

```python
    if self.profile_manager is not None
    else set()
)
```
🟠 Important: Backward compatibility concern.
This method now returns both in-memory AND on-disk usage IDs. This is a behavior change that could break existing code expecting only in-memory profiles. Consider:
- Documenting this change clearly in the docstring
- Mentioning it in the PR description as a breaking change
- Adding a parameter to control this behavior if needed for compatibility
I see that there is a corner case where this can happen. However, the user can set `load_from_disk = False` and the code will return just in-memory profiles.
Note that the default for `load_from_disk` is False.
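If I read this right, the gating boils down to whether a `profile_manager` exists at all; a rough sketch (attribute and method names taken from the diff above, everything else is an assumption):

```python
def list_usage_ids(self) -> list[str]:
    # When load_from_disk is False (the default), no profile_manager is created,
    # so only the in-memory usage IDs are returned -- the old behavior.
    on_disk = (
        self.profile_manager.get_id_usages()
        if self.profile_manager is not None
        else set()
    )
    return sorted(set(self._usage_to_llm) | on_disk)
```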
```python
@cache
def get_default_profile(self) -> LLM | None:
    """Return the default LLM profile."""
    default_profile_name = self.config.default_profile
    if default_profile_name is None:
        return None
    return self.load_profile(default_profile_name)
```
🟡 Suggestion: Cache invalidation is incomplete.
The @cache decorator caches the default profile, but the cache is only cleared in set_default_profile(). It should also be cleared when:
- The default profile is deleted (currently forbidden, but good for future-proofing)
- The default profile is updated via `save_profile()`
Consider clearing the cache in save_profile() and delete_profile() if they affect the default profile.
While delete_profile() doesn't require cache eviction, it’s best to defensively clear the cache in save_profile().
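Something along these lines, for example (a sketch only; the helper `_write_profile` and the exact `save_profile` signature are assumptions):

```python
def save_profile(self, name: str, llm: LLM) -> Path:
    path = self._write_profile(name, llm)  # hypothetical write helper
    # functools.cache exposes cache_clear(); evict defensively in case the
    # saved profile is (or later becomes) the default one.
    self.get_default_profile.cache_clear()
    return path
```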
```
Note that the profile manager also uses a configuration file containing metadata
about the profiles. This configuration file is primarily used to lazily load
profiles from disk.
```
🟠 Important: Concurrency not addressed.
The PR description mentions: "concurrency issues could arise" when multiple ProfileManager instances work in the same directory. While you note this is a corner case, consider:
- Adding a clear warning in the docstring
- At minimum, implementing file-based locking for config writes
- Or documenting that this is explicitly NOT thread-safe
This is especially important for CLI usage where users might run multiple commands concurrently.
Suggested docstring addition:
**Thread Safety**: This class is NOT thread-safe. Multiple ProfileManager
instances operating on the same directory concurrently may cause race conditions
when reading/writing the config file. Ensure only one instance per directory
is active at a time.
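If we ever go the locking route, a rough POSIX-only sketch (illustrative; `fcntl` is unavailable on Windows, and the function name is made up):

```python
import fcntl
import json
from pathlib import Path

def write_config_locked(config_path: Path, payload: dict) -> None:
    """Serialize config writes across processes with an exclusive advisory lock."""
    lock_path = config_path.with_suffix(".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is acquired
        try:
            config_path.write_text(json.dumps(payload, indent=2))
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```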
```python
        self.config_path.write_text(json.dumps(config.model_dump(), indent=2))

    def _load_config(self) -> ProfilesConfig:
        return ProfilesConfig.from_json(self.config_path)
```
I wonder if a config file is necessary? We could maybe just load the profiles from their directory.
Yes, this could be an option, but I think this depends on what we want to solve and which use case this class wants to address.
Here's what I'm thinking:
- we have generally avoided making config files in the SDK
- we do save/load things (conversations, events, hooks I think), though enabled by user choice - I believe saving/loading LLM profiles is fine
- but I think maybe a config file could be left to applications, not in the SDK.
I'd be happy to be proven wrong though. There's an argument that doing it here would assure consistency between applications (CLI and web apps), but I don't know if this is the time to change that... WDYT?
| "enable_encrypted_reasoning": true, | ||
| "prompt_cache_retention": "24h", | ||
| "extended_thinking_budget": 200000, | ||
| "usage_id": "gpt_agent", |
I think this could be condenser or any other LLM
```python
code_agent = Agent(llm=code_llm, tools=[Tool(name=TerminalTool.name)])  # noqa: F841

summary_llm = registry.get("gpt_agent")
summary_agent = Agent(llm=summary_llm, tools=[Tool(name=TerminalTool.name)])  # noqa: F841
```
I see, I think this is confusing in the codebase, but it seems to me that usage_ids are orthogonal to profiles. It's a different feature, and if we merge them into one I'm not sure we can achieve both
- stats per "usage" (e.g. total stats for the agent, no matter what LLMs it used)
- the user can choose any profile for agent
Did that make sense?
Yes, I had some thoughts about that too.
The problem is that there is no 1-to-1 correspondence between usage_ids and profiles, i.e., you could have one profile with multiple usage ids.
For that reason the usage_id in the ProfileMetadata is optional. On the other hand, the profiles config asks for a list of profiles metadata with different names as the name is used to retrieve the correct profile.
@OpenHands please do a /codereview-roasted and look up ALL the usage of usage_ids (including all LLM-specific IDs), then zoom out a bit and understand the purpose of this PR. Do you think if the current implementation in this PR is a good idea? If not, what do you think would be the best approach?
Currently it seems with this PR, we will have LLMRegistry -> LLMProfile -> LLM and each of this concept have their own notion of "IDs". Any way to simplify this hierarchy? I imagine a profile is just a thin wrapper around an LLM (with a user-specified name), and LLMRegistry should just be a collection of LLM | LLMProfile - do you think if this is a good idea?
I'm on it! xingyaoww can track my progress at all-hands.dev
Code Review Summary: PR #1882 "feat: Persistent LLM profiles"
Taste Rating: 🔴 Needs Improvement
I performed a comprehensive code review of the PR and analyzed all usages of usage_id, profile_id, and related IDs across the codebase. Here's my assessment:
The ID Hierarchy Problem
The PR introduces a confusing hierarchy with multiple overlapping ID concepts:
| ID | Location | Purpose |
|---|---|---|
| `usage_id` | `LLM` | Role identification (agent, condenser, etc.) |
| `profile_id` | `LLM` (new) | Persisted configuration name |
| `registry_id` | `LLMRegistry` | Instance UUID for logging |
The core issue: These concepts are orthogonal but the PR conflates them. Specifically, ensure_default_profile() uses usage_id as the default profile_id, creating false coupling.
Critical Issues
- `ensure_default_profile()` conflates `usage_id` with `profile_id` - Multiple usages can share one profile (agent and condenser both using gpt-5-mini), and one usage can switch between profiles (see the snippet below).
- `profile_ref` boolean flag is a code smell - If you need a flag to track how an object was created, your serialization logic is too complex.
- Manual discriminated union in the `LLM` validator - The `kind: profile_ref` pattern is implemented manually, creating tight coupling between `LLM` and `LLMRegistry`.
- `ConversationState` breaks encapsulation - It manually patches serialized payloads after `model_dump()`.
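To make the first point concrete, a hypothetical snippet (assumes `LLM` is a Pydantic model carrying `usage_id`, and `load_profile()` as in the PR):

```python
# One persisted profile, two usages -- so usage_id cannot double as profile_id.
profile = profile_manager.load_profile("gpt-5-mini")
agent_llm = profile.model_copy(update={"usage_id": "agent"})
condenser_llm = profile.model_copy(update={"usage_id": "condenser"})
```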
Recommended Simplification
Your intuition is correct: a profile should be a thin wrapper around an LLM. I'd go further:
A profile IS an LLM configuration stored on disk. Period.
```python
# Proposed simpler API:
class LLMProfileStore:
    """Standalone utility for persisting LLM configurations."""

    def save(self, name: str, llm: LLM, include_secrets: bool = False) -> Path: ...
    def load(self, name: str) -> LLM: ...
    def list(self) -> list[str]: ...


# Registry stays simple - just in-memory by usage_id:
registry = LLMRegistry()
registry.add(llm)  # keyed by llm.usage_id
```

This keeps:
- `LLM` simple (no `profile_ref` flag, no `kind` discriminator)
- `LLMRegistry` simple (just in-memory lookup by `usage_id`)
- Profile persistence explicit and separate
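Hypothetical usage of that split (same assumed names as the sketch above):

```python
store = LLMProfileStore()                # profile persistence: explicit and separate
store.save("gpt-5-mini", llm)            # write the LLM configuration to disk

registry = LLMRegistry()                 # stays a plain in-memory lookup
registry.add(store.load("gpt-5-mini"))   # load explicitly, registered under its usage_id
```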
Verdict
❌ Needs rework - The fundamental design conflates orthogonal concepts. The PR should:
- Keep `usage_id` and `profile_id` strictly separate - they serve different purposes
- Make profile persistence a standalone utility - not embedded in `LLMRegistry` or `LLM`
- Use explicit load/save rather than magic serialization - `store.load("gpt-5-mini")` is clearer than `kind: profile_ref` discriminators
Note: This was a review-only task. No code changes were made.
^ I kinda like this idea - @VascoSch92 @enyst wdyt?
```json
},
{
    "name": "gpt4",
    "file": "gpt4.json",
```
If we use the rule "profile_id.json is the filename", then maybe we don't need a config file?
Yes. The only open point is where we store the info about the default_profile, i.e., how do we know which one is the default profile?
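One hedged option that keeps the "`profile_id`.json is the filename" rule and still answers the default-profile question (file and function names are made up for illustration):

```python
from pathlib import Path

def list_profiles(profiles_dir: Path) -> list[str]:
    """Discover profiles purely from the directory contents."""
    return sorted(p.stem for p in profiles_dir.glob("*.json"))

def get_default_profile_name(profiles_dir: Path) -> str | None:
    """Read the default profile name from a tiny marker file, if present."""
    marker = profiles_dir / "default_profile"
    return marker.read_text().strip() if marker.exists() else None
```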
Hey @enyst, thank you very much for the feedback! Here are a couple of thoughts to help clarify the situation. I’d love to know what you think about the following:

Goal: We want a solution that allows the user to load or switch profiles on the fly.

Context: A solution proposed by @xingyaoww [here] suggested allowing the However, the

A possible solution: We could update the We can save the same LLM with different This would require updating the

Makes sense?
Thank you @VascoSch92 for thinking it through! To be honest, this can get complicated fast. I think what matters is:

where If we can do that, we're good IMHO!

Now, the same goes for other buckets of My current conclusion playing with it, is that it's simpler to:

Then, it seems to me, we have the question: how about remote?
```python
)


def main() -> None:
```
Let's break the main function into scripts directly and save a level of indentation.
@OpenHands please implement this change and /remember this preference in AGENTS.md
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I've implemented the requested changes and pushed them to the agent-sdk-18-profile-manager branch:
Checklist
- Removed the `if __name__ == "__main__":` guard from `examples/01_standalone_sdk/36_llm_profiles.py` - the code now runs directly at module level, saving one level of indentation (consistent with other examples like `01_hello_world.py`)
- Documented the preference in AGENTS.md - added a new "Example Scripts" section under `<CODE>` that states: "Example scripts in `examples/` should run code directly at module level without wrapping in `if __name__ == "__main__":` guards. This saves a level of indentation and keeps examples concise."
- Pre-commit checks passed - all linting, formatting, and type checks passed
- Changes pushed to the PR branch
The changes are concise and focused only on what was requested. The PR has been updated with commit 572e7196.
Summary
The PR proposes a solution for issue #781.
The class `ProfileManager` is introduced, whose sole job is to save, store, and update profiles on disk. Every time we work with persistent profiles, it should be done through this API to ensure a consistent and reliable way to store, load, and update profiles.

The class can then be used to implement a profile selection solution for the Openhands-CLI and in `LLMRegistry` to save or load registries coming from different experiments or runs.

Changed Scripts
manager.py

The script contains the `ProfileManager` class. Note that it assumes a single instance per profile directory: if multiple `ProfileManager` instances work in the same directory (e.g., Openhands-CLI), concurrency issues could arise. Since I assume these are mostly corner cases, I haven't implemented any countermeasures yet. Does that make sense?

config.py
The script contains `Pydantic` models used to validate the configuration file associated with a profile store. These models are backward compatible.

llm_registry.py
Several modifications were made to the `LLMRegistry` class; however, the API of the class did not change, ensuring backward compatibility. In short:

Two new parameters were added: `load_from_disk` and `profile_manager_path`. `load_from_disk` is a boolean flag that determines whether to use profiles persisted on disk. If `load_from_disk` is True, `profile_manager_path` specifies the location of the profiles. Two variables were chosen instead of one because of the default path for the profile manager.
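As a rough illustration of the intended usage (the exact constructor signature and the default path shown here are assumptions on my side):

```python
from pathlib import Path

registry = LLMRegistry(
    load_from_disk=True,
    profile_manager_path=Path("~/.openhands/profiles").expanduser(),  # path is made up
)
llm = registry.get("agent")  # lazily loads the matching profile from disk if needed
```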
The property `usage_to_llm` is deprecated. Currently, it returns the internal state `_usage_to_llm` rather than a copy, which is not ideal. If access is needed, a copy or proxy map should be returned. The property was used only in a few tests, and since we do not want to change the API, it was simply deprecated.

The `get` method now lazily loads profiles from the store when necessary.

A new method, `export_profiles`, was added to save the registry. This is very helpful when experimenting or sharing LLM profiles between coworkers.

Checklist