chore: cache proto context to optimize generator performance #2531
@@ -13,6 +13,8 @@
 # limitations under the License.

 import functools
+import contextlib
+import threading


 def cached_property(fx):
@@ -43,3 +45,91 @@ def inner(self):
         return self._cached_values[fx.__name__]

     return property(inner)
+
+
+# Thread-local storage for the simple cache dictionary.
+# This ensures that parallel generation tasks (if any) do not corrupt
+# each other's cache.
+_thread_local = threading.local()
+
+
+@contextlib.contextmanager
+def generation_cache_context():
+    """Context manager to explicitly manage the lifecycle of the generation cache.
+
+    This manager initializes a fresh dictionary in thread-local storage when
+    entering the context and strictly deletes it when exiting.
+
+    **Memory Management:**
+    The cache stores strong references to Proto objects to "pin" them in memory
+    (see `cached_proto_context`). It is critical that this context manager deletes
+    the dictionary in the `finally` block. Deleting the dictionary breaks the
+    reference chain, allowing Python's Garbage Collector to finally free all the
+    large Proto objects that were pinned during generation.
+    """
+    # Initialize the cache as a standard dictionary.
+    _thread_local.cache = {}
+    try:
+        yield
+    finally:
+        # Delete the dictionary to free all memory and pinned objects.
+        # This is essential to prevent memory leaks in long-running processes.
+        del _thread_local.cache
+
+
+def cached_proto_context(func):
+    """Decorator to memoize `with_context` calls based on object identity and collisions.
+
+    This mechanism provides a significant performance boost by preventing
+    redundant recalculations of naming collisions during template rendering.
+
+    Since the Proto wrapper objects are unhashable (mutable), we use `id(self)` as
+    the primary cache key. Normally, this is dangerous: if the object is garbage
+    collected, Python might reuse its memory address for a *new* object, leading to
+    a cache collision (the "Zombie ID" bug).
+
+    To prevent this, this decorator stores the value as a tuple: `(result, self)`.
+    By keeping a reference to `self` in the cache value, we "pin" the object in
+    memory. This forces the Garbage Collector to keep the object alive, guaranteeing
+    that `id(self)` remains unique for the entire lifespan of the
+    `generation_cache_context`.
+
+    Args:
+        func (Callable): The function to decorate (usually `with_context`).
+
+    Returns:
+        Callable: The wrapped function with caching and pinning logic.
+    """
+
+    @functools.wraps(func)
+    def wrapper(self, *, collisions, **kwargs):
+        # 1. Check for active cache (returns None if context is not active)
+        context_cache = getattr(_thread_local, "cache", None)
+
+        # If we are not inside a generation_cache_context (e.g. unit tests),
+        # bypass the cache entirely.
+        if context_cache is None:
+            return func(self, collisions=collisions, **kwargs)
+
+        # 2. Create the cache key
+        # We use frozenset for collisions to make it hashable.
+        # We use id(self) because 'self' is not hashable.
+        collisions_key = frozenset(collisions) if collisions else None
+        key = (id(self), collisions_key)
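As a side note, the key construction above relies on `frozenset` being hashable where the wrapper object itself is not. A minimal standalone sketch of the scheme (the `FakeProto` class is hypothetical, a stand-in for the unhashable Proto wrappers):

```python
# Sketch of the cache-key scheme: object identity plus a hashable view of
# the collision set. FakeProto is a hypothetical stand-in for the real,
# unhashable Proto wrapper objects.
class FakeProto:
    __hash__ = None  # mimic an unhashable wrapper


obj = FakeProto()
cache = {}

collisions = {"Foo", "Bar"}
# frozenset makes the (unordered) collision set usable as part of a dict key.
collisions_key = frozenset(collisions) if collisions else None
key = (id(obj), collisions_key)

# Storing obj alongside the result "pins" it, so id(obj) cannot be reused.
cache[key] = ("result", obj)
```

Note that equal collision sets produce equal keys regardless of iteration order, which is exactly why `frozenset` is used rather than, say, a sorted tuple built ad hoc at each call site.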
Contributor comment: Seeing this, my first thought was that this cached state should probably be associated with each instance instead of being global, since it's keyed by the instance anyway. But looking at the code more, it looks like each …
+
+        # 3. Check Cache
+        if key in context_cache:
+            # The cache stores (result, pinned_object). We return just the result.
+            return context_cache[key][0]
+
+        # 4. Execute the actual function.
+        # Any recursive calls made inside func will find the same
+        # thread-local cache via _thread_local.
+        result = func(self, collisions=collisions, **kwargs)
+
+        # 5. Update Cache & Pin Object
+        # We store (result, self). The reference to 'self' prevents garbage
+        # collection, ensuring that 'id(self)' cannot be reused for a new
+        # object while this cache entry exists.
+        context_cache[key] = (result, self)
+        return result
+
+    return wrapper
Reviewer comment: nit: it might be better to use a more unique name here, to avoid collisions