Description
User Story
As a DiracX operator,
I want well-known endpoints (/openid-configuration, /dirac-metadata) to be cached,
so that the server does not recompute identical responses on every request, reducing latency and load.
Feature Description
The .well-known endpoints (/openid-configuration, /dirac-metadata, /jwks.json, and /security.txt) rebuild their full response on every request by calling into the logic layer (get_openid_configuration_bl, get_installation_metadata_bl, get_jwks_bl). In practice, these responses only change when the DiracX configuration changes (tracked by its git hexsha).
For /openid-configuration in particular, each call iterates over all registries, VOs, groups, and properties to build the scopes_supported list — work that is repeated identically until the next config update.
A draft PR was opened in #457 introducing a cachetools.TTLCache at the router level. The approach works but has limitations worth discussing before finalising.
Scope
The following well-known endpoints are in scope for caching:
- `/openid-configuration` — expensive, iterates over all VOs/groups/properties
- `/dirac-metadata` — moderately expensive, iterates over all VOs/groups
The following are explicitly out of scope for server-side caching:
- `/jwks.json` — returns public keys from settings, already lightweight (but could benefit from ETag headers at low cost)
- `/security.txt` — static content, no computation involved
Note: `/jwks.json` is excluded from server-side caching but should still be considered for ETag-based HTTP caching (Option E), since the implementation cost is minimal and clients benefit from conditional requests.
Definition of Done
- `/openid-configuration` and `/dirac-metadata` responses are cached and invalidated on config change
- Extensions (e.g. gubbins) benefit from base-layer caching; full end-to-end caching in extensions is documented as a follow-up
- Clients can avoid redundant requests for unchanged responses (via HTTP cache headers such as ETag or Cache-Control)
- Tests cover cache invalidation on config change
- Cache hit/miss is observable (e.g. debug-level logging on cache miss)
- Code is reviewed and merged
Alternatives Considered
Option A — TTLCache at the router level (current draft PR #457)
A module-level `TTLCache` in `well_known.py`, keyed by `f"openid-configuration:{config._hexsha}"`:

```python
from cachetools import TTLCache

_static_cache: TTLCache = TTLCache(maxsize=4, ttl=300)

@router.get("/openid-configuration")
async def openid_configuration(..., config: Config):
    key = f"openid-configuration:{config._hexsha}"
    if key not in _static_cache:
        _static_cache[key] = await get_openid_configuration_bl(...)
    return _static_cache[key]
```

| Pros | Cons |
|---|---|
| Minimal change, consistent with existing TTLCache usage in the codebase (`factory.py`, `auth/utils.py`) | Cache lives in the router layer — extensions like gubbins that override endpoints must replicate the caching |
| Easy to understand | TTL-based expiry is arbitrary; cache may serve stale data briefly or expire unnecessarily |
Note: `maxsize=4` accommodates 2 endpoints × 2 hexsha values (old + new) during a config transition. A brief justification should accompany this value in the code.
Option B — Cache in the logic layer
Move the cache into diracx/logic/auth/well_known.py so that all callers (including extensions) benefit automatically.
```python
# In diracx/logic/auth/well_known.py
_openid_config_cache: TTLCache = TTLCache(maxsize=4, ttl=300)

async def get_openid_configuration_bl(..., config):
    key = f"openid-config:{config._hexsha}"
    if key not in _openid_config_cache:
        result = ...  # build response
        _openid_config_cache[key] = result
    return _openid_config_cache[key]
```

| Pros | Cons |
|---|---|
| Extensions inherit caching of the base computation for free | Logic layer gains a caching concern |
| Single place to maintain | Does not fully solve extension caching (see caveat below) |
Caveat on extension compatibility: Gubbins's `get_installation_metadata()` calls the base `get_installation_metadata()` and then adds user info on top. Caching the base function only avoids recomputing the base result, but gubbins still recomputes its additions on every request. For full end-to-end caching, gubbins would need its own cache around its wrapper, or a composable caching decorator would need to be applied at each layer. This option reduces but does not eliminate redundant computation in extensions. A follow-up issue should document a recommended pattern for extensions to participate in caching (e.g. a reusable `hexsha_cached` decorator).
Option C — Hexsha-keyed cache without TTL
Since the response is a pure function of config._hexsha, a TTL is not strictly needed. A simple dict replaced atomically on hexsha change would never serve stale data:
```python
_config_cache: dict[str, Any] = {}

async def get_openid_configuration(..., config: Config):
    global _config_cache  # needed: the dict itself is rebound below
    key = f"openid-config:{config._hexsha}"
    if key not in _config_cache:
        result = await get_openid_configuration_bl(...)
        # Replace the entire dict atomically instead of clear() + insert,
        # which avoids a race where another coroutine inserts a stale entry
        # between clear() and the await (which yields to the event loop).
        _config_cache = {key: result}
    return _config_cache[key]
```

Note on the original `clear()` + insert pattern: Using `_config_cache.clear()` followed by `_config_cache[key] = await ...` introduces a race condition. Between `clear()` and the `await` (which yields to the event loop), another coroutine could insert an entry for the old hexsha into the now-empty dict. Since there is no TTL or size bound, that stale entry would persist indefinitely. Replacing the dict atomically (`_config_cache = {key: result}`) avoids this entirely. Worst case with the atomic replacement is duplicate computation, not stale data.
| Pros | Cons |
|---|---|
| No cachetools dependency needed for this use case | Concurrent requests during a config transition may duplicate computation (benign — no stale data). An `asyncio.Lock` could be used if even this is undesirable. |
| Cache is never stale — invalidation is deterministic | Requires using `global _config_cache` or wrapping in a class to allow dict replacement |
Option D — HTTP Cache-Control headers
Add Cache-Control headers so clients and reverse proxies also cache the response, on top of any server-side option above:
```python
from starlette.responses import JSONResponse

@router.get("/openid-configuration")
async def openid_configuration(...):
    result = await get_openid_configuration_bl(...)
    return JSONResponse(content=result, headers={"Cache-Control": "public, max-age=300"})
```

| Pros | Cons |
|---|---|
| Standard HTTP mechanism, works with CDNs/proxies | Does not reduce server-side compute on its own |
| Offloads caching to clients, zero server memory | Clients may see stale data for up to max-age seconds after a config change |
Note on `max-age=300`: The 300-second value is a placeholder. The appropriate value depends on how much staleness is acceptable to operators. Config changes in DiracX are typically infrequent (order of hours/days), so even 300s is conservative. This should be a configurable value or at least documented with rationale before merging.
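A minimal way to make the value operator-configurable, sketched with an environment variable; the variable name is purely illustrative, and a real implementation would more likely hang off DiracX's existing settings objects:

```python
import os

# Hypothetical knob: let operators tune acceptable staleness without a code
# change. DIRACX_WELL_KNOWN_MAX_AGE is an invented name, not an existing setting.
WELL_KNOWN_MAX_AGE = int(os.environ.get("DIRACX_WELL_KNOWN_MAX_AGE", "300"))
CACHE_CONTROL_HEADER = {"Cache-Control": f"public, max-age={WELL_KNOWN_MAX_AGE}"}
```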
Option E — ETag-based HTTP caching
Use config._hexsha as an ETag header, allowing clients to send If-None-Match and receive a 304 Not Modified when the config has not changed. This pattern is already used in the codebase for the /config/ endpoint (diracx-routers/src/diracx/routers/configuration.py).
```python
from fastapi import Header, HTTPException, status
from fastapi.responses import JSONResponse

@router.get("/openid-configuration")
async def openid_configuration(
    ...,
    config: Config,
    if_none_match: str | None = Header(None),
):
    headers = {"ETag": config._hexsha}
    if if_none_match == config._hexsha:
        raise HTTPException(status_code=status.HTTP_304_NOT_MODIFIED, headers=headers)
    result = await get_openid_configuration_bl(...)
    return JSONResponse(content=result, headers=headers)
```

| Pros | Cons |
|---|---|
| Already a proven pattern in the DiracX codebase (`/config/` endpoint) | Still hits the server on every request (but avoids recomputing the body when combined with server-side cache) |
| Never serves stale data — invalidation is deterministic via hexsha | Requires clients to support `If-None-Match` (all modern HTTP clients do) |
| No arbitrary TTL — clients always get current data or a 304 | |
This can be combined with Option D (`Cache-Control`) for layered caching: ETag for correctness, `max-age` for reducing request frequency.
Recommendation
Options are not mutually exclusive. A combination of B (logic-layer cache) + C (hexsha-keyed, no TTL) + E (ETag headers) would give deterministic invalidation, partial extension compatibility (base computation is cached; extensions still recompute their own additions), and client-side caching without serving stale data. Option D (Cache-Control) can be added on top if reducing request frequency is also a goal, at the cost of briefly stale responses.
To fully address extension caching, a follow-up should provide a reusable hexsha_cached decorator or similar pattern that extensions like gubbins can apply to their own wrapper functions.
Any single option is already an improvement over the current state.
Related Issues
- Draft PR: Cache static routes #457
Additional Context
- The codebase already uses `cachetools.TTLCache` in two places: `factory.py` (DB ping, ttl=10s) and `auth/utils.py` (IAM server metadata, ttl=3600s), so the pattern is established.
- The `/config/` endpoint already implements ETag + `304 Not Modified` caching using `config._hexsha` (via `raise HTTPException(status_code=304, headers=headers)`). Starlette's default exception handler special-cases 304 and 204 to return a response with no body, so this pattern is HTTP-compliant. It also supports `If-Modified-Since` with `config._modified` as a secondary cache control mechanism.
- FastAPI caches the OpenAPI schema separately via `self.openapi_schema` in `DiracFastAPI.openapi()`. This is a distinct mechanism and does not affect the well-known endpoints.
- The gubbins extension overrides `/dirac-metadata` with its own route (`gubbins/routers/well_known.py`) and layers additional logic on top of the base `get_installation_metadata()`, so neither router-level nor logic-layer caching fully covers the extension without additional work in gubbins.