[Feature]: Cache static endpoint responses #835

@aldbr

Description

User Story

As a DiracX operator,
I want well-known endpoints (/openid-configuration, /dirac-metadata) to be cached,
so that the server does not recompute identical responses on every request, reducing latency and load.

Feature Description

The .well-known endpoints (/openid-configuration, /dirac-metadata, /jwks.json, and /security.txt) rebuild their full response on every request by calling into the logic layer (get_openid_configuration_bl, get_installation_metadata_bl, get_jwks_bl). In practice, these responses only change when the DiracX configuration changes (tracked by its git hexsha).

For /openid-configuration in particular, each call iterates over all registries, VOs, groups, and properties to build the scopes_supported list — work that is repeated identically until the next config update.
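To make the per-request cost concrete, here is a toy sketch of the kind of nested iteration involved. The config shape below is purely illustrative (the real DiracX Config object and scope naming differ); the point is that the walk over every VO, group, and property is repeated on each request even though the result only changes with the config hexsha:

```python
# Hypothetical, simplified config shape -- NOT the actual DiracX schema.
config = {
    "Registry": {
        "vo_a": {
            "Groups": {
                "admin": {"Properties": ["NormalUser", "JobAdministrator"]},
                "user": {"Properties": ["NormalUser"]},
            }
        },
        "vo_b": {"Groups": {"user": {"Properties": ["NormalUser"]}}},
    }
}


def build_scopes_supported(config: dict) -> list[str]:
    """Walk every VO, group, and property to assemble scopes_supported."""
    scopes: set[str] = set()
    for vo, vo_data in config["Registry"].items():
        scopes.add(f"vo:{vo}")
        for group, group_data in vo_data["Groups"].items():
            scopes.add(f"group:{group}")
            for prop in group_data["Properties"]:
                scopes.add(f"property:{prop}")
    return sorted(scopes)
```

This work is a pure function of the config, which is what makes hexsha-keyed caching attractive.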

A draft PR was opened in #457 introducing a cachetools.TTLCache at the router level. The approach works but has limitations worth discussing before finalising.

Scope

The following well-known endpoints are in scope for caching:

  • /openid-configuration — expensive, iterates over all VOs/groups/properties
  • /dirac-metadata — moderately expensive, iterates over all VOs/groups

The following are explicitly out of scope for server-side caching:

  • /jwks.json — returns public keys from settings, already lightweight (but could benefit from ETag headers at low cost)
  • /security.txt — static content, no computation involved

Note: /jwks.json is excluded from server-side caching but should still be considered for ETag-based HTTP caching (Option E), since the implementation cost is minimal and clients benefit from conditional requests.

Definition of Done

  • /openid-configuration and /dirac-metadata responses are cached and invalidated on config change
  • Extensions (e.g. gubbins) benefit from base-layer caching; full end-to-end caching in extensions is documented as a follow-up
  • Clients can avoid redundant requests for unchanged responses (via HTTP cache headers such as ETag or Cache-Control)
  • Tests cover cache invalidation on config change
  • Cache hit/miss is observable (e.g. debug-level logging on cache miss)
  • Code is reviewed and merged

Alternatives Considered

Option A — TTLCache at the router level (current draft PR #457)

A module-level TTLCache in well_known.py, keyed by f"openid-configuration:{config._hexsha}".

from cachetools import TTLCache

_static_cache: TTLCache = TTLCache(maxsize=4, ttl=300)

@router.get("/openid-configuration")
async def openid_configuration(..., config: Config):
    key = f"openid-configuration:{config._hexsha}"
    if key not in _static_cache:
        _static_cache[key] = await get_openid_configuration_bl(...)
    return _static_cache[key]
Pros:

  • Minimal change, consistent with existing TTLCache usage in the codebase (factory.py, auth/utils.py)
  • Easy to understand

Cons:

  • Cache lives in the router layer — extensions like gubbins that override endpoints must replicate the caching
  • TTL-based expiry is arbitrary; cache may serve stale data briefly or expire unnecessarily

Note: maxsize=4 accommodates 2 endpoints x 2 hexsha values (old + new) during a config transition. A brief justification should accompany this value in the code.

Option B — Cache in the logic layer

Move the cache into diracx/logic/auth/well_known.py so that all callers (including extensions) benefit automatically.

# In diracx/logic/auth/well_known.py
from cachetools import TTLCache

_openid_config_cache: TTLCache = TTLCache(maxsize=4, ttl=300)

async def get_openid_configuration_bl(..., config):
    key = f"openid-config:{config._hexsha}"
    if key not in _openid_config_cache:
        result = ...  # build response
        _openid_config_cache[key] = result
    return _openid_config_cache[key]
Pros:

  • Extensions inherit caching of the base computation for free
  • Single place to maintain

Cons:

  • Logic layer gains a caching concern
  • Does not fully solve extension caching (see caveat below)

Caveat on extension compatibility: Gubbins's get_installation_metadata() calls the base get_installation_metadata() and then adds user info on top. Caching the base function only avoids recomputing the base result, but gubbins still recomputes its additions on every request. For full end-to-end caching, gubbins would need its own cache around its wrapper, or a composable caching decorator would need to be applied at each layer. This option reduces but does not eliminate redundant computation in extensions. A follow-up issue should document a recommended pattern for extensions to participate in caching (e.g. a reusable hexsha_cached decorator).
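One possible shape for such a decorator is sketched below. This is an assumption from this discussion, not existing DiracX API: the name hexsha_cached, the keyword-only `config` parameter, and the single-entry cache policy are all design choices up for debate. Note that the clear() + insert here is safe because both happen synchronously after the await, with no yield point in between:

```python
import functools
from typing import Any, Awaitable, Callable


def hexsha_cached(
    func: Callable[..., Awaitable[Any]],
) -> Callable[..., Awaitable[Any]]:
    """Cache an async function's result keyed on config._hexsha.

    The wrapped function must take the config as keyword argument `config`.
    The cache holds at most one entry, so a config change (new hexsha)
    evicts the previous result on the next call.
    """
    cache: dict[str, Any] = {}

    @functools.wraps(func)
    async def wrapper(*args, config, **kwargs):
        key = config._hexsha
        if key not in cache:
            result = await func(*args, config=config, **kwargs)
            # Safe: clear() and the insert run back-to-back with no await
            # between them, so no other coroutine can interleave here.
            cache.clear()
            cache[key] = result
        return cache[key]

    return wrapper
```

An extension like gubbins could then apply the same decorator to its own wrapper around get_installation_metadata(), so both the base and the extension layers cache against the same hexsha.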

Option C — Hexsha-keyed cache without TTL

Since the response is a pure function of config._hexsha, a TTL is not strictly needed. A simple dict replaced atomically on hexsha change would never serve stale data:

from typing import Any

_config_cache: dict[str, Any] = {}

async def get_openid_configuration(..., config: Config):
    global _config_cache  # required to rebind the module-level dict
    key = f"openid-config:{config._hexsha}"
    if key not in _config_cache:
        result = await get_openid_configuration_bl(...)
        # Replace the entire dict atomically instead of clear() + insert,
        # which avoids a race where another coroutine inserts a stale entry
        # between clear() and the await (which yields to the event loop).
        _config_cache = {key: result}
    return _config_cache[key]

Note on the original clear() + insert pattern: Using _config_cache.clear() followed by _config_cache[key] = await ... introduces a race condition. Between clear() and the await (which yields to the event loop), another coroutine could insert an entry for the old hexsha into the now-empty dict. Since there is no TTL or size bound, that stale entry would persist indefinitely. Replacing the dict atomically (_config_cache = {key: result}) avoids this entirely. Worst case with the atomic replacement is duplicate computation, not stale data.
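The race and its fix can be demonstrated with a self-contained toy (not DiracX code — "old" and "new" stand in for the two hexsha values in flight during a config transition):

```python
import asyncio

# --- The racy clear() + insert pattern ---
racy_cache: dict[str, str] = {}


async def get_racy(key: str) -> str:
    if key not in racy_cache:
        racy_cache.clear()
        await asyncio.sleep(0)  # the awaited rebuild yields to the event loop
        racy_cache[key] = f"response-for-{key}"
    return racy_cache[key]


# --- The atomic-replacement fix ---
atomic_cache: dict[str, str] = {}


async def get_atomic(key: str) -> str:
    global atomic_cache
    if key not in atomic_cache:
        await asyncio.sleep(0)
        # Rebinding the name swaps the whole dict in one step: there is no
        # observable empty intermediate state, so a concurrent request for
        # the old hexsha cannot leave a stale entry behind.
        atomic_cache = {key: f"response-for-{key}"}
    return atomic_cache[key]


async def race(getter):
    # Two requests race during a config transition: one still holds the
    # old hexsha, one the new.
    await asyncio.gather(getter("old"), getter("new"))


asyncio.run(race(get_racy))
asyncio.run(race(get_atomic))

racy_result = dict(racy_cache)      # both entries survive; "old" is stale
atomic_result = dict(atomic_cache)  # only one entry survives
```

With the racy pattern both coroutines clear the empty dict, yield, and then both insert, so the stale entry for the old hexsha persists with no TTL to evict it; with atomic replacement the worst case is the duplicated computation, never stale data.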

Pros:

  • No cachetools dependency needed for this use case
  • Cache is never stale — invalidation is deterministic

Cons:

  • Concurrent requests during a config transition may duplicate computation (benign — no stale data). An asyncio.Lock could be used if even this is undesirable.
  • Requires using global _config_cache or wrapping in a class to allow dict replacement

Option D — HTTP Cache-Control headers

Add Cache-Control headers so clients and reverse proxies also cache the response, on top of any server-side option above:

from starlette.responses import JSONResponse

@router.get("/openid-configuration")
async def openid_configuration(...):
    result = await get_openid_configuration_bl(...)
    return JSONResponse(content=result, headers={"Cache-Control": "public, max-age=300"})
Pros:

  • Standard HTTP mechanism, works with CDNs/proxies
  • Offloads caching to clients, zero server memory

Cons:

  • Does not reduce server-side compute on its own
  • Clients may see stale data for up to max-age seconds after a config change

Note on max-age=300: The 300-second value is a placeholder. The appropriate value depends on how much staleness is acceptable to operators. Config changes in DiracX are typically infrequent (order of hours/days), so even 300s is conservative. This should be a configurable value or at least documented with rationale before merging.

Option E — ETag-based HTTP caching

Use config._hexsha as an ETag header, allowing clients to send If-None-Match and receive a 304 Not Modified when the config has not changed. This pattern is already used in the codebase for the /config/ endpoint (diracx-routers/src/diracx/routers/configuration.py).

from fastapi import Header, HTTPException, status
from fastapi.responses import JSONResponse

@router.get("/openid-configuration")
async def openid_configuration(
    ...,
    config: Config,
    if_none_match: str | None = Header(None),
):
    headers = {"ETag": config._hexsha}
    if if_none_match == config._hexsha:
        raise HTTPException(status_code=status.HTTP_304_NOT_MODIFIED, headers=headers)
    result = await get_openid_configuration_bl(...)
    return JSONResponse(content=result, headers=headers)
Pros:

  • Already a proven pattern in the DiracX codebase (/config/ endpoint)
  • Never serves stale data — invalidation is deterministic via hexsha
  • No arbitrary TTL — clients always get current data or a 304

Cons:

  • Still hits the server on every request (but avoids recomputing the body when combined with server-side cache)
  • Requires clients to support If-None-Match (all modern HTTP clients do)

This can be combined with Option D (Cache-Control) for layered caching: ETag for correctness, max-age for reducing request frequency.

Recommendation

Options are not mutually exclusive. A combination of B (logic-layer cache) + C (hexsha-keyed, no TTL) + E (ETag headers) would give deterministic invalidation, partial extension compatibility (base computation is cached; extensions still recompute their own additions), and client-side caching without serving stale data. Option D (Cache-Control) can be added on top if reducing request frequency is also a goal, at the cost of briefly stale responses.

To fully address extension caching, a follow-up should provide a reusable hexsha_cached decorator or similar pattern that extensions like gubbins can apply to their own wrapper functions.

Any single option is already an improvement over the current state.

Additional Context

  • The codebase already uses cachetools.TTLCache in two places: factory.py (DB ping, ttl=10s) and auth/utils.py (IAM server metadata, ttl=3600s), so the pattern is established.
  • The /config/ endpoint already implements ETag + 304 Not Modified caching using config._hexsha (via raise HTTPException(status_code=304, headers=headers)). Starlette's default exception handler special-cases 304 and 204 to return a response with no body, so this pattern is HTTP-compliant. It also supports If-Modified-Since with config._modified as a secondary cache control mechanism.
  • FastAPI caches the OpenAPI schema separately via self.openapi_schema in DiracFastAPI.openapi(). This is a distinct mechanism and does not affect the well-known endpoints.
  • The gubbins extension overrides /dirac-metadata with its own route (gubbins/routers/well_known.py) and layers additional logic on top of the base get_installation_metadata(), so neither router-level nor logic-layer caching fully covers the extension without additional work in gubbins.
