[Feature]: Cache static endpoint responses #835

@aldbr

Description

User Story

As a DiracX operator,
I want well-known endpoints (/openid-configuration, /dirac-metadata) to be cached,
so that the server does not recompute identical responses on every request, reducing latency and load.

Feature Description

The .well-known endpoints (/openid-configuration, /dirac-metadata, /jwks.json, and /security.txt) rebuild their full response on every request by calling into the logic layer (get_openid_configuration_bl, get_installation_metadata_bl, get_jwks_bl). In practice, these responses only change when the DiracX configuration changes (tracked by its git hexsha).

For /openid-configuration in particular, each call iterates over all registries, VOs, groups, and properties to build the scopes_supported list — work that is repeated identically until the next config update.
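To make the per-request cost concrete, here is a toy sketch of the kind of nested iteration involved. The config shape below is purely illustrative (the real DiracX Config object and scope naming differ); the point is that the walk over every VO, group, and property is repeated on each request even though the result only changes with the config hexsha:

```python
# Hypothetical, simplified config shape -- NOT the actual DiracX schema.
config = {
    "Registry": {
        "vo_a": {
            "Groups": {
                "admin": {"Properties": ["NormalUser", "JobAdministrator"]},
                "user": {"Properties": ["NormalUser"]},
            }
        },
        "vo_b": {"Groups": {"user": {"Properties": ["NormalUser"]}}},
    }
}


def build_scopes_supported(config: dict) -> list[str]:
    """Walk every VO, group, and property to assemble scopes_supported."""
    scopes: set[str] = set()
    for vo, vo_data in config["Registry"].items():
        scopes.add(f"vo:{vo}")
        for group, group_data in vo_data["Groups"].items():
            scopes.add(f"group:{group}")
            for prop in group_data["Properties"]:
                scopes.add(f"property:{prop}")
    return sorted(scopes)
```

This work is a pure function of the config, which is what makes hexsha-keyed caching attractive.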

A draft PR was opened in #457 introducing a cachetools.TTLCache at the router level. The approach works but has limitations worth discussing before finalising.

Scope

The following well-known endpoints are in scope for caching:

  • /openid-configuration — expensive, iterates over all VOs/groups/properties
  • /dirac-metadata — moderately expensive, iterates over all VOs/groups

The following are explicitly out of scope for server-side caching:

  • /jwks.json — returns public keys from settings, already lightweight (but could benefit from ETag headers at low cost)
  • /security.txt — static content, no computation involved

Note: /jwks.json is excluded from server-side caching but should still be considered for ETag-based HTTP caching (Option E), since the implementation cost is minimal and clients benefit from conditional requests.

Definition of Done

  • /openid-configuration and /dirac-metadata responses are cached and invalidated on config change
  • Extensions (e.g. gubbins) benefit from base-layer caching; full end-to-end caching in extensions is documented as a follow-up
  • Clients can avoid redundant requests for unchanged responses (via HTTP cache headers such as ETag or Cache-Control)
  • Tests cover cache invalidation on config change
  • Cache hit/miss is observable (e.g. debug-level logging on cache miss)
  • Code is reviewed and merged

Alternatives Considered

Option A — TTLCache at the router level (current draft PR #457)

A module-level TTLCache in well_known.py, keyed by f"openid-configuration:{config._hexsha}".

from cachetools import TTLCache

_static_cache: TTLCache = TTLCache(maxsize=4, ttl=300)

@router.get("/openid-configuration")
async def openid_configuration(..., config: Config):
    key = f"openid-configuration:{config._hexsha}"
    if key not in _static_cache:
        _static_cache[key] = await get_openid_configuration_bl(...)
    return _static_cache[key]
Pros:

  • Minimal change, consistent with existing TTLCache usage in the codebase (factory.py, auth/utils.py)
  • Easy to understand

Cons:

  • Cache lives in the router layer — extensions like gubbins that override endpoints must replicate the caching
  • TTL-based expiry is arbitrary; cache may serve stale data briefly or expire unnecessarily

Note: maxsize=4 accommodates 2 endpoints x 2 hexsha values (old + new) during a config transition. A brief justification should accompany this value in the code.

Option B — Cache in the logic layer

Move the cache into diracx/logic/auth/well_known.py so that all callers (including extensions) benefit automatically.

# In diracx/logic/auth/well_known.py
from cachetools import TTLCache

_openid_config_cache: TTLCache = TTLCache(maxsize=4, ttl=300)

async def get_openid_configuration_bl(..., config):
    key = f"openid-config:{config._hexsha}"
    if key not in _openid_config_cache:
        result = ...  # build response
        _openid_config_cache[key] = result
    return _openid_config_cache[key]
Pros:

  • Extensions inherit caching of the base computation for free
  • Single place to maintain

Cons:

  • Logic layer gains a caching concern
  • Does not fully solve extension caching (see caveat below)

Caveat on extension compatibility: Gubbins's get_installation_metadata() calls the base get_installation_metadata() and then adds user info on top. Caching the base function only avoids recomputing the base result, but gubbins still recomputes its additions on every request. For full end-to-end caching, gubbins would need its own cache around its wrapper, or a composable caching decorator would need to be applied at each layer. This option reduces but does not eliminate redundant computation in extensions. A follow-up issue should document a recommended pattern for extensions to participate in caching (e.g. a reusable hexsha_cached decorator).
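One possible shape for such a decorator is sketched below. This is an assumption from this discussion, not existing DiracX API: the name hexsha_cached, the keyword-only `config` parameter, and the single-entry cache policy are all design choices up for debate. Note that the clear() + insert here is safe because both happen synchronously after the await, with no yield point in between:

```python
import functools
from typing import Any, Awaitable, Callable


def hexsha_cached(
    func: Callable[..., Awaitable[Any]],
) -> Callable[..., Awaitable[Any]]:
    """Cache an async function's result keyed on config._hexsha.

    The wrapped function must take the config as keyword argument `config`.
    The cache holds at most one entry, so a config change (new hexsha)
    evicts the previous result on the next call.
    """
    cache: dict[str, Any] = {}

    @functools.wraps(func)
    async def wrapper(*args, config, **kwargs):
        key = config._hexsha
        if key not in cache:
            result = await func(*args, config=config, **kwargs)
            # Safe: clear() and the insert run back-to-back with no await
            # between them, so no other coroutine can interleave here.
            cache.clear()
            cache[key] = result
        return cache[key]

    return wrapper
```

An extension like gubbins could then apply the same decorator to its own wrapper around get_installation_metadata(), so both the base and the extension layers cache against the same hexsha.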

Option C — Hexsha-keyed cache without TTL

Since the response is a pure function of config._hexsha, a TTL is not strictly needed. A simple dict replaced atomically on hexsha change would never serve stale data:

from typing import Any

_config_cache: dict[str, Any] = {}

async def get_openid_configuration(..., config: Config):
    global _config_cache  # required to rebind the module-level dict
    key = f"openid-config:{config._hexsha}"
    if key not in _config_cache:
        result = await get_openid_configuration_bl(...)
        # Replace the entire dict atomically instead of clear() + insert,
        # which avoids a race where another coroutine inserts a stale entry
        # between clear() and the await (which yields to the event loop).
        _config_cache = {key: result}
    return _config_cache[key]

Note on the original clear() + insert pattern: Using _config_cache.clear() followed by _config_cache[key] = await ... introduces a race condition. Between clear() and the await (which yields to the event loop), another coroutine could insert an entry for the old hexsha into the now-empty dict. Since there is no TTL or size bound, that stale entry would persist indefinitely. Replacing the dict atomically (_config_cache = {key: result}) avoids this entirely. Worst case with the atomic replacement is duplicate computation, not stale data.
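The race and its fix can be demonstrated with a self-contained toy (not DiracX code — "old" and "new" stand in for the two hexsha values in flight during a config transition):

```python
import asyncio

# --- The racy clear() + insert pattern ---
racy_cache: dict[str, str] = {}


async def get_racy(key: str) -> str:
    if key not in racy_cache:
        racy_cache.clear()
        await asyncio.sleep(0)  # the awaited rebuild yields to the event loop
        racy_cache[key] = f"response-for-{key}"
    return racy_cache[key]


# --- The atomic-replacement fix ---
atomic_cache: dict[str, str] = {}


async def get_atomic(key: str) -> str:
    global atomic_cache
    if key not in atomic_cache:
        await asyncio.sleep(0)
        # Rebinding the name swaps the whole dict in one step: there is no
        # observable empty intermediate state, so a concurrent request for
        # the old hexsha cannot leave a stale entry behind.
        atomic_cache = {key: f"response-for-{key}"}
    return atomic_cache[key]


async def race(getter):
    # Two requests race during a config transition: one still holds the
    # old hexsha, one the new.
    await asyncio.gather(getter("old"), getter("new"))


asyncio.run(race(get_racy))
asyncio.run(race(get_atomic))

racy_result = dict(racy_cache)      # both entries survive; "old" is stale
atomic_result = dict(atomic_cache)  # only one entry survives
```

With the racy pattern both coroutines clear the empty dict, yield, and then both insert, so the stale entry for the old hexsha persists with no TTL to evict it; with atomic replacement the worst case is the duplicated computation, never stale data.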

Pros:

  • No cachetools dependency needed for this use case
  • Cache is never stale — invalidation is deterministic

Cons:

  • Concurrent requests during a config transition may duplicate computation (benign — no stale data). An asyncio.Lock could be used if even this is undesirable.
  • Requires using global _config_cache or wrapping in a class to allow dict replacement

Option D — HTTP Cache-Control headers

Add Cache-Control headers so clients and reverse proxies also cache the response, on top of any server-side option above:

from starlette.responses import JSONResponse

@router.get("/openid-configuration")
async def openid_configuration(...):
    result = await get_openid_configuration_bl(...)
    return JSONResponse(content=result, headers={"Cache-Control": "public, max-age=300"})
Pros:

  • Standard HTTP mechanism, works with CDNs/proxies
  • Offloads caching to clients, zero server memory

Cons:

  • Does not reduce server-side compute on its own
  • Clients may see stale data for up to max-age seconds after a config change

Note on max-age=300: The 300-second value is a placeholder. The appropriate value depends on how much staleness is acceptable to operators. Config changes in DiracX are typically infrequent (order of hours/days), so even 300s is conservative. This should be a configurable value or at least documented with rationale before merging.

Option E — ETag-based HTTP caching

Use config._hexsha as an ETag header, allowing clients to send If-None-Match and receive a 304 Not Modified when the config has not changed. This pattern is already used in the codebase for the /config/ endpoint (diracx-routers/src/diracx/routers/configuration.py).

from fastapi import Header, HTTPException, status
from fastapi.responses import JSONResponse

@router.get("/openid-configuration")
async def openid_configuration(
    ...,
    config: Config,
    if_none_match: str | None = Header(None),
):
    headers = {"ETag": config._hexsha}
    if if_none_match == config._hexsha:
        raise HTTPException(status_code=status.HTTP_304_NOT_MODIFIED, headers=headers)
    result = await get_openid_configuration_bl(...)
    return JSONResponse(content=result, headers=headers)
Pros:

  • Already a proven pattern in the DiracX codebase (/config/ endpoint)
  • Never serves stale data — invalidation is deterministic via hexsha
  • No arbitrary TTL — clients always get current data or a 304

Cons:

  • Still hits the server on every request (but avoids recomputing the body when combined with server-side cache)
  • Requires clients to support If-None-Match (all modern HTTP clients do)

This can be combined with Option D (Cache-Control) for layered caching: ETag for correctness, max-age for reducing request frequency.

Recommendation

Options are not mutually exclusive. A combination of B (logic-layer cache) + C (hexsha-keyed, no TTL) + E (ETag headers) would give deterministic invalidation, partial extension compatibility (base computation is cached; extensions still recompute their own additions), and client-side caching without serving stale data. Option D (Cache-Control) can be added on top if reducing request frequency is also a goal, at the cost of briefly stale responses.

To fully address extension caching, a follow-up should provide a reusable hexsha_cached decorator or similar pattern that extensions like gubbins can apply to their own wrapper functions.

Any single option is already an improvement over the current state.

Additional Context

  • The codebase already uses cachetools.TTLCache in two places: factory.py (DB ping, ttl=10s) and auth/utils.py (IAM server metadata, ttl=3600s), so the pattern is established.
  • The /config/ endpoint already implements ETag + 304 Not Modified caching using config._hexsha (via raise HTTPException(status_code=304, headers=headers)). Starlette's default exception handler special-cases 304 and 204 to return a response with no body, so this pattern is HTTP-compliant. It also supports If-Modified-Since with config._modified as a secondary cache control mechanism.
  • FastAPI caches the OpenAPI schema separately via self.openapi_schema in DiracFastAPI.openapi(). This is a distinct mechanism and does not affect the well-known endpoints.
  • The gubbins extension overrides /dirac-metadata with its own route (gubbins/routers/well_known.py) and layers additional logic on top of the base get_installation_metadata(), so neither router-level nor logic-layer caching fully covers the extension without additional work in gubbins.
