Add support for OD-CDS concurrent with SoTW CDS
Currently, SoTW CDS updates delete all clusters delivered via OD-CDS. This removal is unnecessary, causes extra load on management servers (the removed clusters have to be requested again on demand), and was the cause of hard-to-debug `cluster_not_found` responses on SoTW updates.
There have been similar concerns and solutions for related SoTW CDS interactions:
- xDS-TP (xds-federation: adding support for OD-CDS over xDS-TP concurrent with CDS SotW #41117)
- DynamicForwardProxy (Dynamic forward proxy with sub_clusters_config get stuck with wait for warmup #35171 (comment))
Example
Consider the following setup, which combines static (file-based SoTW CDS) clusters and ODCDS:
```yaml
# bootstrap file referencing static clusters
dynamic_resources:
  cds_config:
    path_config_source:
      path: /envoy/configs/cds.yaml
    resource_api_version: V3
  lds_config:
    path_config_source:
      path: /envoy/configs/lds.yaml
    resource_api_version: V3
```

```yaml
# cds.yaml with static clusters A, B, C
resources:
- "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
  name: A
  ...
```
```yaml
# lds.yaml: route using on-demand CDS for clusters D, E, F
...
- match: ...
  route:
    weighted_clusters:
      clusters:
      - name: D
        weight: 100
  typed_per_filter_config:
    envoy.filters.http.on_demand:
      "@type": type.googleapis.com/envoy.extensions.filters.http.on_demand.v3.PerRouteConfig
      odcds:
        ...
        timeout: 45s
```

- Envoy starts and loads clusters A, B, C from `cds.yaml`.
- Over time, Envoy discovers clusters D, E, F via ODCDS.
- A SoTW CDS update occurs. This can happen by replacing the `lds.yaml` file.
- The SoTW CDS update removes ODCDS clusters D, E, F.
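The removal in the last step can be illustrated with a toy model. This is a minimal Python sketch, not Envoy's C++ implementation. It assumes a SoTW CDS update treats its resource list as the complete set of API-added clusters and removes every other API-added cluster, whether SoTW CDS or ODCDS added it. `ToyClusterManager` and its methods are hypothetical.

```python
# Toy model of the reported behavior. ToyClusterManager is hypothetical,
# not an Envoy API.


class ToyClusterManager:
    def __init__(self) -> None:
        # Every cluster added through an API (SoTW CDS or ODCDS) lands here,
        # with no record of which source added it.
        self.api_clusters: set[str] = set()

    def add_cluster(self, name: str) -> None:
        self.api_clusters.add(name)

    def apply_sotw_cds_update(self, names: set[str]) -> set[str]:
        # Assumption: SoTW treats the update as the complete cluster set, so
        # any API-added cluster not named in it is removed.
        removed = self.api_clusters - names
        self.api_clusters = set(names)
        return removed


cm = ToyClusterManager()
for name in ("A", "B", "C"):  # loaded from cds.yaml
    cm.add_cluster(name)
for name in ("D", "E", "F"):  # discovered later via ODCDS
    cm.add_cluster(name)

# An unchanged SoTW update still drops every ODCDS cluster.
removed = cm.apply_sotw_cds_update({"A", "B", "C"})
print(sorted(removed))  # ['D', 'E', 'F']
```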
Proposal
I have a few independent high-level ideas to make this work:
- Extend xds-federation: adding support for OD-CDS over xDS-TP concurrent with CDS SotW #41117, making SoTW CDS only able to remove its own clusters.
- Have ODCDS add clusters the way DFP does, piping `avoid_cds_removal` through to the `cds_api_helper`.
- Add explicit owner tracking in the Cluster Manager. SoTW CDS, ODCDS, DFP, etc. each use a unique identifier when interacting with the Cluster Manager, and the Cluster Manager rejects requests related to clusters with a mismatched owner.
I'm in favor of option 1 (extending #41117), since it is a straightforward change and implements what SoTW CDS is intended to do.
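The owner-tracking idea in option 3 can be sketched as another toy model. This is a minimal Python sketch under stated assumptions, not Envoy's `ClusterManager` interface: `OwnedClusterManager`, `OwnerMismatch`, and the owner ids `sotw_cds` and `odcds` are all hypothetical. It records an owner per cluster, rejects any add, update, or removal from a different owner, and limits a SoTW update's removal set to the clusters that owner delivered.

```python
# Toy model of owner-scoped cluster removal (option 3). OwnedClusterManager,
# OwnerMismatch, and the owner ids are hypothetical, not Envoy APIs.


class OwnerMismatch(Exception):
    """Raised when a source touches a cluster it does not own."""


class OwnedClusterManager:
    def __init__(self) -> None:
        self.owners: dict[str, str] = {}  # cluster name -> owner id

    def add_or_update(self, owner: str, name: str) -> None:
        current = self.owners.get(name)
        if current is not None and current != owner:
            raise OwnerMismatch(f"{name} is owned by {current}, not {owner}")
        self.owners[name] = owner

    def remove(self, owner: str, name: str) -> None:
        if self.owners.get(name) != owner:
            raise OwnerMismatch(f"{owner} does not own {name}")
        del self.owners[name]

    def apply_sotw_update(self, owner: str, names: set[str]) -> set[str]:
        # Reject the whole update up front if it names another owner's cluster.
        conflicts = {n for n in names if self.owners.get(n, owner) != owner}
        if conflicts:
            raise OwnerMismatch(f"{owner} does not own {sorted(conflicts)}")
        # The update is complete only for its own owner, so only clusters
        # tagged with that owner are candidates for removal.
        stale = {n for n, o in self.owners.items() if o == owner} - names
        for name in stale:
            self.remove(owner, name)
        for name in names:
            self.add_or_update(owner, name)
        return stale


cm = OwnedClusterManager()
for name in ("A", "B", "C"):  # delivered by SoTW CDS
    cm.add_or_update("sotw_cds", name)
for name in ("D", "E", "F"):  # discovered via ODCDS
    cm.add_or_update("odcds", name)

# A SoTW update that drops C removes only C; D, E, F survive.
stale = cm.apply_sotw_update("sotw_cds", {"A", "B"})
print(sorted(stale))      # ['C']
print(sorted(cm.owners))  # ['A', 'B', 'D', 'E', 'F']
```

Rejecting a mismatched owner, rather than silently overwriting, is what keeps one source from deleting another source's clusters in this model.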