RFC: Dynamic Disks Support in BOSH #1401
Conversation
(branch force-pushed from edb5daa to 83fa9cd)
Architectural Concerns: Runtime Dependencies, Security Model, and Adoption Path

This RFC proposes significant shifts in BOSH's operational and security model that need to be addressed before proceeding with API design.

1. BOSH Director in the Runtime Critical Path

Current State:
Proposed State:

Blocker: There is no open-source high-availability solution for BOSH Director. The Director is typically deployed as a single instance and is not designed for the uptime guarantees that runtime-critical infrastructure requires. This RFC should be contingent on an HA Director solution existing first, unless the design is changed such that the Director is no longer in the critical path of workload operations.

2. Security Architecture Change

Current State:
Proposed State:
This is a substantial change to the security boundary. Workloads now become potential attack vectors against the Director. A compromised Kubernetes cluster could potentially:
The RFC's authorization model is underspecified - it mentions "authorized clients" but doesn't detail how credentials are scoped, rotated, or isolated per workload.

3. No OSS Consumer

There is no open-source BOSH release identified as an adopter of this feature. For the community to take on the additional complexity this RFC introduces - both in the Director codebase and in operational requirements - there should be a concrete plan for at least one OSS BOSH release to adopt dynamic disks. Without this, the feature adds maintenance burden with no clear benefit to the community.
Agree with Ruben's statements.
@rkoster @metskem to address the concerns:
@mariash Thank you for your response. Could you provide some recent examples (links to PRs or commits) to support: "Historically, Cloud Foundry and BOSH have accepted opt-in features that served the needs of specific community members or commercial use cases."
@rkoster here is a recent example: cloudfoundry/routing-release#451
(branch force-pushed from 5202302 to e26cf5f)
I updated the proposal with the potential BOSH CSI implementation. That CSI should be straightforward to implement and would be beneficial for the BOSH community. We would like to keep BOSH changes in the upstream if possible.
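To make that more concrete, here is a minimal sketch of what the controller side of a BOSH-backed CSI plugin could look like. Only the csi.* types and method signatures come from the CSI spec; the Director disk endpoints and the directorClient, createDisk, and attachDisk helpers are hypothetical placeholders, not the API proposed in this RFC, and a complete driver would still need to implement the remaining CSI services.

```go
// Sketch of a BOSH-backed CSI controller plugin (controller service only).
// The Director endpoints and the directorClient helpers below are
// placeholders; only the csi.* types come from the CSI spec.
package boshcsi

import (
	"context"
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// directorClient is a hypothetical thin wrapper around the Director's
// dynamic-disk API, authenticated with a narrowly scoped UAA client.
type directorClient struct {
	endpoint string // Director URL (placeholder)
	token    string // UAA access token limited to disk operations
}

// createDisk would ask the Director for a new disk of the given size and
// return the IaaS-agnostic disk CID that BOSH already uses internally.
func (d *directorClient) createDisk(ctx context.Context, sizeMiB int64, cloudProperties map[string]string) (string, error) {
	// HTTP call to the Director's (hypothetical) disk-creation endpoint.
	return "disk-cid-placeholder", nil
}

// attachDisk would ask the Director to attach an existing disk to the VM
// identified by nodeID.
func (d *directorClient) attachDisk(ctx context.Context, diskCID, nodeID string) error {
	// HTTP call to the Director's (hypothetical) disk-attachment endpoint.
	return nil
}

// controller implements the two CSI controller calls needed to show the
// flow; a complete driver must implement the full ControllerServer interface.
type controller struct {
	director *directorClient
}

// CreateVolume maps a CSI volume request onto a Director disk-create call.
func (c *controller) CreateVolume(ctx context.Context, req *csi.CreateVolumeRequest) (*csi.CreateVolumeResponse, error) {
	sizeBytes := req.GetCapacityRange().GetRequiredBytes()
	cid, err := c.director.createDisk(ctx, sizeBytes/(1024*1024), req.GetParameters())
	if err != nil {
		return nil, fmt.Errorf("creating disk via Director: %w", err)
	}
	return &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:      cid, // the BOSH disk CID doubles as the CSI volume ID
			CapacityBytes: sizeBytes,
		},
	}, nil
}

// ControllerPublishVolume maps a CSI attach request onto a Director
// disk-attach call; no IaaS credentials are needed on the tenant side.
func (c *controller) ControllerPublishVolume(ctx context.Context, req *csi.ControllerPublishVolumeRequest) (*csi.ControllerPublishVolumeResponse, error) {
	if err := c.director.attachDisk(ctx, req.GetVolumeId(), req.GetNodeId()); err != nil {
		return nil, fmt.Errorf("attaching disk via Director: %w", err)
	}
	return &csi.ControllerPublishVolumeResponse{}, nil
}
```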
I wanted to call out what I see as the positive security implications here. In practice, the alternative to this feature isn’t “no dynamic disks” — it’s implementing dynamic disks outside BOSH, which tends to push cloud permissions into the workload/tenant boundary. That means we move from “IaaS privileges live in the Director (which is already the trusted component for IaaS access)” to “each workload environment needs some form of IaaS permission,” whether that’s static keys, instance roles, or workload identity. Even with the more modern options, you’re still granting cloud-level capabilities inside a boundary that’s harder to secure and audit consistently.

I’ve seen this play out in PKS-era Kubernetes (and its OSS KUBO release) on BOSH using cloud CSI plugins: it worked, but every cluster needed cloud permissions to provision/attach volumes. That increased blast radius — compromising the cluster control plane could translate into cloud-level capabilities — and it increased operational burden because we had to ship/patch five CSIs (one for each IaaS) in the tenant environment.

This design centralizes privileged disk operations back into the Director and enables narrowly scoped UAA access for consumers (e.g., disk operations only), which reduces credential sprawl and reduces the number of places where cloud-level privileges exist. Compromising a cluster no longer immediately yields IaaS privileges.

This isn’t risk-free: it expands the Director API surface and makes availability important for new disk operations, so it needs guardrails (tight scopes, per-deployment credentials, network restrictions, and strong auditing). But compared to the current workarounds, I think this is a net improvement in least privilege and operational security.

More broadly, I view this as a foundational primitive: not valuable on its own, but an enabling capability that makes it significantly easier for anyone to build stateful workloads on top of BOSH without reinventing (and re-securing) a parallel disk-control plane in every deployment.
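As a rough illustration of the "narrowly scoped UAA access" point, the sketch below shows what a tenant-side consumer could look like under that model: it holds only a UAA client restricted to disk operations and never any IaaS credentials. The UAA client_credentials grant is standard; the /disks path and everything else Director-specific here is a placeholder, not part of the RFC.

```go
// Sketch of a tenant-side disk consumer: a UAA client_credentials token is
// the only credential it holds. The Director /disks endpoint is hypothetical.
package diskclient

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// fetchToken performs a standard UAA client_credentials grant. The UAA
// client would be registered with only disk-related authorities.
func fetchToken(ctx context.Context, uaaURL, clientID, clientSecret string) (string, error) {
	form := url.Values{"grant_type": {"client_credentials"}}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		uaaURL+"/oauth/token", strings.NewReader(form.Encode()))
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(clientID, clientSecret)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var body struct {
		AccessToken string `json:"access_token"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	return body.AccessToken, nil
}

// createDisk calls a hypothetical Director disk endpoint. The bearer token
// above is the only credential the workload side holds; no IaaS keys needed.
func createDisk(ctx context.Context, directorURL, token string, sizeMiB int) error {
	payload := strings.NewReader(fmt.Sprintf(`{"size": %d}`, sizeMiB))
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, directorURL+"/disks", payload)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("director returned %s", resp.Status)
	}
	return nil
}
```

The property that matters here is that rotating or revoking that single UAA client is the entire credential story on the tenant side, which is a much smaller surface than per-cluster cloud keys or instance roles.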
@mariash Thank you for providing context on the routing-release example. However, I don't think routing-release#451 (cloudfoundry/routing-release#451) supports the precedent you're citing. That issue was a straightforward library choice (whether to use go-metric-registry vs Prometheus directly) — a minor implementation decision with no architectural implications, no new API surface, and no changes to operational or security models.
Separately, I'd like to raise a concern about the structure of this RFC. It appears to combine two distinct proposals:
@Alphasite I agree with your framing of this as a foundational primitive, and I share your concern about IaaS credentials sprawling into workload boundaries. Centralizing disk operations is a sound architectural goal.
This PR adds the RFC "Dynamic Disks Support in BOSH".
For easier viewing, you can see the full RFC as a preview.