Commit 7e92154

feat: [pko-351] added documentation for HostedClusterPackage API
Signed-off-by: Ankit152 <[email protected]>
1 parent a015b2d commit 7e92154

2 files changed: +459 -70 lines

---
title: HostedClusterPackage API
weight: 1200
images: []
mermaid: true
---

The `HostedClusterPackage` API extends Package Operator with progressive
rollout capabilities for Packages targeting HyperShift Hosted Control Planes
(HCP). It introduces a cluster-scoped custom resource, `HostedClusterPackage`,
which governs the lifecycle and update process of Packages across all hosted
control planes within a HyperShift Management Cluster.

## Overview

This API rolls out updates to HostedClusters gradually, which significantly
reduces the "blast radius" of a failed upgrade compared to updating all
clusters simultaneously. It also simplifies the configuration required to
deliver objects into an HCP namespace by reducing the dependency on multiple
systems (Hive, ACM) to a single API.

```mermaid
flowchart LR
    subgraph Hive
        metrics-sss["metrics-forwarder<br><b>SelectorSyncSet</b>"]
    end
    subgraph HyperShift Management Cluster
        metrics-hsp["metrics-forwarder<br><b>HostedClusterPackage</b>"]
        subgraph Namespace: my-cluster-x
            ns-c1["<b>Package</b>"]
        end
        subgraph Namespace: my-cluster-y
            ns-c2["<b>Package</b>"]
        end
    end
    metrics-sss--->metrics-hsp
    metrics-hsp--->ns-c1
    metrics-hsp--->ns-c2
```

### Key Features

* **Progressive Rollout**: Updates are rolled out gradually to HostedClusters
  rather than all at once.
* **Lifecycle Automation**: Automatically creates a Package for each new
  HostedCluster and deletes it when the cluster is removed.
* **Status & Monitoring**: Provides metrics and status updates on the number
  of available, unavailable, and updated Packages.
* **Simplified Configuration**: Reduces the configuration surface from
  multiple objects (SelectorSyncSet, Policy, PlacementRule, etc.) to just the
  `HostedClusterPackage` API.
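
Conceptually, the lifecycle automation is a set difference between the HostedClusters that exist and the Packages already created for them. The following Python sketch illustrates only that idea; `plan_package_changes` is a hypothetical helper, not part of Package Operator, and the real controller reconciles Kubernetes objects rather than plain name lists.

```python
"""Hypothetical sketch of the create/delete reconciliation for Packages.

Assumes the controller can list HostedCluster names and the names of the
HostedClusters that already have a Package; none of these names come from
the real Package Operator API.
"""


def plan_package_changes(hosted_clusters, existing_packages):
    """Return (to_create, to_delete) as sorted lists of HostedCluster names.

    hosted_clusters: names of the HostedClusters currently targeted.
    existing_packages: names of HostedClusters that already have a Package.
    """
    desired = set(hosted_clusters)
    current = set(existing_packages)
    # New HostedClusters get a Package; removed HostedClusters lose theirs.
    to_create = sorted(desired - current)
    to_delete = sorted(current - desired)
    return to_create, to_delete


create, delete = plan_package_changes(
    hosted_clusters=["my-cluster-x", "my-cluster-y"],
    existing_packages=["my-cluster-x", "old-cluster"],
)
print(create, delete)  # ['my-cluster-y'] ['old-cluster']
```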

## HostedClusterPackage Resource

The `HostedClusterPackage` resource is the core configuration object for this
API. It coordinates the rollout process and defines how updates traverse the
fleet of HostedClusters.

### Targeting Clusters

The API includes an optional label selector that restricts the rollout to
HostedClusters with matching labels within the Management Cluster.
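
A minimal sketch of selector-based targeting, assuming Kubernetes `matchLabels` semantics (every listed key/value pair must be present on the HostedCluster). The helper names are hypothetical, and treating an omitted selector as "target every HostedCluster" is an assumption rather than documented behavior.

```python
"""Hypothetical sketch of label-selector targeting.

Only Kubernetes `matchLabels` semantics are modelled: every key/value pair in
the selector must be present on the HostedCluster. Treating a missing selector
as "select everything" is an assumption made for this example.
"""


def matches(selector, labels):
    """Return True if `labels` satisfies the `matchLabels` map in `selector`."""
    if selector is None:
        return True  # assumption: no selector targets every HostedCluster
    return all(labels.get(key) == value for key, value in selector.items())


def select_clusters(selector, clusters):
    """Filter a {name: labels} mapping down to the sorted targeted names."""
    return sorted(name for name, labels in clusters.items() if matches(selector, labels))


clusters = {
    "my-cluster-x": {"env": "prod", "hypershift.openshift.io/risk-group": "early"},
    "my-cluster-y": {"env": "staging"},
}
print(select_clusters({"env": "prod"}, clusters))  # ['my-cluster-x']
```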

### Partitioning

To control the order of updates, you can attach an optional partition
configuration to the `HostedClusterPackage`. The rollout then processes every
HostedCluster in a group before it moves on to the next group.

* **Grouping**: The configuration uses a label on the HostedCluster object to
  assign groups (e.g., `hypershift.openshift.io/risk-group`).
* **Ordering**: Groups can be ordered via a static list or in ascending
  alphanumeric order.
* **Implicit Handling**: HostedClusters without the specified label, or with a
  value that is not part of the configured order, are placed in an implicit
  "unknown" group that is upgraded last.
* **Dynamic Regrouping**: If a cluster's label changes to an earlier group
  during an upgrade, the rollout jumps back to handle that group before
  continuing.
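
The grouping and ordering rules above can be sketched as follows. This is an illustration under stated assumptions, not Package Operator's implementation: `ordered_groups` and `current_group` are hypothetical helpers, and the implicit "unknown" group is represented by `None`. Recomputing the current group on every pass is one simple way to obtain the dynamic regrouping behavior.

```python
"""Hypothetical sketch of partition grouping and ordering.

Assumptions: a missing label, or (with a static order) a value that is not
listed, lands in an implicit "unknown" group that always sorts last. Names and
signatures are illustrative, not the real API.
"""

UNKNOWN = None  # key used for the implicit "unknown" group


def ordered_groups(clusters, label_key, static_order=None):
    """Return [(group, [cluster names])] in rollout order.

    clusters: {name: labels}. Without static_order, known groups are sorted in
    ascending alphanumeric order.
    """
    groups = {}
    for name, labels in clusters.items():
        value = labels.get(label_key)
        if value is None or (static_order is not None and value not in static_order):
            value = UNKNOWN
        groups.setdefault(value, []).append(name)

    known = [g for g in groups if g is not UNKNOWN]
    if static_order is not None:
        known.sort(key=static_order.index)
    else:
        known.sort()
    order = known + ([UNKNOWN] if UNKNOWN in groups else [])
    return [(g, sorted(groups[g])) for g in order]


def current_group(groups, updated):
    """Return the earliest group that still has a cluster not yet updated.

    Recomputing this on every pass lets a cluster relabelled into an earlier
    group pull the rollout back to that group.
    """
    for group, names in groups:
        if any(name not in updated for name in names):
            return group
    return None  # every targeted cluster is updated


key = "hypershift.openshift.io/risk-group"
clusters = {"a": {key: "late"}, "b": {key: "early"}, "c": {}, "d": {key: "normal"}}
groups = ordered_groups(clusters, key, ["early", "normal", "late"])
print(groups)  # [('early', ['b']), ('normal', ['d']), ('late', ['a']), (None, ['c'])]
print(current_group(groups, updated={"b"}))  # normal
```

Relabelling a cluster into an earlier group and calling `ordered_groups` and `current_group` again returns that earlier group, which mirrors the jump-back behavior described above.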

### Progression Strategies

The API supports configurable progression strategies that control the speed
and safety of the rollout.

#### Rolling Upgrade

The `rollingUpgrade` strategy is designed to keep service disruptions to a minimum.

* `maxUnavailable`: The maximum number of Package instances that can be
  updating or unavailable at the same time. A Package that is already
  unavailable before the upgrade starts counts towards this limit. Such
  unavailable Packages are updated first, so that faulty versions do not
  accumulate.
* `minReadySeconds`: The minimum time, in seconds, to wait before a Package is
  considered ready (e.g., `60`).
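
One way to read the `maxUnavailable` rules is sketched below. It is a hypothetical model, not the controller's code: it assumes that an already-unavailable Package occupies a slot whether or not it is being updated, so updating it first consumes no additional budget, while available Packages are started only while free slots remain.

```python
"""Hypothetical sketch of one rollingUpgrade step within a single group.

Assumptions (not taken from the Package Operator source): an unavailable
Package occupies a maxUnavailable slot whether or not it is being updated, so
updating it first costs no extra budget; available Packages are only started
while free slots remain.
"""

from dataclasses import dataclass


@dataclass
class PackageState:
    name: str
    updated: bool    # already on the latest revision
    available: bool  # healthy (and ready for at least minReadySeconds)


def pick_updates(packages, max_unavailable):
    """Return the names of Packages to start updating in this step."""
    unavailable = [p for p in packages if not p.available]
    # Outdated Packages that are already unavailable are prioritised.
    picks = [p.name for p in unavailable if not p.updated]
    free_slots = max(max_unavailable - len(unavailable), 0)
    for p in packages:
        if free_slots == 0:
            break
        if p.available and not p.updated:
            picks.append(p.name)
            free_slots -= 1
    return picks


fleet = [
    PackageState("my-cluster-a", updated=False, available=True),
    PackageState("my-cluster-b", updated=False, available=False),
    PackageState("my-cluster-c", updated=False, available=True),
]
print(pick_updates(fleet, max_unavailable=1))  # ['my-cluster-b']
print(pick_updates(fleet, max_unavailable=2))  # ['my-cluster-b', 'my-cluster-a']
```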

## Status & Observability

The `HostedClusterPackage` API exposes status information to help you track
the progress of a rollout and the health of the fleet. This status is critical
for understanding whether an update is proceeding smoothly or has stalled due
to errors.

### Rollout State

The status subresource provides high-level counters for the rollout process.
These fields let you quickly assess the distribution of Package versions
across your HostedClusters:

* **Updated Packages**: The number of HostedClusters that have successfully
  received the latest version of the Package.

* **Available Packages**: The number of HostedClusters where the Package is
  currently healthy and serving traffic.

* **Unavailable Packages**: The number of HostedClusters where the Package is
  currently degraded or updating. This count is used to enforce the
  `maxUnavailable` limit defined in the progression strategy.
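
The relationship between the three counters can be sketched like this. The names (`ObservedPackage`, `rollout_counts`) are hypothetical, and the rule that a Package only counts as available after it has been ready for `minReadySeconds` is an assumption based on the strategy description.

```python
"""Hypothetical sketch of how the three rollout counters relate.

Field and function names are illustrative only. As an assumption, a Package
counts as available only once it has been ready for at least minReadySeconds.
"""

from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservedPackage:
    revision: str
    ready_since: Optional[float]  # epoch seconds when it became ready; None if not ready


def rollout_counts(packages, latest_revision, now, min_ready_seconds):
    updated = sum(p.revision == latest_revision for p in packages)
    available = sum(
        p.ready_since is not None and now - p.ready_since >= min_ready_seconds
        for p in packages
    )
    return {
        "updated": updated,
        "available": available,
        # Everything not available (degraded, updating, or not ready long
        # enough) counts towards maxUnavailable.
        "unavailable": len(packages) - available,
    }


packages = [
    ObservedPackage("v2", ready_since=0.0),   # updated, ready for 100s
    ObservedPackage("v2", ready_since=90.0),  # updated, ready for only 10s
    ObservedPackage("v1", ready_since=None),  # outdated and degraded
]
print(rollout_counts(packages, "v2", now=100.0, min_ready_seconds=60))
# {'updated': 2, 'available': 1, 'unavailable': 2}
```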

### Progression Logic

The operator uses the status of the individual Packages to determine whether
the rollout can proceed to the next target.

* **Success**: If a targeted HostedCluster updates successfully and its
  Package becomes available, the operator selects the next cluster in the
  partition.

* **Failure**: If a Package update fails or the Package becomes unavailable,
  the rollout pauses for that specific rollout path. This prevents errors from
  propagating to the rest of the fleet.
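
The success/failure gate can be modelled as a walk over the targets in partition order. This sketch assumes that one target is updated at a time and uses four illustrative states (`pending`, `updating`, `available`, `failed`); neither the states nor `next_step` come from the real API.

```python
"""Hypothetical sketch of the success/failure gate for one rollout path.

Assumptions: targets are visited in partition order, one at a time, and each
Package is in one of four illustrative states: "pending", "updating",
"available" or "failed".
"""


def next_step(ordered_clusters, states):
    """Return ("paused" | "wait" | "update", name) or ("done", None)."""
    for name in ordered_clusters:
        state = states.get(name, "pending")
        if state == "failed":
            # Stop here so the faulty version does not spread further.
            return ("paused", name)
        if state == "updating":
            return ("wait", name)
        if state == "pending":
            return ("update", name)
        # state == "available": this target succeeded, move on to the next.
    return ("done", None)


order = ["my-cluster-b", "my-cluster-d", "my-cluster-a"]
print(next_step(order, {"my-cluster-b": "available"}))  # ('update', 'my-cluster-d')
print(next_step(order, {"my-cluster-b": "failed"}))     # ('paused', 'my-cluster-b')
```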

## Monitoring

In addition to the resource status, the controller exports metrics that
provide a fleet-wide overview of the rollout state. SREs can use these metrics
to build dashboards that visualize the number of available versus unavailable
Packages over time.

## Configuration Example

The following YAML example shows a `HostedClusterPackage` configured with
risk-based partitioning and a rolling upgrade strategy.

```yaml
apiVersion: package-operator.run/v1alpha1
kind: HostedClusterPackage
metadata:
  name: example-hosted-cluster-package
spec:
  # Partition Configuration
  partition:
    labelKey: hypershift.openshift.io/risk-group
    order:
      # Static ordering example:
      static:
        - early
        - normal
        - late
      # OR Alphanumeric ordering (mutually exclusive with static):
      # alphanumericAsc: {}

  # Progression Strategy
  minReadySeconds: 60
  strategy:
    rollingUpgrade:
      maxUnavailable: 1 # Max Packages updating or unavailable at the same time
```
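
Before applying a manifest like the one above, a simple client-side check can catch an obviously invalid spec. This is a hypothetical helper operating on a dict that mirrors the YAML (so no YAML library is needed); the `static`/`alphanumericAsc` exclusivity comes from the example's comment, while requiring an integer `maxUnavailable` of at least 1 is an assumption.

```python
"""Hypothetical client-side check of a HostedClusterPackage spec.

The spec is a plain dict mirroring the YAML example. Mutual exclusivity of
`static` and `alphanumericAsc` comes from the example's comment; treating
maxUnavailable as an integer that must be >= 1 is an assumption.
"""


def validate_spec(spec):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    order = spec.get("partition", {}).get("order", {})
    if "static" in order and "alphanumericAsc" in order:
        problems.append("partition.order: static and alphanumericAsc are mutually exclusive")
    rolling = spec.get("strategy", {}).get("rollingUpgrade")
    if rolling is not None and rolling.get("maxUnavailable", 1) < 1:
        problems.append("strategy.rollingUpgrade.maxUnavailable must be at least 1")
    return problems


spec = {
    "partition": {
        "labelKey": "hypershift.openshift.io/risk-group",
        "order": {"static": ["early", "normal", "late"]},
    },
    "minReadySeconds": 60,
    "strategy": {"rollingUpgrade": {"maxUnavailable": 1}},
}
print(validate_spec(spec))  # []
```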

**Note**: The `HostedClusterPackage` API is experimental and subject to
change in the future.
