Commit 9b8bf1d (1 parent 4a98528)

HIVE-3020: docs: add MachinePool adoption documentation improvements

docs/using-hive.md: 308 additions, 3 deletions
@@ -35,6 +35,7 @@
- [Example Adoption ClusterDeployment](#example-adoption-clusterdeployment)
- [Adopting with hiveutil](#adopting-with-hiveutil)
- [Transferring ownership](#transferring-ownership)
- [MachinePool Adoption](#machinepool-adoption)
- [Configuration Management](#configuration-management)
- [Vertical Scaling](#vertical-scaling)
- [SyncSet](#syncset)

@@ -1212,7 +1213,7 @@ Hive will then:

It is possible to adopt cluster deployments into Hive.
This will allow you to manage the cluster as if it had been provisioned by Hive, including:
- [MachinePools](#machine-pools) - See [MachinePool Adoption](#machinepool-adoption) for how to adopt existing MachineSets when adopting a cluster
- [SyncSets and SelectorSyncSets](syncset.md)
- [Deprovisioning](#cluster-deprovisioning)

@@ -1253,10 +1254,46 @@ spec:
    name: pull-secret
```

### Example Adoption ClusterDeployment for vSphere
```yaml
apiVersion: hive.openshift.io/v1
kind: ClusterDeployment
metadata:
  name: my-vsphere-cluster
  namespace: mynamespace
spec:
  baseDomain: vsphere.example.com
  clusterMetadata:
    adminKubeconfigSecretRef:
      name: my-vsphere-cluster-adopted-admin-kubeconfig
    clusterID: f2e99580-389c-4ec5-b07f-4f489d6c0929
    infraID: my-vsphere-cluster-khjpw
    metadataJSONSecretRef:
      name: my-vsphere-cluster-metadata-json
  clusterName: my-vsphere-cluster
  controlPlaneConfig:
    servingCertificates: {}
  installed: true
  preserveOnDelete: true
  platform:
    vsphere:
      certificatesSecretRef:
        name: my-vsphere-cluster-adopted-vsphere-certificates
      cluster: <cluster> # vSphere cluster name where VMs are deployed
      credentialsSecretRef:
        name: my-vsphere-cluster-adopted-vsphere-credentials
      datacenter: <vSphere-datacenter-name>
      defaultDatastore: <default-datastore-name>
      network: <network name used by the cluster>
      vCenter: <vCenter server domain name or IP address>
  pullSecretRef:
    name: my-vsphere-cluster-adopted-pull-secret
```
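
The referenced Secrets must exist in the same namespace as the ClusterDeployment. As a hedged sketch (the kubeconfig and pull-secret key names follow the patterns used elsewhere in this document; the paths are placeholders, and the vSphere credentials and certificates Secret formats are documented separately):
```bash
# On hub cluster - admin kubeconfig for the adopted cluster, stored under the "kubeconfig" key
oc create secret generic my-vsphere-cluster-adopted-admin-kubeconfig -n mynamespace \
  --from-file=kubeconfig=/path/to/admin-kubeconfig

# On hub cluster - pull secret in dockerconfigjson form
oc create secret generic my-vsphere-cluster-adopted-pull-secret -n mynamespace \
  --type=kubernetes.io/dockerconfigjson --from-file=.dockerconfigjson=/path/to/pull-secret.json
```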

Note for `metadataJSONSecretRef`:
1. If the referenced Secret is available -- e.g. if the cluster was previously managed by hive -- simply copy it in.
2. If you have the original metadata.json file -- e.g. if the cluster was provisioned directly via openshift-install -- create the Secret from it: `oc create secret generic my-gcp-cluster-metadata-json -n mynamespace --from-file=metadata.json=/tmp/metadata.json`
3. Otherwise, you may need to compose the file by hand. See the samples below.

If the cluster you are looking to adopt is on AWS and leverages PrivateLink, you'll also need to include that setting under `spec.platform.aws` to ensure the VPC Endpoint Service for the cluster is tracked in the ClusterDeployment.

@@ -1360,6 +1397,274 @@ If you wish to transfer ownership of a cluster which is already managed by hive,
1. Edit the `ClusterDeployment`, setting `spec.preserveOnDelete` to `true`. This ensures that the next step will only release the hive resources without destroying the cluster in the cloud infrastructure.
1. Delete the `ClusterDeployment`
1. From the hive instance that will adopt the cluster, `oc apply` the `ClusterDeployment`, creds and certs manifests you saved in the first step.

### MachinePool Adoption

When adopting a cluster, you can also adopt existing MachineSets by creating MachinePools that match the existing MachineSets.

**Terminology:**

In this section, we use the following terms to avoid confusion:

- **MachinePool resource name** (`metadata.name`): The Kubernetes resource name of the MachinePool, e.g., `mycluster-worker`
  - Must follow the pattern: `<clusterdeployment-name>-<pool-spec-name>`
  - This restriction is enforced by webhook validation. If you attempt to create a MachinePool with a `metadata.name` that does not match this pattern, the webhook will reject it.
  - Example: If your `ClusterDeployment` is named `mycluster` and `MachinePool.spec.name` is `worker`, then `MachinePool.metadata.name` must be exactly `mycluster-worker`
- **Pool spec name** (`spec.name`): The pool name defined in the MachinePool specification, e.g., `worker`
  - Used in the `hive.openshift.io/machine-pool` label
  - Used to generate MachineSet names
  - You can choose any value for `spec.name` - it does NOT need to match existing MachineSet names. The only requirement is that it matches the `hive.openshift.io/machine-pool` label value when adopting existing MachineSets.

When adopting MachineSets, the `hive.openshift.io/machine-pool` label value must match the **pool spec name** (`spec.name`), not the MachinePool resource name.
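
As a concrete illustration of the naming distinction (the names `mycluster`, `worker`, and `mynamespace` are hypothetical):
```bash
# On hub cluster - the MachinePool resource name combines the ClusterDeployment name and the pool spec name
oc get machinepool mycluster-worker -n mynamespace

# On spoke cluster - adoption matches MachineSets on the pool spec name ("worker"), not the resource name
oc get machinesets -n openshift-machine-api -l hive.openshift.io/machine-pool=worker
```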

**Environment:**

In this section, we distinguish between two cluster environments:

- **Hub cluster**: Where Hive is running and where MachinePool resources are created
- **Spoke cluster**: The managed cluster where MachineSets exist

All commands in this procedure are marked with either `# On hub cluster` or `# On spoke cluster` to indicate where each command should be executed.

Hive supports adopting existing MachineSets into MachinePool management in two scenarios:

#### Scenario 1: Adopt MachinePools When Adopting a Cluster

This scenario applies when you are adopting a cluster that was previously unmanaged by Hive. After adopting the cluster, you can bring the MachinePools along by labeling the existing MachineSets and creating corresponding MachinePools.

Steps:

1. Adopt the cluster (see [Cluster Adoption](#cluster-adoption) above)
2. Adopt the MachinePools using the [MachinePool Adoption Procedure](#machinepool-adoption-procedure) outlined below
   - If there are additional MachineSets that should also be managed by Hive, create separate MachinePools for each distinct configuration

#### Scenario 2: Adopt Additional MachineSets for a Cluster Already Managed by Hive

If you want to adopt additional MachineSets for a cluster that is already managed by Hive, you can do so by creating MachinePools that match the existing MachineSets.

Steps:
1. Label the existing MachineSets with `hive.openshift.io/machine-pool=<machine-pool-name>`, where `<machine-pool-name>` is the value you will use for `MachinePool.spec.name` (the machine pool name, not the MachinePool resource name).
2. Create a corresponding MachinePool in the Hive hub cluster to manage these MachineSets, as sketched below.
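
   A compact sketch of these two steps (the MachineSet name `extra-pool-az1` and pool spec name `extra` are hypothetical; the full procedure below covers matching the platform details):
   ```bash
   # On spoke cluster - tag the existing MachineSet with the pool spec name you plan to use
   oc label machineset extra-pool-az1 -n openshift-machine-api hive.openshift.io/machine-pool=extra

   # On hub cluster - create the MachinePool whose spec.name is "extra"
   oc apply -f machinepool-extra.yaml
   ```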

#### MachinePool Adoption Procedure

To adopt existing MachineSets:

1. Identify and inspect the existing MachineSets in the cluster that you want to manage:
   ```bash
   # On spoke cluster - List all MachineSets
   oc get machinesets -n openshift-machine-api

   # On spoke cluster - Get detailed information about a specific MachineSet
   oc get machineset <machineset-name> -n openshift-machine-api -o yaml
   ```

   **Important**: Note the following details for each MachineSet you want to adopt (see the sketch after this list for one way to pull them out):
   - Instance type (e.g., `m5.xlarge` for AWS)
   - Availability zone/failure domain (e.g., `us-east-1a`)
   - Current replica count
   - Any platform-specific configurations (root volume settings, etc.)
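
   For example, on AWS the key fields can be summarized with a `custom-columns` query (the provider-spec field paths shown here are AWS-specific and differ on other platforms; treat this as an illustrative sketch):
   ```bash
   # On spoke cluster - summarize name, replicas, instance type, and zone for each MachineSet
   oc get machinesets -n openshift-machine-api \
     -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas,TYPE:.spec.template.spec.providerSpec.value.instanceType,ZONE:.spec.template.spec.providerSpec.value.placement.availabilityZone
   ```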

2. **Label the existing MachineSets** with the `hive.openshift.io/machine-pool` label. The label value must match the `spec.name` (machine pool name) you will use in the MachinePool:
   ```bash
   # On spoke cluster - Label the MachineSet
   oc label machineset <machineset-name> -n openshift-machine-api hive.openshift.io/machine-pool=<machine-pool-name>
   ```

   **Note**: You must label each MachineSet you want to adopt. Each MachineSet in each availability zone needs the label; the check below confirms nothing was missed.
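
   To confirm the labels were applied (substitute your pool spec name):
   ```bash
   # On spoke cluster - list the MachineSets carrying the label; every zone you expect should appear
   oc get machinesets -n openshift-machine-api -l hive.openshift.io/machine-pool=<machine-pool-name>
   ```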

3. **Create a MachinePool** with specifications that exactly match the existing MachineSets:
   - The `spec.name` (machine pool name) must match the label value you applied in step 2
   - The `spec.platform` configuration (instance type, zones, etc.) must exactly match the existing MachineSets. For platform-specific limitations, see [Platform-Specific Limitations](#platform-specific-limitations)
   - The `spec.replicas` should match the current total replica count across all zones, or you can adjust it and Hive will reconcile
   - The `spec.platform.<cloud>.zones` array must include all zones where MachineSets are labeled, and the order matters (see [Zone Configuration Warnings](#zone-configuration-warnings) below)

   Example MachinePool for adopting existing worker MachineSets on AWS:
   ```yaml
   apiVersion: hive.openshift.io/v1
   kind: MachinePool
   metadata:
     name: mycluster-worker # MachinePool resource name
     namespace: mynamespace
   spec:
     clusterDeploymentRef:
       name: mycluster
     name: worker # Machine pool name (spec.name) - must match the label value from step 2
     platform:
       aws:
         type: m5.xlarge # Must exactly match existing MachineSet instance type
         zones: # Must match all zones where MachineSets are labeled
         - us-east-1a
         - us-east-1b
         - us-east-1c
     replicas: 3 # Total replicas across all zones
   ```

   Example MachinePool for adopting existing worker MachineSets on GCP:
   ```yaml
   apiVersion: hive.openshift.io/v1
   kind: MachinePool
   metadata:
     name: mihuanggcp-worker
   spec:
     clusterDeploymentRef:
       name: mihuanggcp
     name: worker
     platform:
       gcp:
         osDisk:
           diskSizeGB: 128
           diskType: pd-ssd
         type: n1-standard-4
         zones:
         - us-central1-a
         - us-central1-c
         - us-central1-f
     replicas: 3
   ```

   Example MachinePool for adopting existing worker MachineSets on vSphere:
   ```yaml
   apiVersion: hive.openshift.io/v1
   kind: MachinePool
   metadata:
     name: mihuang-1213a-worker
     namespace: adopt
   spec:
     clusterDeploymentRef:
       name: mihuang-1213a
     name: worker
     platform:
       vsphere:
         coresPerSocket: 4
         cpus: 8
         memoryMB: 16384
         osDisk:
           diskSizeGB: 120
     replicas: 2
   ```

4. **Apply the MachinePool**:
   ```bash
   # On hub cluster - Create the MachinePool
   oc apply -f machinepool-adopt.yaml
   ```

5. **Verify the adoption**:
   ```bash
   # On hub cluster - Check MachinePool status
   oc get machinepool mycluster-worker -n mynamespace -o yaml

   # On spoke cluster - Verify MachineSets were not recreated
   oc get machinesets -n openshift-machine-api
   ```
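
   One way to confirm the existing MachineSets were adopted rather than replaced is to compare creation timestamps before and after applying the MachinePool (an illustrative check):
   ```bash
   # On spoke cluster - adopted MachineSets keep their original creation timestamps
   oc get machinesets -n openshift-machine-api \
     -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp,REPLICAS:.spec.replicas
   ```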

#### Warning: Avoid Unintended Hive Management

Hive determines which MachineSets it manages based on two criteria:

1. **Name pattern match**: MachineSet name starts with `<cluster-name>-<pool-spec-name>-` (e.g., `mycluster-worker-us-east-1a-xxx`)
2. **Label match**: MachineSet has the `hive.openshift.io/machine-pool` label with a value matching the MachinePool's `spec.name`

If a MachineSet meets either of these criteria, Hive will consider it managed and may modify or delete it to match the MachinePool specification.

**Important**: If you manually create MachineSets that you do NOT want Hive to manage, ensure both of the following:
- The MachineSet name does NOT start with `<cluster-name>-<pool-spec-name>-`
- The MachineSet does NOT have the `hive.openshift.io/machine-pool` label

If either the naming pattern or the label matches, Hive will assume the MachineSet is Hive-managed and may modify or delete it to match the MachinePool specification. This can lead to unexpected MachineSet deletion or modification.
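
To see which MachineSets Hive would treat as managed (the cluster name `mycluster` and pool spec name `worker` are hypothetical):
```bash
# On spoke cluster - MachineSets carrying the hive.openshift.io/machine-pool label
oc get machinesets -n openshift-machine-api -l hive.openshift.io/machine-pool

# On spoke cluster - MachineSets whose names match the Hive naming pattern for this pool
oc get machinesets -n openshift-machine-api -o name | grep 'mycluster-worker-'
```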

#### Zone Configuration Warnings

Zone configuration (failure domain configuration) is one of the most error-prone aspects of MachinePool adoption. Incorrect zone configuration can cause Hive to create new MachineSets and delete existing ones, leading to unexpected resource creation and potential service disruption.

**Warning 1: Zone Mismatch Causes New MachineSet Creation**

If the configured zones in `MachinePool.spec.platform.<cloud>.zones` do not match the existing MachineSets' failure domains (availability zones), Hive will:
- NOT adopt the existing MachineSets (even if they have the correct label)
- Create new MachineSets in the configured zones
- This can lead to unexpected resource creation and costs

Example of zone mismatch:
- Existing MachineSets: in zones `us-east-1a` and `us-east-1f` (with the `hive.openshift.io/machine-pool=worker` label)
- MachinePool configured with zones: `us-east-1b` and `us-east-1c`
- Result:
  - Existing MachineSets in `us-east-1a` and `us-east-1f` are not adopted (zone mismatch)
  - If the existing MachineSets have the `hive.openshift.io/machine-pool` label, they will be deleted because they are considered controlled by the MachinePool but don't match the generated MachineSets
  - New MachineSets are created in `us-east-1b` and `us-east-1c` to match the MachinePool config

**Warning 2: Zone Order Affects Replica Distribution**

When using fixed replicas (not autoscaling), the order of zones (failure domains) in the array determines how replicas are distributed. Make sure the zone order in `MachinePool.spec.platform.<cloud>.zones` matches the current replica distribution across zones; an incorrect zone order will cause Hive to redistribute replicas, leading to Machine creation or deletion.

Hive distributes replicas using this algorithm:

```go
// total    = MachinePool.spec.replicas
// numOfAZs = number of entries in the zones array
// idx      = index of this zone within the zones array
replicas := int32(total / numOfAZs)
if int64(idx) < total%numOfAZs {
    replicas++ // earlier zones in the array get the extra replicas
}
```

Example of zone order impact:

Current state (total: 3 replicas):
- `us-east-1f`: 2 replicas
- `us-east-1a`: 1 replica

Correct zone order (preserves current distribution):
```yaml
spec:
  platform:
    aws:
      zones:
      - us-east-1f # Index 0: gets 2 replicas
      - us-east-1a # Index 1: gets 1 replica
  replicas: 3
```

Incorrect zone order (causes Machine recreation):
```yaml
spec:
  platform:
    aws:
      zones:
      - us-east-1a # Index 0: will get 2 replicas
      - us-east-1f # Index 1: will get 1 replica
  replicas: 3
```

Result of incorrect order:
- Hive will scale `us-east-1a` from 1 to 2 replicas → 1 new Machine created
- Hive will scale `us-east-1f` from 2 to 1 replica → 1 Machine deleted
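
Before settling on the zone order, you can check the current per-zone distribution of the labeled MachineSets (AWS field paths shown; other platforms store the zone elsewhere in the provider spec):
```bash
# On spoke cluster - current replicas per zone for the MachineSets labeled for pool "worker"
oc get machinesets -n openshift-machine-api -l hive.openshift.io/machine-pool=worker \
  -o custom-columns=ZONE:.spec.template.spec.providerSpec.value.placement.availabilityZone,REPLICAS:.spec.replicas
```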

#### Platform-Specific Limitations

##### Nutanix and vSphere: Multiple Failure Domains

**Note:** vSphere zone support is coming soon but is not yet officially supported.

Nutanix and vSphere follow similar mechanisms for failure domain handling.

**For clusters configured with a single failure domain:**

- Nutanix and vSphere MachineSets can be adopted normally
- MachinePool adoption works correctly

**For clusters configured with multiple failure domains (e.g., FD1, FD2):**

After an OpenShift cluster is created, the failure domain configuration is stored in `Infrastructure.spec.platformSpec.*.failureDomains`. The failure domains in the Infrastructure resource can be modified.

- If a newly added MachineSet in the spoke cluster is in FD1 or FD2, MachinePool adoption and autoscaling work normally.
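
To see which failure domains the spoke cluster currently defines (vSphere path shown; Nutanix uses `.spec.platformSpec.nutanix.failureDomains`):
```bash
# On spoke cluster - list the failure domain names defined in the Infrastructure resource
oc get infrastructure cluster -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}'
```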

**Limited Scenario:**

After creating a cluster with multiple failure domains (FD1, FD2) using Hive, if a new MachineSet is added in FD3 on the spoke cluster, it cannot be adopted.
`ClusterDeployment.spec.platform.*.failureDomains` is immutable. Hive uses the ClusterDeployment's failure domains to generate MachineSets. Even if `Infrastructure.spec.platformSpec.*.failureDomains` contains FD3, if the ClusterDeployment's failure domains do not include FD3:

- MachinePool adoption will fail because there is no generated FD3 MachineSet to match against
- The FD3 MachineSet will be deleted by Hive because it has the correct `hive.openshift.io/machine-pool` label (making `isControlledByMachinePool` return true) but no matching generated MachineSet exists

**Note:** There is one difference between Nutanix and vSphere: Nutanix can only configure one Prism Central, while vSphere supports configuring multiple vCenters (topology). _After a vSphere cluster is created, adding new vCenters is not supported_; however, new failure domains can be added within existing vCenters.

## Configuration Management

### Vertical Scaling
