Commit 754e0d1

HYPERFLEET-461 - feat: Add documentation of Long-running reserved GKE cluster for Prow CI/CD job execution
2 parents ff9bed0 + c298543 commit 754e0d1

1 file changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
# Prow CI/CD Cluster Documentation

## Overview

This is the long-running, reserved GKE cluster for Prow CI/CD job execution. This document explains how to access it, inspect it, update it, and remove it if needed.

- **Cluster Name**: `hyperfleet-dev-prow`
- **GCP Project**: `hcm-hyperfleet`
- **Connect Command**: `gcloud container clusters get-credentials hyperfleet-dev-prow --zone us-central1-a --project hcm-hyperfleet`

---

## Usage Policy

**This cluster is dedicated to running Prow CI/CD jobs for the team.**

- **Read-only operations** (viewing cluster info, logs, etc.) can be performed by all team members.
- **Modifications** (updates, deletions, configuration changes) to the cluster or the `prow-hyperfleet` namespace should follow these best practices:
  1. Get **explicit approval** from team leaders.
  2. Send a **team-wide broadcast via Slack** before taking action so everyone is aware of potential impacts.

---

## Prerequisites for Viewing Cluster

```bash
# Install required tools
gcloud components install kubectl gke-gcloud-auth-plugin
```
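
Before connecting, it can help to confirm that the tools installed above are actually on your `PATH`. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the tool list is simply the three binaries this document relies on.

```bash
# Hypothetical helper: prints the name of each given tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report (on stderr) any of the tools needed for read-only cluster access.
missing=$(missing_tools gcloud kubectl gke-gcloud-auth-plugin)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, all three tools were found.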

## Prerequisites for Terraform Operations

**Only needed if you want to view Terraform state, update, or remove the cluster.**

```bash
# Install Terraform (macOS via Homebrew); version 1.5 or newer is required
brew install terraform

# Clone the infrastructure repository
git clone https://github.com/openshift-hyperfleet/hyperfleet-infra.git
cd hyperfleet-infra
```

---

## How to Access the Cluster

### 1. Authenticate with GCP

```bash
gcloud auth login
gcloud config set project hcm-hyperfleet
```

### 2. Get Cluster Credentials

```bash
gcloud container clusters get-credentials hyperfleet-dev-prow \
  --zone us-central1-a \
  --project hcm-hyperfleet
```

### 3. Verify Access

```bash
kubectl get namespaces
kubectl get pods -n prow-hyperfleet
```
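
If you work with several clusters, it is worth checking that `kubectl` is pointing at the Prow cluster before running anything else. `gcloud container clusters get-credentials` names the kubeconfig context `gke_<project>_<location>_<cluster>`; the sketch below builds that name and compares it. The function names `expected_context` and `is_prow_context` are hypothetical, chosen for illustration.

```bash
# Build the kubeconfig context name that get-credentials creates for a GKE cluster.
expected_context() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Succeeds only when the given context name is the Prow cluster's context.
is_prow_context() {
  [ "$1" = "$(expected_context hcm-hyperfleet us-central1-a hyperfleet-dev-prow)" ]
}
```

A possible use is `is_prow_context "$(kubectl config current-context)" || echo "kubectl is pointing at a different cluster"`.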

---

## How to Get Cluster Information

### View Cluster Details

```bash
# Cluster status and configuration
gcloud container clusters describe hyperfleet-dev-prow \
  --zone us-central1-a \
  --project hcm-hyperfleet

# Node information
kubectl get nodes -o wide

# Running workloads
kubectl get all -n prow-hyperfleet
```

### View Terraform State and Pub/Sub Outputs

**First, clone the repo if you haven't already** (see [Prerequisites for Terraform Operations](#prerequisites-for-terraform-operations)).

```bash
# Run from the parent directory of your hyperfleet-infra clone
cd hyperfleet-infra/terraform

# Initialize with Prow backend
terraform init -backend-config=envs/gke/dev-prow.tfbackend

# View all managed resources
terraform state list

# View outputs (includes Pub/Sub config, etc.)
terraform output

# View Pub/Sub resources
terraform output pubsub_config
terraform output pubsub_resources
```

---

## How to Update the Cluster

**⚠️ REMINDER**: Review the [Usage Policy](#usage-policy) before proceeding. Leader approval and a team-wide Slack broadcast are recommended.

**First, clone the repo if you haven't already** (see [Prerequisites for Terraform Operations](#prerequisites-for-terraform-operations)).

### 1. Navigate to Terraform Directory

```bash
# Run from the parent directory of your hyperfleet-infra clone
cd hyperfleet-infra/terraform
```

### 2. Initialize Terraform with Prow Backend

```bash
terraform init -backend-config=envs/gke/dev-prow.tfbackend
```

### 3. Edit Configuration

Edit `envs/gke/dev-prow.tfvars` with your changes:

```hcl
# Common changes:
node_count   = 2               # Scale up/down
machine_type = "e2-standard-8" # Change VM size
use_spot_vms = false           # Switch to regular VMs
```

### 4. Preview and Apply Changes

```bash
# Review what will change
terraform plan -var-file=envs/gke/dev-prow.tfvars

# Coordinate with team before applying
# Then apply changes
terraform apply -var-file=envs/gke/dev-prow.tfvars
```
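
One way to make sure you apply exactly what was reviewed is to save the plan to a file and apply that file. With `-detailed-exitcode`, `terraform plan` exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. The sketch below turns that status into a readable message; `describe_plan_status` and the file name `prow.tfplan` are hypothetical and not part of the repository.

```bash
# Hypothetical helper: translate the exit status of
# `terraform plan -detailed-exitcode` into a short message.
describe_plan_status() {
  case "$1" in
    0) echo "no changes" ;;
    2) echo "changes pending" ;;
    *) echo "plan failed" ;;
  esac
}

# Usage (from hyperfleet-infra/terraform, after terraform init):
#   terraform plan -detailed-exitcode -var-file=envs/gke/dev-prow.tfvars -out=prow.tfplan
#   describe_plan_status "$?"
#   terraform apply prow.tfplan   # applies only the saved, reviewed plan
```

Applying a saved plan file does not take `-var-file`, because the variable values are already recorded in the plan.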

### 5. Verify Changes

```bash
kubectl get nodes
kubectl get pods -n prow-hyperfleet
```
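
After a node pool change, nodes may take a few minutes to become ready. As a rough sketch, the hypothetical helper below reads the output of `kubectl get nodes --no-headers` and counts nodes whose STATUS column is anything other than exactly `Ready`. Note that a cordoned node reports `Ready,SchedulingDisabled` and is therefore counted as well.

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the number of nodes whose STATUS (second column) is not "Ready".
count_unready_nodes() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Usage:
#   kubectl get nodes --no-headers | count_unready_nodes
```

A result of `0` suggests every node has rejoined the cluster and is schedulable.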

---

## How to Remove the Cluster

**⚠️ WARNING**: This destroys the entire Prow cluster. Review the [Usage Policy](#usage-policy) before proceeding. Leader approval and team-wide Slack coordination are strongly recommended.

**First, clone the repo if you haven't already** (see [Prerequisites for Terraform Operations](#prerequisites-for-terraform-operations)).

### 1. Disable Deletion Protection

Edit `envs/gke/dev-prow.tfvars`:

```hcl
enable_deletion_protection = false
```

Apply the change:

```bash
# Run from the parent directory of your hyperfleet-infra clone
cd hyperfleet-infra/terraform
terraform init -backend-config=envs/gke/dev-prow.tfbackend
terraform apply -var-file=envs/gke/dev-prow.tfvars
```

### 2. Destroy the Cluster

```bash
terraform destroy -var-file=envs/gke/dev-prow.tfvars
```
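
Because the destroy is irreversible, an extra guard in front of it can help prevent running it against the wrong environment. The sketch below is one possible approach, not something the repository provides: `confirm_cluster_name` is a hypothetical helper that succeeds only when the operator types the cluster name exactly.

```bash
# Hypothetical guard: reads one line from stdin and succeeds only when it
# equals the expected cluster name passed as the first argument.
confirm_cluster_name() {
  local expected="$1" typed
  IFS= read -r typed || true
  [ "$typed" = "$expected" ]
}

# Usage (from hyperfleet-infra/terraform):
#   terraform plan -destroy -var-file=envs/gke/dev-prow.tfvars   # preview what would be removed
#   printf 'Type the cluster name to confirm: ' >&2
#   confirm_cluster_name hyperfleet-dev-prow && \
#     terraform destroy -var-file=envs/gke/dev-prow.tfvars
```

`terraform destroy` still asks for its own `yes` confirmation; the guard only adds a check that you are targeting the cluster you intend.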

### 3. Recreate (if needed)

```bash
# First re-enable deletion protection in envs/gke/dev-prow.tfvars:
#   enable_deletion_protection = true

# Then create the cluster
terraform apply -var-file=envs/gke/dev-prow.tfvars
```

---

## Key Configuration Files in hyperfleet-infra Repo

| File | Purpose |
|------|---------|
| `terraform/envs/gke/dev-prow.tfvars` | Cluster configuration (nodes, machine type, etc.) |
| `terraform/envs/gke/dev-prow.tfbackend` | Remote state configuration |
| `terraform/main.tf` | Main Terraform module |

---

## Troubleshooting

### Can't Connect to Cluster

```bash
# Re-authenticate
gcloud auth login
gcloud container clusters get-credentials hyperfleet-dev-prow \
  --zone us-central1-a \
  --project hcm-hyperfleet
```

### Terraform State Lock Issues

**Note**: Terraform automatically locks the state file when using the remote backend (GCS) to prevent concurrent modifications. This is already enabled and working.

If a Terraform operation is interrupted (for example by a crash or a network failure), the lock may remain in place. To resolve:

```bash
# First, confirm no one is currently running Terraform operations
# Then force-unlock using the lock ID from the error message
terraform force-unlock <LOCK_ID>
```
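
The lock error normally includes a `Lock Info:` section with `ID:` and `Who:` fields, which tell you which lock to release and who to check with first. The sketch below pulls those two fields out of the error text. It assumes that layout, and `lock_details` is a hypothetical helper name; the exact output can vary between Terraform versions.

```bash
# Hypothetical helper: reads Terraform's lock error text on stdin and prints
# the lock ID and holder. Scans every field so it also works when lines are
# prefixed with Terraform's error margin character.
lock_details() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "ID:")  print "id=" $(i + 1)
      if ($i == "Who:") print "who=" $(i + 1)
    }
  }'
}

# Usage (-no-color keeps escape sequences out of the parsed text):
#   terraform plan -no-color -var-file=envs/gke/dev-prow.tfvars 2>&1 | lock_details
```

Contact the person shown in `who=` before passing the `id=` value to `terraform force-unlock`.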

**⚠️ WARNING**: Only use `force-unlock` after confirming no one else is actively running Terraform operations, as this can cause state corruption if multiple people modify state simultaneously.

---

## Additional Documentation

- **Detailed infrastructure docs**: `terraform/README.md` (in the cloned repo)
- **Shared VPC setup**: `terraform/shared/README.md` (in the cloned repo)

---

**Last Updated**: 2026-01-23