Conversation


@tzhou5 tzhou5 commented Jan 9, 2026

Summary by CodeRabbit

  • Documentation
    • Added comprehensive test documentation covering NodePool lifecycle scenarios, prerequisites, validation criteria, status phases, and verification steps.
  • Tests
    • Added end-to-end test cases and a sample test fixture for creating and validating a NodePool (including resource and status verification).


@openshift-ci openshift-ci bot requested review from tirthct and yasun1 January 9, 2026 16:02

openshift-ci bot commented Jan 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign alexvulaj for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Jan 9, 2026

Walkthrough

Adds a new end-to-end test specification at testcases/nodepool_lifecycle.md that defines test flows for NodePool lifecycle management: creating a NodePool via API, verifying creation/listing/filtering and status phase transitions (e.g., NotReady → Ready), checking adapter-reported conditions, and validating Kubernetes resources (pods, jobs, nodes) in the deployed environment. Also adds a JSON test fixture testcases/templates/create_nodepool_gcp.json representing a GCP NodePool resource used by the tests. The document includes prerequisites, sample payloads, expected HTTP responses, and explicit validation criteria.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding NodePool test cases, which matches the changeset of new test documentation and fixtures. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @testcases/nodepool_lifecycle.md:
- Around line 304-598: Summary: The test doc lacks a single, consistent error
response schema and repeats/inconsistently shows examples across scenarios
(E2E-FAIL-002 / Scenario 2.x). Fix: add a single "Standard Error Response
Schema" section near the top of the Nodepool API validation section that defines
the canonical structure (name it ErrorResponseSchema or similar) and include one
canonical JSON example; then update each Scenario (2.1–2.9) to reference
ErrorResponseSchema instead of embedding divergent examples, replace inline
example blocks with "See ErrorResponseSchema" and only override when a scenario
requires different fields, and add the missing expected error response example
for Scenario 2.9 that references the standard schema (e.g., include the 404
variant). Ensure scenario headers or IDs (E2E-FAIL-002, Scenario 2.1 .. 2.9) are
used to locate and update the relevant blocks.
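As a concrete starting point, the canonical section the prompt asks for might look like the following. The field names are taken from the error examples already present in the test document; the name "ErrorResponseSchema" is the reviewer's suggestion, and the `<http-status>` placeholders are illustrative, not a confirmed API contract:

```json
{
  "kind": "Error",
  "id": "<http-status>",
  "href": "/api/hyperfleet/v1/errors/<http-status>",
  "code": "HYPERFLEET-<http-status>",
  "reason": "<human-readable explanation naming the offending field>"
}
```

Each scenario would then say "See ErrorResponseSchema" and only spell out the `id`/`code`/`reason` values that differ (e.g., the 404 variant for Scenario 2.9).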
🧹 Nitpick comments (3)
testcases/nodepool_lifecycle.md (3)

1-10: Add language specifiers to all fenced code blocks and surround tables with blank lines for Markdown compliance.

This document has 51+ markdown linting violations. The two main issues are:

  1. Missing code block language specifiers (MD040): all fenced ``` blocks need a language identifier (e.g., `json`, `bash`, `http`)
  2. Tables without surrounding blank lines (MD058): Tables should have blank lines before and after

For example, the bare fence at line 71 should become:

````diff
-```
+```http
 ... code block content ...
 ```
````

and the table at line 88 needs a preceding blank line:

```diff
+
 | Validation Points |
```
While these are style violations, they're important for maintainability and CI/CD compliance.

♻️ Systematic fixes needed

Apply the following fixes systematically across the entire document:

  1. Code blocks: Add language specifier after opening backticks

    • HTTP requests → http
    • JSON payloads → json
    • Bash scripts → bash
  2. Tables: Add blank line before | and after closing row

Example pattern:

```diff
-**Validation Points**:
-| # | Check | Expected Result |
+**Validation Points**:
+
+| # | Check | Expected Result |
 |---|-------|-----------------|
 | 1.1 | HTTP Status Code | 201 Created |
+
```

Key lines to fix:

  • Lines 71, 110, 122, 137, 166, 235, 249, 334, 370, 406, 431, 455, 480, 507, 531, 566, 645, 669, 731, 763, 789, 807, 871, 945, 975, 1004, 1021, 1046, 1099, 1116, 1141, 1159 (code blocks)
  • Lines 88, 115, 148, 186, 240, 267, 345, 382, 418, 443, 468, 493, 519, 543, 578, 657, 674, 736, 777, 794, 812, 876, 950, 980, 1009, 1033, 1051, 1071, 1104, 1128, 1146, 1171, 1193, 1213 (table spacing)
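Rather than fixing these by hand, the MD040 offenders could be located mechanically. A small sketch (not part of the PR; a stand-in for running markdownlint itself) that reports opening fences with no language specifier:

```python
def find_bare_fences(text):
    """Return 1-based line numbers of opening code fences that lack a
    language specifier (the MD040 violations listed above)."""
    bare, in_fence = [], False
    for i, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # An opening fence that is exactly three backticks has no language.
            if not in_fence and stripped == "```":
                bare.append(i)
            in_fence = not in_fence
    return bare
```

Feeding the document through this (or simply running `markdownlint-cli2 --fix` where supported) would keep the line list above from drifting as the file is edited.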

274-282: Verify expected duration estimates are realistic and document assumptions.

Duration estimates (10-15 minutes total, with 5-10 minutes each for adapter processing and node provisioning) are significant for an E2E test suite. Recommend:

  1. Document the test environment specs that affect these timelines (GCP region, cluster size, machine types)
  2. Add guidance on acceptable variance and timeout handling
  3. Note if these timings assume parallel or sequential adapter execution

1063-1068: Correct the bash script syntax on lines 1063-1068.

The curl command is incomplete and not properly formatted for testing:

```diff
-for i in {1..10}; do
-  curl -X GET /api/hyperfleet/v1/clusters/{cluster_id}/nodepools &
-done
+for i in {1..10}; do
+  curl -X GET http://localhost:8080/api/hyperfleet/v1/clusters/{cluster_id}/nodepools &
+done
+wait
```

Issues:

  • Missing protocol and host in the URL
  • Missing wait statement to ensure all background jobs complete
  • Replace {cluster_id} with the actual test variable
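The same fan-out/wait pattern can be expressed in a form where completion of all requests is explicit. A Python sketch with a stubbed request (the real call would be an HTTP GET against the nodepools endpoint, which needs a live API server):

```python
from concurrent.futures import ThreadPoolExecutor

def list_nodepools(i):
    # Stand-in for: GET {API_URL}/api/hyperfleet/v1/clusters/{cluster_id}/nodepools
    # (stubbed so the fan-out/wait pattern is runnable without a server).
    return f"request-{i}: 200"

def fan_out(n=10):
    # Exiting the with-block is the equivalent of bash `wait`: it returns
    # only once every submitted request has finished.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(list_nodepools, range(1, n + 1)))
```

This makes the "all requests completed" condition something the test can assert on, instead of relying on shell job control.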
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e17786 and e3b8cae.

📒 Files selected for processing (1)
  • testcases/nodepool_lifecycle.md
🧰 Additional context used
🪛 LanguageTool
testcases/nodepool_lifecycle.md

[grammar] ~257-~257: Ensure spelling is correct
Context: ...s Applied | True | | 5.8 | All adapters Health | True | | 5.9 | No error conditions | ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
testcases/nodepool_lifecycle.md

MD040 (fenced-code-language) — Fenced code blocks should have a language specified:
lines 71, 110, 122, 137, 166, 235, 249, 334, 370, 406, 431, 455, 480, 507, 531, 566, 645, 669, 731, 763, 789, 807, 871, 945, 975, 1004, 1021, 1046, 1099, 1116, 1141, 1159

MD058 (blanks-around-tables) — Tables should be surrounded by blank lines:
lines 88, 115, 148, 186, 240, 267, 345, 382, 418, 443, 468, 493, 519, 543, 578, 657, 674, 736, 777, 794, 812, 876, 950, 980, 1009, 1033, 1051, 1071, 1104, 1128, 1146, 1171, 1193, 1213

🔇 Additional comments (9)
testcases/nodepool_lifecycle.md (9)

3-3: Consider updating the document status from "Draft" to reflect its current readiness.

The document is marked as "Draft", but it contains comprehensive, well-structured test cases with detailed validation steps and success criteria. Clarify whether:

  • This status is intentional pending additional test cases
  • The document is ready for implementation
  • There are known gaps or TODOs requiring completion

63-63: Resolve the ambiguity about the evolving request body specification.

The note states: "The detailed request body spec is still evolving and subject to change." This creates uncertainty about whether test cases use the correct/final structure. Recommend:

  1. Link to or embed the finalized API spec
  2. Add version/date info for when this test case was last aligned with the API contract
  3. Document expected breaking changes and how tests should adapt

121-130: Add validation for the label filtering test to confirm expected format.

The label filtering test (Scenario 2.4-2.5) uses URL encoding (labels=workload%3Dgpu), but the test data (lines 55-59) shows labels as a nested object. Clarify:

  1. Is the filtering API query parameter format finalized?
  2. Should the test also validate that invalid label selectors return appropriate errors?
  3. Add a test case for multiple label filters (e.g., ?labels=workload%3Dgpu,tier%3Dcompute)
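On the encoding question, the query-string form of a label filter can be derived mechanically rather than hand-encoded. A quick sketch (the `search=` grammar is an assumption based on the filter examples elsewhere in this PR):

```python
from urllib.parse import urlencode

# Single- and multi-label filters, encoded the way curl --data-urlencode
# would emit them.
single = urlencode({"search": "labels.workload='gpu'"})
multi = urlencode({"search": "labels.workload='gpu' and labels.tier='compute'"})
print(single)
print(multi)
```

Generating the encoded form this way in the test harness avoids baking literal `%3D`-style strings into the test document.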

257-257: Fix the grammatical issue on line 257.

The table cell appears to have a spacing or formatting issue. The LanguageTool grammar checker flagged this line. Review the raw Markdown to confirm the syntax is correct.


495-498: Clarify the expected behavior for unsupported fields (Scenario 2.6).

The validation points show two possible outcomes:

  • Option A: Return 400 Bad Request for unknown fields (strict validation)
  • Option B: Return 201 and ignore unknown fields (lenient validation)

This ambiguity must be resolved because it affects:

  1. Client expectations and error handling logic
  2. API backwards compatibility strategy
  3. Test assertion logic

Recommendation: Decide on the intended behavior and update the test case to match. If the design choice is still pending, create a separate decision record.


627-723: Clarify the setup process for Scenario A (Job Creation Failure) as it involves adapter configuration.

Lines 635-637 describe configuring the adapter with invalid YAML that references unknown Kubernetes custom resources. However:

  1. How is this configuration injected? Is it part of the test setup, a mock/stub, or a code change?
  2. Is this a one-time setup or per-test-case setup?
  3. What's the cleanup procedure? (e.g., do lines 907-908 restore the adapter configuration?)
  4. Can this be automated, or does it require manual intervention?

If the adapter configuration is not under test infrastructure control, this scenario may be difficult to automate in a CI/CD pipeline.


744-862: Verify the test setup for Scenario B (Job Execution Failure) is reproducible and idempotent.

Scenario B (lines 744-862) relies on providing an "invalid machine type" to trigger Job execution failure. However:

  1. Is the machine type validation performed by the adapter or the cloud provider (GCP)?
  2. Will using invalid-machine-type-xyz consistently trigger the same failure mode, or could validation timing differ?
  3. Is there a risk of this test becoming flaky if the error message format changes in GCP?

Recommendation: Consider using a more controlled failure mechanism (e.g., injection points, environment variables, or feature flags) that doesn't rely on specific GCP API behaviors.


1182-1218: Specify the expected recovery behavior for Scenario 6.1 (Intermittent Connection Failures).

The test steps for intermittent failures (lines 1182-1218) don't specify:

  1. How long should the brief interruption last (milliseconds, seconds, minutes)?
  2. What constitutes "automatic recovery"? (Does the adapter retry immediately, or does it wait for a health check?)
  3. Are there any side effects (e.g., does the nodepool status temporarily report as "Error")?
  4. What's the expected timeline for reaching the Ready state after recovery?

Recommendation: Add specific timing constraints and expected state transitions to make this scenario reproducible and automatable.


1271-1275: Verify the external document references are correct and accessible.

Lines 1271-1275 reference related documents:

  • ../hyperfleet-e2e-scenario/hyperfleet-api-e2e-scenario.md
  • ../hyperfleet-critical-user-journey/hyperfleet-api-cuj.md
  • ../hyperfleet-critical-user-journey/hyperfleet-adapter-cuj.md

Ensure these paths are correct relative to the testcases/ directory and that the referenced documents exist in the repository.

Comment on lines 304 to 598

### Nodepool API Request Body Validation Failures

| Field | Value |
|-------|-------|
| **Test Case ID** | E2E-FAIL-002 |
| **Title** | Nodepool API Request Body Validation Failures |
| **Priority** | P1 |
| **Type** | Failure Scenario |
| **Scope** | MVP |

#### Objective

Validate that the API properly validates the nodepool creation request body and rejects invalid requests with clear error messages. Ensure validation happens at the API layer before any adapter processing begins.

#### Prerequisites

| Prerequisite | Description |
|--------------|-------------|
| PRE-001 | Cluster created via E2E-001 and in Ready state |
| PRE-002 | Valid authentication credentials available |

#### Test Scenarios

---

##### Scenario 2.1: Missing Required Field - Name

**Action**: Submit nodepool creation without required `name` field

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "machineType": "n1-standard-8",
  "replicas": 2
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.1.1 | HTTP Status Code | 400 Bad Request |
| 2.1.2 | Error response contains field name | "name" mentioned |
| 2.1.3 | Error message is descriptive | "name is required" or similar |
| 2.1.4 | No resource created in database | GET nodepools doesn't show new entry |
| 2.1.5 | No adapter processing triggered | No events in message broker |

**Expected Error Response**:
```json
{
  "kind": "Error",
  "id": "400",
  "href": "/api/hyperfleet/v1/errors/400",
  "code": "HYPERFLEET-400",
  "reason": "Validation failed: 'name' is a required field"
}
```

---

##### Scenario 2.2: Invalid Field Value - Negative Replicas

**Action**: Submit nodepool creation with negative replicas value

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "name": "test-nodepool-invalid",
  "machineType": "n1-standard-8",
  "replicas": -1
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.2.1 | HTTP Status Code | 400 Bad Request |
| 2.2.2 | Error response contains field name | "replicas" mentioned |
| 2.2.3 | Error message indicates constraint | "must be positive" or "must be >= 0" |
| 2.2.4 | No resource created in database | GET nodepools doesn't show new entry |

**Expected Error Response**:
```json
{
  "kind": "Error",
  "id": "400",
  "href": "/api/hyperfleet/v1/errors/400",
  "code": "HYPERFLEET-400",
  "reason": "Validation failed: 'replicas' must be a positive integer"
}
```

---

##### Scenario 2.3: Invalid Field Value - Empty Machine Type

**Action**: Submit nodepool creation with empty machineType

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "name": "test-nodepool-invalid",
  "machineType": "",
  "replicas": 2
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.3.1 | HTTP Status Code | 400 Bad Request |
| 2.3.2 | Error response contains field name | "machineType" mentioned |
| 2.3.3 | Error message indicates constraint | "cannot be empty" or "is required" |
| 2.3.4 | No resource created in database | GET nodepools doesn't show new entry |

---

##### Scenario 2.4: Invalid Field Type - Wrong Data Type

**Action**: Submit nodepool creation with wrong data type for replicas

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "name": "test-nodepool-invalid",
  "machineType": "n1-standard-8",
  "replicas": "two"
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.4.1 | HTTP Status Code | 400 Bad Request |
| 2.4.2 | Error indicates type mismatch | "integer expected" or similar |
| 2.4.3 | No resource created in database | GET nodepools doesn't show new entry |

---

##### Scenario 2.5: Invalid JSON Syntax

**Action**: Submit nodepool creation with malformed JSON

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "name": "test-nodepool-invalid",
  "machineType": "n1-standard-8"
  "replicas": 2
}
```
(Note: Missing comma after machineType)

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.5.1 | HTTP Status Code | 400 Bad Request |
| 2.5.2 | Error indicates JSON parse error | "invalid JSON" or "parse error" |
| 2.5.3 | No resource created in database | GET nodepools doesn't show new entry |

---

##### Scenario 2.6: Unsupported Field

**Action**: Submit nodepool creation with unsupported field

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "name": "test-nodepool-invalid",
  "machineType": "n1-standard-8",
  "replicas": 2,
  "unknownField": "someValue"
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.6.1 | HTTP Status Code | 400 Bad Request (or 201 if ignoring unknown fields) |
| 2.6.2 | If rejected: Error mentions unknown field | "unknownField is not recognized" |
| 2.6.3 | If accepted: Unknown field is ignored | Response doesn't contain unknownField |

---

##### Scenario 2.7: Duplicate Nodepool Name

**Action**: Submit nodepool creation with already existing name

**Prerequisites**: Create a nodepool with name "existing-nodepool" first

```
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools
Content-Type: application/json

{
  "name": "existing-nodepool",
  "machineType": "n1-standard-8",
  "replicas": 2
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.7.1 | HTTP Status Code | 409 Conflict |
| 2.7.2 | Error indicates duplicate | "already exists" or "duplicate name" |
| 2.7.3 | Original nodepool unchanged | GET original nodepool shows no changes |

---

##### Scenario 2.8: Non-Existent Cluster ID

**Action**: Submit nodepool creation to non-existent cluster

```
POST /api/hyperfleet/v1/clusters/non-existent-cluster-id/nodepools
Content-Type: application/json

{
  "name": "test-nodepool",
  "machineType": "n1-standard-8",
  "replicas": 2
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.8.1 | HTTP Status Code | 404 Not Found |
| 2.8.2 | Error indicates cluster not found | "cluster not found" or similar |
| 2.8.3 | No resource created | No nodepool created in any cluster |

**Expected Error Response**:
```json
{
  "kind": "Error",
  "id": "404",
  "href": "/api/hyperfleet/v1/errors/404",
  "code": "HYPERFLEET-404",
  "reason": "Cluster 'non-existent-cluster-id' not found"
}
```

---

##### Scenario 2.9: Invalid Cluster ID Format

**Action**: Submit nodepool creation with malformed cluster ID

```
POST /api/hyperfleet/v1/clusters/invalid!@#$%cluster/nodepools
Content-Type: application/json

{
  "name": "test-nodepool",
  "machineType": "n1-standard-8",
  "replicas": 2
}
```

**Validation Points**:
| # | Check | Expected Result |
|---|-------|-----------------|
| 2.9.1 | HTTP Status Code | 400 Bad Request or 404 Not Found |
| 2.9.2 | Error indicates invalid format | "invalid cluster ID format" |

---

#### Success Criteria

| Criteria | Description |
|----------|-------------|
| SC-001 | API returns HTTP 400 for validation errors |
| SC-002 | API returns HTTP 404 for non-existent cluster |
| SC-003 | API returns HTTP 409 for duplicate resources |
| SC-004 | Error messages are clear and indicate which field failed |
| SC-005 | Error messages indicate expected format/values |
| SC-006 | No resources created in database when validation fails |
| SC-007 | API doesn't crash or return 500 errors for validation failures |
| SC-008 | Validation happens before any adapter processing begins |
| SC-009 | No events published to message broker for failed validations |


🛠️ Refactor suggestion | 🟠 Major

Document the validation error response schema more explicitly.

The E2E-FAIL-002 test case defines expected error response formats (e.g., lines 354-362, 549-558), but there's inconsistency:

  • Some scenarios show the full error schema (lines 354-362)
  • Others reference it implicitly without showing the response structure
  • Scenario 2.9 (lines 562-582) doesn't include an expected error response example

Recommendation: Create a standardized error response schema section early in the document and reference it consistently. This will improve clarity and reduce duplication.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

MD040 (fenced-code-language) — Fenced code blocks should have a language specified:
lines 334, 370, 406, 431, 455, 480, 507, 531, 566

MD058 (blanks-around-tables) — Tables should be surrounded by blank lines:
lines 345, 382, 418, 443, 468, 493, 519, 543, 578

@tzhou5 changed the title from "HYPERFLEET-471 - test: add the nodepool testcase" to "WIPHYPERFLEET-471 - test: add the nodepool testcase" on Jan 13, 2026
@tzhou5 changed the title from "WIPHYPERFLEET-471 - test: add the nodepool testcase" to "WIP: HYPERFLEET-471 - test: add the nodepool testcase" on Jan 13, 2026
@tzhou5 force-pushed the nodepool-e2e-testcases branch from e3b8cae to 0aa7d4a on January 20, 2026 06:32
@tzhou5 changed the title from "WIP: HYPERFLEET-471 - test: add the nodepool testcase" to "HYPERFLEET-471 - test: add the nodepool testcase" on Jan 20, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@testcases/nodepool_lifecycle.md`:
- Around line 124-226: The test step lacks timing/polling guidance for status
updates—update the NodePool lifecycle test (Step 2) to include explicit
wait/poll instructions: state a polling interval (e.g., poll status.conditions
every 5s), a total timeout (e.g., 2 minutes), retry behavior (continue polling
until status.conditions contains a condition with type
🧹 Nitpick comments (2)
testcases/nodepool_lifecycle.md (2)

407-411: Ensure test expectations align with "may vary" adapter count.

Line 409 notes that the adapter count "may vary as more adapters are deployed," but the test expectations appear hardcoded for exactly 2 adapters (validation-adapter and nodepool-adapter). If the adapter count is truly variable, the test validation logic should be more flexible (e.g., verify that expected adapters are present, rather than asserting exactly 2).


561-561: Minor grammar: Use hyphen in compound modifier.

The phrase "Adapter created resources" should be "Adapter-created resources" when used as a compound modifier before a noun.

Based on learnings, this is a minor documentation quality improvement.

Comment on lines +124 to +226
#### Step 2: Verify NodePool API response

**Action:**
Send GET request to retrieve the nodepool list and verify filtering capabilities:

1. List nodepools by name filter:
```bash
curl -G ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/nodepools --data-urlencode "search=name='<nodepool_name>'"
```

2. List nodepools by label filter:
```bash
curl -G ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/nodepools --data-urlencode "search=labels.workload='gpu'"
```

**Expected Result:**
- Response status code is 200 (OK)
- <details>
<summary>Response example (click to expand)</summary>

```json
{
  "items": [
    {
      "created_by": "system",
      "created_time": "2026-01-20T10:00:00.000000Z",
      "generation": 1,
      "href": "/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/abc123def456",
      "id": "abc123def456",
      "kind": "NodePool",
      "labels": {
        "workload": "gpu",
        "tier": "compute",
        "environment": "test"
      },
      "name": "hp-gcp-nodepool-1",
      "spec": {
        "clusterName": "hp-gcp-cluster-1",
        "replicas": 2,
        "platform": {
          "type": "gcp",
          "gcp": {
            "instanceType": "n1-standard-8"
          }
        },
        "release": {
          "image": "registry.redhat.io/openshift4/ose-cluster-version-operator:v4.14.0"
        }
      },
      "status": {
        "conditions": [
          {
            "created_time": "2026-01-20T10:00:15Z",
            "last_transition_time": "2026-01-20T10:00:15Z",
            "last_updated_time": "2026-01-20T10:05:00Z",
            "message": "NodePool validation passed",
            "observed_generation": 1,
            "reason": "ValidationPassed",
            "status": "True",
            "type": "ValidationAdapterSuccessful"
          }
        ],
        "last_transition_time": "2026-01-20T10:00:15Z",
        "last_updated_time": "2026-01-20T10:05:00Z",
        "observed_generation": 1,
        "phase": "NotReady"
      },
      "updated_by": "system",
      "updated_time": "2026-01-20T10:05:00Z"
    }
  ],
  "kind": "NodePoolList",
  "page": 1,
  "size": 1,
  "total": 1
}
```
</details>
- **NodePoolList metadata:**
  - `kind` is "NodePoolList"
  - `total` matches the number of nodepools matching the filter
  - `size` matches `total`
  - `page` is 1
- **Created nodepool appears in the `items` array**
- **Label filtering works correctly:**
  - Filtering by `labels.workload='gpu'` returns the created nodepool
  - Filtering by non-matching labels returns empty result
- **System default fields:**
  - `id` matches the ID from Step 1
  - `href` is "/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}"
  - `kind` is "NodePool"
  - `created_by` is populated (currently "system" as placeholder; will change when auth is introduced)
  - `created_time` is populated and not default value (not "0001-01-01T00:00:00Z")
  - `updated_by` is populated (currently "system" as placeholder; will change when auth is introduced)
  - `updated_time` is populated and not default value (not "0001-01-01T00:00:00Z")
  - `generation` is 1
  - `status.phase` is "NotReady" (initial state in MVP phase)
  - `status.conditions` array exists with required fields: `type`, `status`, `reason`, `message`, `created_time`, `last_transition_time`, `last_updated_time`, `observed_generation`
  - `status.observed_generation` matches nodepool generation
  - `status.last_transition_time` and `status.last_updated_time` are populated with the real values
- **NodePool request body configured parameters:**
  - All spec fields match the request payload


🛠️ Refactor suggestion | 🟠 Major

Clarify timing expectations for status transitions.

Step 2 shows status.conditions populated with ValidationAdapterSuccessful and observed_generation updated to 1, but there's no guidance on how long to wait after Step 1 before executing Step 2, or how to poll/retry if the conditions haven't appeared yet. This could lead to race conditions in test automation.

Consider adding:

  • Expected wait time or polling interval
  • Retry logic guidance (e.g., "Poll every 5 seconds for up to 2 minutes")
  • Clear criteria for when to proceed to the next step
🤖 Prompt for AI Agents
In `@testcases/nodepool_lifecycle.md` around lines 124 - 226, The test step lacks
timing/polling guidance for status updates—update the NodePool lifecycle test
(Step 2) to include explicit wait/poll instructions: state a polling interval
(e.g., poll status.conditions every 5s), a total timeout (e.g., 2 minutes),
retry behavior (continue polling until status.conditions contains a condition
with type "ValidationAdapterSuccessful" and observed_generation equals the
NodePool's generation), and the clear proceed criteria (proceed when the
condition exists and status.observed_generation matches generation; fail after
timeout). Reference the Step 2 checks for status.conditions,
ValidationAdapterSuccessful, and observed_generation so testers/automation know
exactly what to watch for.
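The retry behaviour the prompt describes can be sketched as a small shell helper. The interval and timeout values, and the curl/jq filter in the usage comment, are illustrative assumptions rather than documented defaults:

```shell
# Sketch: rerun a check command until it succeeds or a timeout elapses.
# Usage: poll_until TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Returns 0 once CMD succeeds, 1 if TIMEOUT_SECONDS pass without success.
poll_until() {
  timeout=$1; interval=$2; shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Illustrative usage (assumes API_URL/CLUSTER_ID/NODEPOOL_ID are exported and
# jq is available; the filter mirrors the proceed criteria above):
#   poll_until 120 5 sh -c '
#     curl -s "$API_URL/api/hyperfleet/v1/clusters/$CLUSTER_ID/nodepools/$NODEPOOL_ID" |
#       jq -e ".status.conditions[]?
#              | select(.type==\"ValidationAdapterSuccessful\" and .status==\"True\")" >/dev/null'
```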

Comment on lines +229 to +314
#### Step 3: Retrieve the specific nodepool and monitor its status

**Action:**
Send GET request to retrieve the specific nodepool:

```bash
curl -X GET ${API_URL}/api/hyperfleet/v1/clusters/${CLUSTER_ID}/nodepools/{nodepool_id}
```

**Expected Result:**
- Response status code is 200 (OK)
- <details>
<summary>Response example (click to expand)</summary>

```json
{
  "created_by": "system",
  "created_time": "2026-01-20T10:00:00.000000Z",
  "generation": 1,
  "href": "/api/hyperfleet/v1/clusters/{cluster_id}/nodepools/abc123def456",
  "id": "abc123def456",
  "kind": "NodePool",
  "labels": {
    "workload": "gpu",
    "tier": "compute",
    "environment": "test"
  },
  "name": "hp-gcp-nodepool-1",
  "spec": {
    "clusterName": "hp-gcp-cluster-1",
    "replicas": 2,
    "platform": {
      "type": "gcp",
      "gcp": {
        "instanceType": "n1-standard-8"
      }
    },
    "release": {
      "image": "registry.redhat.io/openshift4/ose-cluster-version-operator:v4.14.0"
    }
  },
  "status": {
    "conditions": [
      {
        "created_time": "2026-01-20T10:00:15Z",
        "last_transition_time": "2026-01-20T10:00:15Z",
        "last_updated_time": "2026-01-20T10:05:00Z",
        "message": "NodePool validation passed",
        "observed_generation": 1,
        "reason": "ValidationPassed",
        "status": "True",
        "type": "ValidationAdapterSuccessful"
      },
      {
        "created_time": "2026-01-20T10:00:15Z",
        "last_transition_time": "2026-01-20T10:05:00Z",
        "last_updated_time": "2026-01-20T10:10:00Z",
        "message": "NodePool resources provisioned successfully",
        "observed_generation": 1,
        "reason": "ProvisioningComplete",
        "status": "True",
        "type": "NodePoolAdapterSuccessful"
      }
    ],
    "last_transition_time": "2026-01-20T10:05:00Z",
    "last_updated_time": "2026-01-20T10:10:00Z",
    "observed_generation": 1,
    "phase": "NotReady"
  },
  "updated_by": "system",
  "updated_time": "2026-01-20T10:10:00Z"
}
```
</details>
- Response contains all nodepool metadata fields from Step 1
- NodePool status contains adapter information:
  - `status.conditions` array contains adapter status entries
  - **ValidationAdapterSuccessful** condition exists (example):
    - `type`: "ValidationAdapterSuccessful"
    - `status`: "True"
    - `reason`: "ValidationPassed"
    - `message`: "NodePool validation passed"
    - `created_time`, `last_transition_time`, `last_updated_time` populated and not default values
    - `observed_generation`: 1 (must match nodepool.generation)
- `updated_time` is more recent than `created_time`, indicating the nodepool has been processed by adapters


⚠️ Potential issue | 🟠 Major

Clarify the phase transition logic and monitoring approach.

This step shows both ValidationAdapterSuccessful and NodePoolAdapterSuccessful conditions with status: "True" (lines 271-291), yet phase remains "NotReady" (line 296). However, Step 5 describes logic where phase should be "Ready" when all adapters are available (lines 499-500).

The relationship between:

  1. Adapter conditions in nodepool.status.conditions (e.g., ValidationAdapterSuccessful)
  2. Adapter statuses from the /statuses endpoint (Applied, Available, Health)
  3. The nodepool.status.phase value

...is not clearly explained, making it difficult to understand when and why the phase transitions from NotReady to Ready.

Additionally, the step title mentions "monitor its status" but doesn't provide guidance on monitoring approach (polling interval, timeout, etc.).

📋 Suggested improvements

Add a subsection explaining:

  • What determines when phase transitions from NotReady to Ready
  • The relationship between adapter conditions in nodepool status vs adapter statuses
  • Recommended polling approach and timeout values
  • Why phase might remain NotReady even when adapter conditions show True
🤖 Prompt for AI Agents
In `@testcases/nodepool_lifecycle.md` around lines 229 - 314, Update the Step 3
documentation to explicitly explain how nodepool.status.phase is derived from
adapter conditions and statuses: describe that nodepool.status.conditions
entries (e.g., ValidationAdapterSuccessful, NodePoolAdapterSuccessful) reflect
adapter-specific condition types in nodepool.status.conditions while the
adapters’ overall lifecycle states (Applied, Available, Health) are exposed by
the /statuses endpoint and must be translated into the NodePool phase; state the
rule used to set nodepool.status.phase to "Ready" (e.g., all required adapter
conditions present with status "True" AND corresponding adapters report
Available/Healthy via /statuses) and list reasons phase can remain "NotReady"
(missing conditions, observed_generation mismatch, adapters not
Available/Healthy). Also add a short monitoring subsection with recommended
polling guidance (poll interval, overall timeout, max retries) and an example
strategy for checking both nodepool.status.conditions and the /statuses endpoint
to decide readiness.
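One way to make the proposed readiness rule concrete is a small decision helper. The condition names and the rule itself are assumptions about the intended behaviour, not the documented contract; condition values are passed in as `TYPE=STATUS` pairs extracted from the two API responses:

```shell
# Sketch of a readiness rule: Ready only when every required condition is
# reported "True"; anything missing, "False", or "Unknown" leaves the pool
# NotReady. The required-condition list below is illustrative.
nodepool_phase() {
  for required in ValidationAdapterSuccessful NodePoolAdapterSuccessful Available; do
    case " $* " in
      *" ${required}=True "*) ;;          # condition present and True
      *) echo "NotReady"; return 0 ;;     # missing, False, or Unknown
    esac
  done
  echo "Ready"
}
```

For example, `nodepool_phase ValidationAdapterSuccessful=True NodePoolAdapterSuccessful=True` prints "NotReady" because the adapters' Available state (from the /statuses endpoint) has not been confirmed, which matches the situation shown in the Step 3 response example.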

Comment on lines +544 to +604
#### Step 1: Check resources in deployed environment

**Action:**
Verify Kubernetes resources created by the adapters:

1. List pods in the cluster namespace to verify nodepool-related pods:
```bash
kubectl get pods -n {cluster_id}
```

2. Check nodes in the hosted cluster (if accessible):
```bash
kubectl get nodes --kubeconfig=<hosted_cluster_kubeconfig>
```

**Expected Result:**

- **Adapter created resources:**
  - **Validation adapter** creates a validation job under the cluster namespace
  - **NodePool adapter** triggers node provisioning in the hosted cluster

- **Kubernetes Jobs verification:**
  - Kubernetes Jobs created by adapters complete successfully
  - Jobs should have `status.succeeded: 1`
  - Example command to verify Jobs:
    ```bash
    kubectl get jobs -n {cluster_id} | grep nodepool
    ```
  - Example output:
    ```text
    NAME                          COMPLETIONS   DURATION   AGE
    nodepool-validator-abc123-1   1/1           30s        5m
    ```

- **Pods verification:**
  - Validation pod (created by Job) exists in the cluster namespace
  - Pod status should be "Completed" without errors
  - Pod should not have any restarts or error states
  - Example output:
    ```text
    NAME                                 READY   STATUS      RESTARTS   AGE
    nodepool-validator-abc123-1-xyz789   0/2     Completed   0          5m
    ```

- **Nodes verification (in hosted cluster):**
  - Number of nodes matches `spec.replicas` (2 nodes)
  - All nodes are in "Ready" status
  - Nodes have correct labels matching nodepool labels
  - Example output:
    ```text
    NAME                           STATUS   ROLES    AGE   VERSION
    hp-gcp-nodepool-1-node-abc12   Ready    worker   10m   v1.27.0
    hp-gcp-nodepool-1-node-def34   Ready    worker   10m   v1.27.0
    ```

- **Logs verification (optional manual check):**
  - No errors in API service logs
  - No errors in Sentinel operator logs
  - No errors in Adapter logs
  - Kubernetes Jobs completed without errors


🛠️ Refactor suggestion | 🟠 Major

Distinguish mandatory verification steps from optional ones.

The test mixes mandatory and optional verification criteria without clear distinction:

  • Line 556: Checking hosted cluster nodes is marked "(if accessible)" - unclear if this is required for test success
  • Lines 567, 580-581: Describe what Jobs/Pods "should" have, but doesn't explicitly state these are test failure conditions
  • Lines 599-604: Logs verification explicitly marked "optional"

For reliable test automation, each verification should be clearly categorized as either:

  • Required: Test fails if criteria not met
  • Optional: Best practice check that doesn't affect pass/fail
  • Conditional: Required only if certain conditions are met
🧰 Tools
🪛 LanguageTool

[grammar] ~561-~561: Use a hyphen to join words.
Context: ...> ``` Expected Result: - Adapter created resources: - **Validation ad...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In `@testcases/nodepool_lifecycle.md` around lines 544 - 604, Update "Step 1:
Check resources in deployed environment" to explicitly label each verification
as Required, Optional, or Conditional: mark cluster namespace pod/job checks
(Validation adapter Job exists, Jobs status.succeeded:1, Validation pod
Completed, no restarts) as Required; mark hosted-cluster node checks (kubectl
get nodes --kubeconfig...) as Conditional with the condition "(if hosted cluster
kubeconfig is available)"; mark logs verification as Optional; and adjust
wording for job/pod/node expectations (e.g., "Required: Jobs must show
status.succeeded: 1 — test fails if not", "Conditional: Nodes must match
spec.replicas when hosted cluster is accessible") so each item clearly states
pass/fail impact and the condition that applies.
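The Required / Conditional / Optional split the prompt asks for can be sketched with three small shell wrappers. The wrapper names, resource names, and kubeconfig path in the example wiring are illustrative assumptions:

```shell
# Sketch: run a check and classify its effect on the test run.
run_required() {            # test fails if this check fails
  "$@" || { echo "FAIL (required): $*" >&2; return 1; }
}
run_conditional() {         # enforced only when the guard command succeeds
  guard=$1; shift
  if ! eval "$guard"; then
    echo "SKIP (conditional, guard failed): $*"
    return 0
  fi
  "$@" || { echo "FAIL (conditional): $*" >&2; return 1; }
}
run_optional() {            # reported, but never fails the run
  "$@" || echo "WARN (optional): $*"
  return 0
}

# Example wiring (hypothetical resource names):
#   run_required    kubectl wait --for=condition=complete "job/nodepool-validator-$NODEPOOL_ID" -n "$CLUSTER_ID"
#   run_conditional "test -f \$HOSTED_KUBECONFIG" kubectl get nodes --kubeconfig "$HOSTED_KUBECONFIG"
#   run_optional    kubectl logs -n "$CLUSTER_ID" -l app=validation-adapter
```

With this shape, a conditional check is skipped cleanly when its precondition (e.g., hosted-cluster kubeconfig availability) is not met, rather than being silently treated as a pass or a fail.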

@openshift-ci
Copy link

openshift-ci bot commented Jan 29, 2026

@tzhou5: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-images | 0aa7d4a | link | true | /test e2e-images |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
