Skip to content

Conversation

@jason-lynch
Copy link
Member

@jason-lynch jason-lynch commented Jan 8, 2026

Summary

Makes a few improvements to our image build, E2E fixtures, and make targets to cover common use cases and reduce the number of commands needed to run them.

Changes

  • Fixes a bug in our development image tagging
    • Did not affect releases because they're built from the release branch
  • Refactors our E2E vars files to reuse the variant configurations between Lima and EC2 and to make them easier to use
    • Before, you needed to specify the path to the vars file, but it needed to be relative to the e2e/fixtures directory. Now you only need the variant name, such as huge.
  • Adds some top-level make targets to cover the common use cases:
    • deploy-{lima,ec2}-fixture: Builds the Control Plane binaries, deploys and sets up the VMs, then deploys the Control Plane.
    • update-{lima,ec2}-fixture: Builds and deploys the Control Plane to an already-deployed fixture.
    • reset-{lima,ec2}-fixture: Builds the Control Plane binaries, tears down the existing Control Plane installation, and redeploys the Control Plane.
    • stop-{lima,ec2}-fixture: Stops the VMs
    • teardown-{lima,ec2}-fixture: Deletes the VMs

See the documentation diff for usage updates.

Testing

Follow the updated instructions in the e2e-tests.md doc.

Summary by CodeRabbit

  • Chores

    • Updated CI/build checksum handling; added automatic control-plane version resolution, increased default E2E parallelism to 8 and extended E2E timeout to 45m.
  • New Features

    • Unified fixture lifecycle: deploy/update/reset/stop/teardown for Lima and EC2.
    • New fixture controls: variant selection (small/large/huge + global), control-plane image override, and shared extra-vars.
    • New and revised fixture inventories and regional variant support.
  • Documentation

    • E2E docs rewritten to reflect new targets, variables, variant guidance, commands, and outputs.

✏️ Tip: You can customize this high-level summary in your review settings.

@mmols
Copy link
Member

mmols commented Jan 8, 2026

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Jan 8, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

Switch Makefile includes to common.mk; add fixture variant support and many fixture lifecycle targets (deploy/update/reset/stop/teardown) for Lima and EC2; move variant/provider vars into e2e fixtures, split/adjust fixture YAMLs, and move checksum reference to common.mk.

Changes

Cohort / File(s) Summary
CI/CD config
\.circleci/config.yml``
go.sum checksum computation now references common.mk instead of tools.mk (cache key source moved).
Top-level Makefile
\Makefile``
Swap include to common.mk; remove CONTROL_PLANE_VERSION; add FIXTURE_VARIANT, FIXTURE_CONTROL_PLANE_IMAGE, FIXTURE_EXTRA_VARS, CLUSTER_TEST_* vars; set E2E_PARALLEL default to 8; extend E2E timeout to 45m; add fixture lifecycle targets and internal helpers for lima/ec2 (deploy/update/reset/stop/teardown).
API Makefile
\api/Makefile``
Include changed from ../tools.mk to ../common.mk (no functional target changes).
Shared make vars
\common.mk``
Add CHANGIE_LATEST and deferred/defaulting logic for CONTROL_PLANE_VERSION (compute via git describe if unset).
E2E fixtures Makefile
\e2e/fixtures/Makefile``
Include ../../common.mk; add VARIANT (default large) and variant_vars_file; change defaults from *_rocky.yaml to generic *.yaml; inject --extra-vars='@$(variant_vars_file)' into Ansible invocations.
Fixture vars — provider baselines
\e2e/fixtures/vars/ec2.yaml`, `e2e/fixtures/vars/lima.yaml`, `e2e/fixtures/vars/lima_rocky_global.yaml``
Add ec2.yaml baseline; remove Lima Rocky global fixture and remove machines block from lima.yaml (data reorganization).
Fixture vars — variants
\e2e/fixtures/vars/small.yaml`, `.../large.yaml`, `.../huge.yaml``
Remove top-level provider/minio/architecture fields; adjust swarm_manager flags for specific hosts per variant.
Fixture vars — global variants
\e2e/fixtures/vars/small_global.yaml`, `.../large_global.yaml`, `.../huge_global.yaml``
Add region assignments to hosts, adjust swarm_manager flags, and remove s3_repository blocks where present; add small_global.yaml.
Docs
\docs/development/e2e-tests.md``
Replace per-provider deploy/setup/reset commands with unified fixture targets; document FIXTURE_VARIANT, FIXTURE_CONTROL_PLANE_IMAGE, FIXTURE_EXTRA_VARS, variant switching, teardown guidance, and updated sample commands/outputs.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant TopMake as "Top-level Makefile"
    participant E2EMake as "e2e/fixtures/Makefile"
    participant Vars as "e2e/fixtures/vars/*"
    participant Ansible as "Ansible / Provisioner"
    User->>TopMake: make deploy-lima-fixture FIXTURE_VARIANT=large
    TopMake->>E2EMake: invoke _deploy-lima-fixture wrapper
    E2EMake->>Vars: read vars/large.yaml and vars/lima.yaml (+ FIXTURE_EXTRA_VARS)
    Vars-->>E2EMake: merged extra-vars (variant + provider + extras)
    E2EMake->>Ansible: run playbook with merged extra-vars
    Ansible->>Ansible: provision hosts, deploy control plane
    Ansible-->>E2EMake: return playbook results + test-config
    E2EMake-->>TopMake: return status / path to test-config
    TopMake-->>User: fixture ready / test-config path
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through Make and vars all night,

Swapped tools for common, set variants right.
Deploy, update, reset — a hop, skip, leap,
Lima and EC2 now dance in one neat sweep.
Carrots for CI, and a warm fixture bite.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description follows the required template with a clear Summary, detailed Changes section explaining the improvements, and Testing instructions. However, the Checklist section is missing entirely. Add the missing Checklist section from the template, including items for tests, documentation, issue linking, changelog entries, and breaking changes verification.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main changes in the PR, which focus on improvements to E2E fixture configuration, variables, and Make targets for common testing workflows.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


🧹 Recent nitpick comments
docs/development/e2e-tests.md (5)

211-215: Consider mentioning the default variant for clarity.

The documentation states "six Rocky 9 VMs" here, and later (line 403) clarifies that the default variant is large. For consistency and to avoid confusion, consider mentioning the variant name in this initial description, e.g., "six Rocky 9 VMs (the large variant)".


279-283: Clarify behavior when re-running deploy on stopped VMs.

The instruction states "You can start them again by re-running the deploy-lima-fixture target." It would be helpful to clarify whether this target is idempotent (i.e., it detects the VMs exist and just starts them) or if there are any side effects users should be aware of when running deploy on stopped infrastructure.


312-314: Consider mentioning the default variant for consistency.

Similar to the Lima section, consider mentioning that six VMs corresponds to the large variant (e.g., "six Rocky 9 VMs with x86_64 architecture (the large variant)") for consistency with the fixture variants section.


379-388: Clarify behavior when re-running deploy on stopped EC2 instances.

Similar to the Lima section, it would be helpful to clarify whether deploy-ec2-fixture detects stopped instances and simply starts them, or if there are other behaviors users should be aware of. This helps users understand if the target is safe to re-run on existing infrastructure.


422-423: Consider explaining why teardown is required before switching variants.

The note correctly instructs users to teardown before switching variants, but it would be more helpful to briefly explain the reason (e.g., "to avoid resource conflicts" or "because the infrastructure needs to be recreated"). This helps users understand the importance and what might happen if they skip this step.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 084ebf0 and 3f28466.

📒 Files selected for processing (15)
  • .circleci/config.yml
  • Makefile
  • api/Makefile
  • common.mk
  • docs/development/e2e-tests.md
  • e2e/fixtures/Makefile
  • e2e/fixtures/vars/ec2.yaml
  • e2e/fixtures/vars/huge.yaml
  • e2e/fixtures/vars/huge_global.yaml
  • e2e/fixtures/vars/large.yaml
  • e2e/fixtures/vars/large_global.yaml
  • e2e/fixtures/vars/lima.yaml
  • e2e/fixtures/vars/lima_rocky_global.yaml
  • e2e/fixtures/vars/small.yaml
  • e2e/fixtures/vars/small_global.yaml
💤 Files with no reviewable changes (3)
  • e2e/fixtures/vars/small.yaml
  • e2e/fixtures/vars/lima.yaml
  • e2e/fixtures/vars/lima_rocky_global.yaml
✅ Files skipped from review due to trivial changes (1)
  • e2e/fixtures/vars/small_global.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • e2e/fixtures/vars/ec2.yaml
  • e2e/fixtures/Makefile
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-08T16:29:19.608Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 238
File: e2e/fixtures/vars/huge_global.yaml:5-38
Timestamp: 2026-01-08T16:29:19.608Z
Learning: In e2e/fixtures/vars/*.yaml, treat the 'region' field as a spec for simulated latency in tests, not as the real deployment region. Ensure test latency computation uses this value and that reviewers understand it does not affect infrastructure placement. Consider adding a brief note in fixture docs clarifying this distinction.

Applied to files:

  • e2e/fixtures/vars/huge.yaml
  • e2e/fixtures/vars/large_global.yaml
  • e2e/fixtures/vars/huge_global.yaml
  • e2e/fixtures/vars/large.yaml
📚 Learning: 2026-01-08T16:29:27.297Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 238
File: e2e/fixtures/vars/huge_global.yaml:5-38
Timestamp: 2026-01-08T16:29:27.297Z
Learning: In the E2E fixture configuration files (e2e/fixtures/vars/*.yaml), the `region` field specified for each host machine is used to compute simulated latency for testing purposes, not to determine the actual deployment region for the infrastructure.

Applied to files:

  • docs/development/e2e-tests.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (15)
docs/development/e2e-tests.md (3)

390-399: Clear documentation for testing published releases.

The introduction of FIXTURE_CONTROL_PLANE_IMAGE is well-documented with a practical example for pre-release testing scenarios.


425-439: Well-documented variant options.

The available variants are clearly described with specific host counts, ETCD modes, and helpful context about the global variants' latency simulation mimicking real AWS regions.


410-420: Comprehensive variant usage example.

The example effectively demonstrates the complete workflow for using a different variant, including the important reminder to specify the variant with all related targets (update, reset, teardown).

e2e/fixtures/vars/large.yaml (1)

1-14: LGTM!

The fixture maintains 3 swarm managers (host-1, host-3, host-5), which is a valid odd number for Docker Swarm quorum. The swapping of manager roles between host-2→host-5 preserves the overall topology balance.

e2e/fixtures/vars/huge.yaml (1)

1-26: LGTM!

The fixture maintains 5 swarm managers (host-1, host-2, host-5, host-6, host-9), which is a valid odd number for Docker Swarm quorum. The redistribution of manager roles across 12 hosts appears intentional for test coverage.

e2e/fixtures/vars/huge_global.yaml (1)

1-38: LGTM!

The fixture distributes 5 swarm managers across 3 simulated regions (us-west-1, af-south-1, ap-southeast-4), providing good coverage for latency simulation tests. Based on learnings, the region field here specifies simulated latency for tests rather than actual infrastructure placement.

e2e/fixtures/vars/large_global.yaml (1)

1-20: LGTM!

The fixture distributes 3 swarm managers evenly across 3 simulated regions (one per region), which is ideal for testing multi-region latency scenarios. The swarm_manager assignments align with the corresponding large.yaml variant file. Based on learnings, the region field specifies simulated latency for tests.

Makefile (5)

9-9: LGTM!

Setting a default parallelism of 8 for E2E tests is a reasonable default that can still be overridden when needed.


14-22: LGTM!

The new fixture and cluster test configuration variables provide good flexibility for customizing fixture deployments. The naming is clear and follows the existing convention.


33-33: LGTM!

Increasing the E2E timeout from 20m to 45m accommodates the larger fixture variants (huge with 12 hosts) that were introduced.


154-229: Well-structured fixture orchestration targets.

The implementation effectively uses pattern rules to reduce duplication while providing clear public interfaces (deploy-lima-fixture, deploy-ec2-fixture, etc.). The conditional goreleaser-build dependency (Line 161) correctly skips building when an external image is provided. The EXTRA_VARS quoting with single quotes handles potential spaces correctly.


1-1: Include change is correct. The common.mk file properly exports all required variables (CONTROL_PLANE_VERSION, gotestsum, golangcilint, go-licenses, changie, goreleaser) that are referenced in the Makefile. The migration from tools.mk to common.mk is complete with no missing definitions.

.circleci/config.yml (1)

17-17: LGTM!

The checksum source update from tools.mk to common.mk is consistent with the repository-wide refactoring. Since common.mk now holds tool version definitions, this ensures the cache is properly invalidated when tool versions change.

api/Makefile (1)

1-1: LGTM!

The include path update aligns with the centralization to common.mk. All required variables (goa, yamlfmt) remain available.

common.mk (1)

12-16: The deferred expansion pattern is correctly implemented.

A few observations on the implementation:

  1. Network dependency: git fetch --quiet && requires network access. If fetch fails, git describe won't run due to &&, blocking version resolution. This affects the build, docker image tagging, goreleaser releases, and e2e tests—all of which consume CONTROL_PLANE_VERSION.

  2. Empty CHANGIE_LATEST fallback: If changie isn't installed or fails, CHANGIE_LATEST will be empty, causing git describe to describe HEAD rather than a specific version tag. However, changie is explicitly provided by the install-tools target (github.com/miniscruff/[email protected]), so this is mitigated in expected build environments.

These tradeoffs are acceptable for development workflows where tooling is expected to be installed and network access is available. The dynamic versioning approach aligns with the project's use of changie for release management.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @docs/development/e2e-tests.md:
- Around line 422-424: The docs contain a typo: change the referenced Make/Task
target name from "teardown-{lima,ec2}-target" to the correct
"teardown-{lima,ec2}-fixture" in the sentence that warns to run teardown before
switching variants; update the string literal in that line so it matches the
actual target names used elsewhere (i.e., replace "teardown-{lima,ec2}-target"
with "teardown-{lima,ec2}-fixture").
- Around line 410-420: The listed shell commands are missing the required make
prefix; update the examples so the targets are invoked via make by changing
deploy-lima-fixture, update-lima-fixture, reset-lima-fixture, and
teardown-lima-fixture to make deploy-lima-fixture, make update-lima-fixture,
make reset-lima-fixture, and make teardown-lima-fixture respectively so the
documented commands match the project's Makefile targets.

In @e2e/fixtures/vars/large_global.yaml:
- Around line 6-7: Host-2 is missing a region field; update the host entry for
"host-2" (the block containing swarm_manager: false) to include a region key
matching the other hosts (e.g., set region: us-west-1) so the host definitions
are consistent and avoid deployment issues.
🧹 Nitpick comments (2)
Makefile (1)

172-176: Minor: Trailing whitespace on line 174.

There appears to be trailing whitespace after _deploy-lima-fixture:. This is cosmetic but could be cleaned up.

♻️ Optional cleanup
 .PHONY: deploy-lima-fixture
-deploy-lima-fixture: _deploy-lima-fixture
-	
+deploy-lima-fixture: _deploy-lima-fixture
+
e2e/fixtures/Makefile (1)

15-18: Consider validating variant file existence.

If an invalid VARIANT is specified, Ansible will fail with a potentially confusing error about a missing vars file. You might consider adding validation or documenting valid variants more prominently.

This is optional since Ansible's error message will indicate the missing file path, which should be sufficient for debugging.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 494c771 and 3aff251.

📒 Files selected for processing (15)
  • .circleci/config.yml
  • Makefile
  • api/Makefile
  • common.mk
  • docs/development/e2e-tests.md
  • e2e/fixtures/Makefile
  • e2e/fixtures/vars/ec2.yaml
  • e2e/fixtures/vars/huge.yaml
  • e2e/fixtures/vars/huge_global.yaml
  • e2e/fixtures/vars/large.yaml
  • e2e/fixtures/vars/large_global.yaml
  • e2e/fixtures/vars/lima.yaml
  • e2e/fixtures/vars/lima_rocky_global.yaml
  • e2e/fixtures/vars/small.yaml
  • e2e/fixtures/vars/small_global.yaml
💤 Files with no reviewable changes (3)
  • e2e/fixtures/vars/small.yaml
  • e2e/fixtures/vars/lima_rocky_global.yaml
  • e2e/fixtures/vars/lima.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (22)
e2e/fixtures/vars/huge.yaml (2)

8-20: Swarm manager redistribution maintains correct quorum.

The swarm_manager flag changes redistribute the 5 manager nodes from hosts 1,2,3,4,5 to hosts 1,2,5,6,9. This maintains the odd number (5) required for Raft consensus quorum in Docker Swarm.


1-26: No action required. The refactoring correctly relocates provider-specific metadata to dedicated files: ec2.yaml contains minio_enabled, provider, architecture, and instance_os, while lima.yaml contains minio_enabled, provider, architecture, and provider-specific s3 configuration. The variant file (huge.yaml) appropriately contains only the machines topology.

e2e/fixtures/vars/small_global.yaml (1)

1-11: LGTM! Valid minimal multi-region cluster configuration.

The configuration defines a minimal viable cluster with 3 manager nodes distributed across 3 different AWS regions (us-west-1, af-south-1, ap-southeast-4). This provides both quorum (odd number) and geographic distribution for global testing scenarios.

e2e/fixtures/vars/large.yaml (1)

6-12: Swarm manager redistribution maintains valid quorum.

The changes redistribute the 3 manager nodes from hosts 1,2,3 to hosts 1,3,5. This maintains the required odd number for quorum and provides more even spacing across the host range.

e2e/fixtures/vars/huge_global.yaml (2)

10-29: Review swarm manager distribution across regions.

The 5 swarm managers are distributed unevenly across regions:

  • us-west-1: 2 managers (hosts 1, 2)
  • af-south-1: 2 managers (hosts 5, 6)
  • ap-southeast-4: 1 manager (host 9)

For optimal fault tolerance in a multi-region setup, consider a more balanced distribution (e.g., 2-2-1 could leave ap-southeast-4 vulnerable if both other regions fail). However, if this distribution is intentional based on expected load or network topology, this is acceptable.


1-38: The review comment is based on an incorrect premise. The s3_repository block was never present in huge_global.yaml. Git history shows this file has only one commit (when it was added), and the current content contains only the machines configuration. The s3_repository configuration exists in the separate lima.yaml fixture file (for the Lima provider), not in huge_global.yaml.

Likely an incorrect or invalid review comment.

e2e/fixtures/vars/large_global.yaml (2)

3-19: Well-balanced manager distribution across regions.

The 3 swarm managers are distributed evenly with one manager per region (us-west-1, af-south-1, ap-southeast-4). This provides optimal fault tolerance for a multi-region deployment, as the cluster can tolerate the loss of any single region.


1-19: Verify removed configurations are provided elsewhere.

Similar to other files in this refactor, top-level keys (minio_enabled, provider, architecture, template) and the s3_repository block were removed. Ensure these configurations are now provided via provider-specific files, environment variables, or FIXTURE_EXTRA_VARS to maintain functionality.

api/Makefile (1)

1-1: LGTM!

The include path update aligns with the migration to common.mk across the repository.

.circleci/config.yml (1)

17-17: LGTM!

Updating the checksum to include common.mk ensures the cache is invalidated when tool versions change. This is consistent with the migration from tools.mk.

Makefile (6)

1-1: LGTM!

Consistent with the common.mk migration.


9-9: LGTM!

Increasing default parallelism to 8 should improve test execution time.


14-16: LGTM!

Clear variable definitions with sensible defaults. The large default variant aligns with the Docker Compose setup.


33-40: LGTM!

The 45-minute timeout accommodates larger fixture variants and higher parallelism.


158-170: Clean pattern-based approach for fixture orchestration.

The logic correctly:

  1. Prepends external_control_plane_image when FIXTURE_CONTROL_PLANE_IMAGE is set
  2. Skips goreleaser-build when an external image is specified

The % pattern target elegantly handles both lima and ec2 variants with a single definition.


178-230: LGTM!

The fixture lifecycle targets (update, reset, stop, teardown) follow a consistent pattern. The separation between internal pattern targets and public wrappers provides flexibility while maintaining a clean public API.

e2e/fixtures/vars/ec2.yaml (1)

1-5: LGTM!

Clean separation of provider-specific configuration (EC2, x86_64, Rocky 9) from variant-specific configuration (machine counts). This enables reusing variant definitions across Lima and EC2.

e2e/fixtures/Makefile (2)

1-9: LGTM!

The VARIANT mechanism cleanly separates provider configuration (lima.yaml, ec2.yaml) from variant configuration (small.yaml, large.yaml, huge.yaml). The --extra-vars ordering ensures proper precedence: provider defaults → variant settings → explicit overrides.


55-58: LGTM!

Consistent with the Lima targets - the EC2 playbook invocation follows the same pattern for variant vars inclusion.

docs/development/e2e-tests.md (2)

211-226: LGTM!

Clear documentation of the fixture lifecycle targets with helpful context about VM creation.


427-439: LGTM!

Good documentation of available variants and the global variants with simulated latency. This provides clear guidance for test configuration selection.

common.mk (1)

12-16: Deferred expansion pattern is correctly implemented.

The idiom ensures CONTROL_PLANE_VERSION is computed lazily on first access and cached thereafter. The ?= allows external override, and the $(eval ...) defers evaluation until the variable is first referenced.

Note on network dependency: The git fetch --quiet requires network access and runs when CONTROL_PLANE_VERSION is first accessed. This is acceptable for the current usage context (build, deployment, and E2E fixture targets), but could fail in offline scenarios. If offline builds become a requirement, consider making git fetch optional or caching the version.

@jason-lynch jason-lynch force-pushed the test/e2e-makefile-improvements branch 2 times, most recently from 016d10c to 653a48d Compare January 8, 2026 16:37
Base automatically changed from fix/cancel-task-panic to main January 8, 2026 20:25
@jason-lynch jason-lynch force-pushed the test/e2e-makefile-improvements branch from 653a48d to 887568d Compare January 8, 2026 21:15
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/development/e2e-tests.md:
- Around line 410-411: The example command is missing the required make prefix;
update the line containing deploy-lima-fixture with the FIXTURE_VARIANT=small
argument to prepend "make" so it reads make deploy-lima-fixture
FIXTURE_VARIANT=small, keeping formatting consistent with other examples in the
docs.
🧹 Nitpick comments (2)
e2e/fixtures/Makefile (1)

6-10: Add validation or documentation for missing variant files.

The variant_vars_file is constructed from the VARIANT variable without validation. If a user provides an invalid variant name, Ansible will fail with a potentially confusing error when it tries to load the non-existent vars file.

Consider adding a comment or validation

Add a comment documenting the available variants, or consider adding validation in the targets that use this variable:

 VARIANT ?= large
 EXTRA_VARS ?=
 
+# variant_vars_file is constructed from VARIANT (available: small, large, huge, small_global, large_global, huge_global)
 variant_vars_file=vars/$(VARIANT).yaml

Alternatively, you could add validation to targets that depend on this variable, though that may be overkill given the targets are invoked through documented top-level Make targets.

docs/development/e2e-tests.md (1)

422-424: Clarify consequences of not tearing down before switching variants.

The note warns about running teardown before switching variants but doesn't explain what happens if you don't. Users might wonder if it's just a best practice or if it will cause errors.

Consider adding clarification
 > [!NOTE]
-> Make sure to run the `teardown-{lima,ec2}-fixture` before switching variants.
+> Make sure to run `teardown-{lima,ec2}-fixture` before switching variants, as deploying a different variant without teardown may leave orphaned VMs or cause deployment conflicts.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3aff251 and 887568d.

📒 Files selected for processing (15)
  • .circleci/config.yml
  • Makefile
  • api/Makefile
  • common.mk
  • docs/development/e2e-tests.md
  • e2e/fixtures/Makefile
  • e2e/fixtures/vars/ec2.yaml
  • e2e/fixtures/vars/huge.yaml
  • e2e/fixtures/vars/huge_global.yaml
  • e2e/fixtures/vars/large.yaml
  • e2e/fixtures/vars/large_global.yaml
  • e2e/fixtures/vars/lima.yaml
  • e2e/fixtures/vars/lima_rocky_global.yaml
  • e2e/fixtures/vars/small.yaml
  • e2e/fixtures/vars/small_global.yaml
💤 Files with no reviewable changes (3)
  • e2e/fixtures/vars/small.yaml
  • e2e/fixtures/vars/lima_rocky_global.yaml
  • e2e/fixtures/vars/lima.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • common.mk
  • e2e/fixtures/vars/small_global.yaml
  • .circleci/config.yml
  • e2e/fixtures/vars/ec2.yaml
  • e2e/fixtures/vars/large.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-08T16:29:19.608Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 238
File: e2e/fixtures/vars/huge_global.yaml:5-38
Timestamp: 2026-01-08T16:29:19.608Z
Learning: In e2e/fixtures/vars/*.yaml, treat the 'region' field as a spec for simulated latency in tests, not as the real deployment region. Ensure test latency computation uses this value and that reviewers understand it does not affect infrastructure placement. Consider adding a brief note in fixture docs clarifying this distinction.

Applied to files:

  • e2e/fixtures/vars/huge_global.yaml
  • e2e/fixtures/vars/huge.yaml
  • e2e/fixtures/vars/large_global.yaml
📚 Learning: 2026-01-08T16:29:19.608Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 238
File: e2e/fixtures/vars/huge_global.yaml:5-38
Timestamp: 2026-01-08T16:29:19.608Z
Learning: In the E2E fixture configuration files (e2e/fixtures/vars/*.yaml), the `region` field specified for each host machine is used to compute simulated latency for testing purposes, not to determine the actual deployment region for the infrastructure.

Applied to files:

  • docs/development/e2e-tests.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (13)
e2e/fixtures/vars/huge.yaml (1)

2-26: LGTM! Swarm manager distribution maintains proper quorum.

The swarm manager role changes result in 5 managers (host-1, host-2, host-5, host-6, host-9) across the 12-node cluster, which maintains an odd number for proper quorum and can tolerate up to 2 manager failures.

e2e/fixtures/vars/huge_global.yaml (1)

2-38: LGTM! Swarm configuration and region distribution look correct.

The changes maintain proper swarm manager quorum (5 managers) and add region fields to support multi-region fixture testing. The region field is used for simulated latency in tests rather than actual infrastructure placement, with a balanced distribution across us-west-1, af-south-1, and ap-southeast-4 (4 hosts each).

Based on learnings, the region field drives test latency simulation rather than deployment location.

Makefile (3)

1-1: LGTM! Centralized configuration through common.mk.

The switch to common.mk aligns with the repository-wide refactoring and centralizes shared variable definitions (including CONTROL_PLANE_VERSION).


9-9: Note the E2E test configuration changes.

The default parallel execution has been set to 8 workers, and the timeout has been significantly increased from 20m to 45m. The timeout increase likely accommodates the expanded fixture deployment workflows.

Also applies to: 33-33


154-230: LGTM! Well-structured fixture lifecycle targets.

The new fixture orchestration framework provides a clean pattern-based approach for managing fixture lifecycles (deploy/update/reset/stop/teardown) across Lima and EC2 providers. The conditional build logic correctly skips goreleaser-build when an external Control Plane image is specified, and the variable passing to sub-makes properly propagates variant and extra vars configurations.

e2e/fixtures/vars/large_global.yaml (1)

2-20: LGTM! Swarm quorum and multi-region distribution are correct.

The changes maintain proper swarm manager quorum (3 managers: host-1, host-3, host-5) for the 6-node cluster and add balanced region distribution (2 hosts per region) for multi-region testing. The removal of the s3_repository block is consistent with the refactoring seen in huge_global.yaml.

Based on learnings, the region field drives test latency simulation rather than deployment location.

api/Makefile (1)

1-1: No verification needed — common.mk correctly provides required tools.

Verification confirms that common.mk (at the repository root) defines both goa (line 6) and yamlfmt (line 9), which are used in api/Makefile's generate-v1 target (lines 5-6). The include path ../common.mk is correct.

e2e/fixtures/Makefile (2)

15-18: LGTM! Ansible extra-vars ordering enables proper override behavior.

The variant_vars_file is loaded between the provider-specific vars file and EXTRA_VARS, which allows users to override variant settings when needed. This is the correct ordering for Ansible's variable precedence.


55-58: LGTM! Consistent with Lima configuration.

The EC2 ansible playbook invocation mirrors the Lima configuration structure, maintaining consistency across providers.

docs/development/e2e-tests.md (4)

210-226: LGTM! Clear documentation of Lima fixture lifecycle.

The updated targets provide a cleaner, more intuitive workflow compared to invoking individual playbook targets. The descriptions clearly explain what each target does.


228-235: LGTM! Accurate description of test config generation.

The documentation correctly describes which targets generate the test config file and provides a helpful example of viewing it.


257-268: LGTM! Consistent use of FIXTURE_EXTRA_VARS.

The updated examples correctly demonstrate passing architecture overrides using FIXTURE_EXTRA_VARS across all relevant targets.


433-440: Helpful context on global variants with simulated latency.

The description of global variants aligns with the retrieved learning that the region field is used for simulated latency computation rather than actual deployment region. This is clear and informative.

Based on learnings, the region field in variant configurations is correctly documented as being for simulated latency purposes.

@jason-lynch jason-lynch force-pushed the test/e2e-makefile-improvements branch from 887568d to 084ebf0 Compare January 8, 2026 21:23
We changed our release tagging scheme so that release tags are created
on the release branches instead of on `main`. This change broke the
`CONTROL_PLANE_VERSION` computation on `main`, resulting in it always
returning an older version.

This changes the `CONTROL_PLANE_VERSION` computation check for the
latest release tag on the current release branch. This requires a git
fetch to run reliably, so I've used a "deferred simple variable
expression" idiom to ensure we only compute this value once when it's
used.
Refactors the e2e fixture vars files to reuse the `machines`
configuration for both the Lima and EC2 fixtures. This change also
simplifies the mechanism to specify different fixture variants and
changes the default variant to `large` to match the Docker
compose-based setup.
Adds top-level make targets for managing E2E fixtures to cover common
use cases and reduce the number of commands to use the fixtures:

- `deploy-{lima,ec2}-fixture`: Builds the Control Plane binaries,
  deploys and sets up the VMs, then deploys the Control Plane.
- `update-{lima,ec2}-fixture`: Builds and deploys the Control Plane to
  an already-deployed fixture.
- `reset-{lima,ec2}-fixture`: Builds the Control Plane binaries, tears
  down the existing Control Plane installation, and redeploys the
  Control Plane.
- `stop-{lima,ec2}-fixture`: Stops the VMs
- `teardown-{lima,ec2}-fixture`: Deletes the VMs
@jason-lynch jason-lynch force-pushed the test/e2e-makefile-improvements branch from 084ebf0 to 3f28466 Compare January 13, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants